Unable to configure fresh Pacemaker cluster on SLES 11.1



britair
26-Apr-2012, 14:17
Hi,

I just installed a two-node cluster from the SLES-11.1-for-VMware and HA-extension CDs.
My configuration is plain vanilla: I only set 'bindnetaddr' and 'mcastaddr' in the config file and started Corosync (openais).

Corosync seems to work:

corosync [MAIN ] Corosync Cluster Engine ('1.2.1'): started and ready to provide service.
corosync [MAIN ] Corosync built-in features: nss
corosync [MAIN ] Successfully configured openais services to load
corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
corosync [TOTEM ] Initializing transport (UDP/IP).
corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
corosync [TOTEM ] The network interface [172.29.29.107] is now up.
corosync [SERV ] Service engine loaded: openais cluster membership service B.01.01
corosync [SERV ] Service engine loaded: openais event service B.01.01
corosync [SERV ] Service engine loaded: openais checkpoint service B.01.01
corosync [SERV ] Service engine loaded: openais availability management framework B.01.01
corosync [SERV ] Service engine loaded: openais message service B.03.01
corosync [SERV ] Service engine loaded: openais distributed locking service B.03.01
corosync [SERV ] Service engine loaded: openais timer service A.01.01
corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
corosync [SERV ] Service engine loaded: corosync configuration service
corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
corosync [SERV ] Service engine loaded: corosync profile loading service
corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
corosync [CLM ] CLM CONFIGURATION CHANGE
corosync [CLM ] New Configuration:
corosync [CLM ] Members Left:
corosync [CLM ] Members Joined:
corosync [CLM ] CLM CONFIGURATION CHANGE
corosync [CLM ] New Configuration:
corosync [CLM ] r(0) ip(172.29.29.107)
corosync [CLM ] Members Left:
corosync [CLM ] Members Joined:
corosync [CLM ] r(0) ip(172.29.29.107)
corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
corosync [CLM ] CLM CONFIGURATION CHANGE
corosync [CLM ] New Configuration:
corosync [CLM ] r(0) ip(172.29.29.107)
corosync [CLM ] Members Left:
corosync [CLM ] Members Joined:
corosync [CLM ] CLM CONFIGURATION CHANGE
corosync [CLM ] New Configuration:
corosync [CLM ] r(0) ip(172.29.29.106)
corosync [CLM ] r(0) ip(172.29.29.107)
corosync [CLM ] Members Left:
corosync [CLM ] Members Joined:
corosync [CLM ] r(0) ip(172.29.29.106)
corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
corosync [MAIN ] Completed service synchronization, ready to provide service.

But I can't access its configuration:


# crm configure
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse xml: no element found: line 1, column 0
crm(live)configure# show
ERROR: No CIB!
crm(live)configure#

I've never had this problem with other distributions, and Google points me to what looks like a SLES-specific problem (http://www.gossamer-threads.com/lists/linuxha/users/69796).
I can't find a way to force-create a CIB on Corosync...
Is there any additional step needed to get Pacemaker/Corosync to work on SLES?

jmozdzen
26-Apr-2012, 15:28
Hi britair,

First, let me say that I have not tried this with the VMware edition, just with plain SLES11SP1 (+HAE, of course).


I noticed that the initial complaint is that crm cannot connect to the CIB (daemon) - can you confirm it's running (via ps)? Do you see any "cib" messages in syslog? AFAICS this hasn't been verified in the referenced thread either, but would make a good starting point.
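The check suggested above could look roughly like this. It is only a sketch: it assumes the daemon's process name is exactly `cib`, that syslog is written to /var/log/messages, and `cib_in_list` is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: succeeds if a command named exactly "cib" appears
# in the list of command names read from stdin (one name per line).
cib_in_list() {
    grep -qx 'cib'
}

# Ask ps for the bare command names of all processes and look for cib.
if ps -e -o comm= | cib_in_list; then
    echo "cib is running"
else
    echo "cib is NOT running"
fi

# Assumption: syslog goes to /var/log/messages; adjust the path if needed.
if [ -r /var/log/messages ]; then
    # Show the most recent log lines that mention cib, if there are any.
    grep -i 'cib' /var/log/messages | tail -n 20
fi
```

If cib does not show up in the process list at all, crm has nothing to connect to, which would match the "Signon to CIB failed" message.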

Regards,
Jens

PS: I didn't get why this ought to be a Novell-specific problem. In that thread, multiple (non-distro-specific) causes were mentioned and many options were left open. And unfortunately, the thread opener seems not to have reported back what the original cause was :-/

britair
26-Apr-2012, 16:10
Found it! I spotted the error by comparing my configuration with another cluster I created on CentOS last week:

The provided '/etc/corosync/corosync.conf.example' file is missing two sections, and I didn't notice it at first. All you have to do is add this at the end of the config file and it works fine:


aisexec {
    user: root
    group: root
}

service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 0
}
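
For anyone who wants to verify this on their own nodes, here is a rough sketch of a check for the 'service' section. It assumes the config lives at /etc/corosync/corosync.conf and that the section is laid out one directive per line as above; `has_pacemaker_service` is a made-up helper name.

```shell
# Hypothetical helper: succeeds if the given file contains a
# "service { ... }" block with a "name: pacemaker" line inside it.
has_pacemaker_service() {
    awk '
        # Entering a service block.
        /^[[:space:]]*service[[:space:]]*[{]/ { in_service = 1; next }
        # A closing brace ends the current block.
        in_service && /[}]/ { in_service = 0 }
        # Inside a service block: look for the pacemaker name directive.
        in_service && /^[[:space:]]*name:[[:space:]]*pacemaker[[:space:]]*$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Assumption: default config location.
conf=/etc/corosync/corosync.conf
if [ -r "$conf" ]; then
    if has_pacemaker_service "$conf"; then
        echo "pacemaker service section found in $conf"
    else
        echo "pacemaker service section MISSING in $conf"
    fi
fi
```

The check has to pass on both nodes, and Corosync (openais) needs a restart after the file is changed before crm can reach the CIB.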

Regards,

jmozdzen
26-Apr-2012, 16:19
britair,

thanks for reporting back the solution and good to know it's up & running.

With regards,
Jens