Hi There,

I have configured an HA cluster for MySQL with DRBD, Pacemaker and Corosync. I was able to bring the cluster up a few times.
Even when the cluster is up with node1 as master and node2 as slave, shutting down node1 means node2 never takes over as master until I restart the openais service several times, and vice versa.
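
For reference, this is roughly how I test the failover (just a sketch; I am using the standard openais init script on SLES 11 and the resource names from my configuration below):

# on the node that is currently master (e.g. node1), stop the cluster stack
rcopenais stop

# on the surviving node, I expect ms_drbd_mysql to be promoted there and g_mysql to start
crm_mon -1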

The cluster was working before the shutdown; now, after rebooting both nodes, the cluster is down and will not come back up.
I have tried the same configuration on Ubuntu and it works absolutely fine, so I am not sure whether this is a bug in the SLES HA solution or whether I am missing something. We have SLES 11 SP3 and are thinking of buying the HA extension, but so far I have not been able to demonstrate whether this solution will actually work.
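
After the reboot, this is roughly what I do to bring the cluster back up (again a sketch, assuming the DRBD resource r0 from the configuration below):

# start the cluster stack on both nodes
rcopenais start

# check membership and resources
crm_mon -1

# check DRBD state directly, outside of Pacemaker
cat /proc/drbd
drbdadm role r0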

Can anyone please help me fix this problem?

Thanks

Mahmood

Versions:
SLES 11 SP3, fully licensed
SLE-HA-11 evaluation
Kernel: 3.0.101-0.35-default
drbd-8.4.4-0.22.9
corosync-1.4.6-0.7.1
pacemaker-1.1.10-0.15.25
lsb-4.0-2.4.21


node sles1
node sles2
primitive p_drbd_mysql ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="30s" role="Slave" \
op monitor interval="15s" role="Master"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/var/lib/mysql_drbd" fstype="ext3"
primitive p_ip_mysql ocf:heartbeat:IPaddr2 \
params ip="172.16.100.20" cidr_netmask="24" nic="eth1"
primitive p_mysql lsb:mysql
group g_mysql p_fs_mysql p_ip_mysql p_mysql
ms ms_drbd_mysql p_drbd_mysql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-f3eeaf4" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
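
To rule out a syntax problem I also verify the configuration, roughly like this (crm_verify ships with pacemaker; -L checks the live CIB):

# validate the live CIB and print warnings/errors
crm_verify -L -V

# show the configuration as the cluster sees it
crm configure show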

Log excerpt from sles1:
Jul 30 11:53:01 sles1 crmd[4494]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jul 30 11:53:01 sles1 crmd[4494]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jul 30 11:53:01 sles1 crmd[4494]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jul 30 11:53:01 sles1 crmd[4486]: notice: plugin_handle_membership: Membership 128: quorum acquired
Jul 30 11:53:01 sles1 crmd[4486]: notice: crm_update_peer_state: plugin_handle_membership: Node sles1[739271701] - state is now member (wa
Jul 30 11:53:01 sles1 crmd[4486]: notice: crm_update_peer_state: plugin_handle_membership: Node sles2[739271702] - state is now member (wa
Jul 30 11:53:01 sles1 crmd[4494]: notice: plugin_handle_membership: Membership 128: quorum acquired
Jul 30 11:53:01 sles1 crmd[4486]: notice: do_started: The local CRM is operational
Jul 30 11:53:01 sles1 crmd[4486]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTE
Jul 30 11:53:22 sles1 crmd[4486]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jul 30 11:55:22 sles1 crmd[4486]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jul 30 11:58:22 sles1 crmd[4486]: error: crm_timer_popped: Integration Timer (I_INTEGRATED) just popped in state S_INTEGRATION! (180000ms)
Jul 30 11:58:22 sles1 crmd[4486]: warning: do_state_transition: Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
Jul 30 11:58:22 sles1 crmd[4486]: warning: do_state_transition: 1 cluster nodes failed to respond to the join offer.
Jul 30 11:58:22 sles1 crmd[4486]: notice: crmd_join_phase_log: join-1: sles2=welcomed
Jul 30 11:58:22 sles1 crmd[4486]: notice: crmd_join_phase_log: join-1: sles1=integrated
^C
sles1:~ # crm status
Last updated: Wed Jul 30 11:59:59 2014
Last change: Wed Jul 30 11:18:38 2014 by root via cibadmin on sles2
Stack: classic openais (with plugin)
Current DC: NONE
2 Nodes configured, 2 expected votes
5 Resources configured


OFFLINE: [ sles1 sles2 ]

Log excerpt from sles2:
Jul 30 11:53:17 sles2 corosync[9865]: [pcmk ] info: send_member_notification: Sending membership update 128 to 1 children
Jul 30 11:53:17 sles2 corosync[9865]: [pcmk ] info: update_member: Node sles1 now has process list: 00000000000000000000000000101302 (1053442)
Jul 30 11:53:17 sles2 corosync[9865]: [pcmk ] info: send_member_notification: Sending membership update 128 to 1 children
Jul 30 11:53:17 sles2 corosync[9865]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jul 30 11:53:17 sles2 corosync[9865]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 30 11:53:18 sles2 corosync[9865]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 30 11:53:38 sles2 corosync[9865]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 30 11:55:38 sles2 corosync[9865]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 30 11:55:47 sles2 corosync[9865]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)


sles2:~ # crm status
Last updated: Wed Jul 30 12:00:32 2014
Last change: Wed Jul 30 11:01:12 2014 by root via cibadmin on sles2
Stack: classic openais (with plugin)
Current DC: NONE
2 Nodes configured, 2 expected votes
5 Resources configured


Online: [ sles2 ]
OFFLINE: [ sles1 ]
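
In case it is relevant, I also check the corosync rings and membership directly (a sketch; corosync-cfgtool and corosync-objctl ship with corosync 1.4):

# ring status on each node
corosync-cfgtool -s

# totem members as corosync sees them
corosync-objctl runtime.totem.pg.mrp.srp.members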