2-Node Cluster: Resources restart when other node reboots



magargee
16-Dec-2013, 16:57
I am new to the HA Extension and I have created a 2-node cluster with SLES 11 SP3 for testing. I have created a group which controls two filesystems in a VG and starts several scripts that touch the filesystems. The group starts up OK and I can force failover/migration with no issue. What I am trying to achieve now is having the group stay running on one of the nodes even if the other one is rebooted. I currently have the resource-stickiness set to "infinity". When I reboot the other node and it comes back online, the group/resources stop for a second (the scripts terminate and the filesystems unmount) and then restart on the node they were already running on. How can I configure the group/resources so that they do not restart and simply stay running on that node?

I found the post below from back in 2012, but in my case I have my resources set up as class="ocf" and provider="heartbeat".

https://forums.suse.com/showthread.php?1214-On-2-node-Cluster-Resource-restart-when-other-node-reboot&highlight=Resource+restart

Any suggestions and/or guidance would be greatly appreciated.

Thanks!
Chris

jmozdzen
17-Dec-2013, 13:18
Hi Chris,

how's your cluster set to handle quorum situations? In a two-node cluster, shutting down one node will set the cluster to "no quorum", which may cause all resources to be stopped. You might want to scan the logs to see what the cluster manager tells you about the reason for shutting down the resource(s) (and yes, I know that in the default install, the log is pretty verbose...)

Regards,
Jens

magargee
17-Dec-2013, 15:15
Jens -

The "No Quorum Policy" is set to "ignore". The resources actually stay up and running on the one node when the other one is shut down. I see the disruption when the "down" node comes back online and starts up the cluster processes.

So I took your suggestion and dove into the logs (/var/log/messages) and found that the cluster was encountering a resource that was active on both nodes:

t01lap00158 pengine[4376]: error: native_create_actions: Resource appsvg00 (ocf::LVM) is active on 2 nodes attempting recovery
t01lap00158 pengine[4376]: warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
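
For anyone searching their own logs for the same symptom, here is a rough sketch of how the affected resource names could be pulled out. The function name too_active_resources is made up for illustration, and the pattern is based only on the two log lines quoted above, so it may need adjusting for other Pacemaker versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of any cluster tooling): print the names of
# resources that the policy engine reported as active on more than one node.
# The message format is assumed from the log lines quoted in this thread.
too_active_resources() {
    logfile="$1"
    # Match "Resource <name> (<class>) is active on <N> nodes" and keep
    # only <name>; sort -u removes repeated reports of the same resource.
    sed -n 's/.*Resource \([^ ]*\) (.*) is active on [0-9][0-9]* nodes.*/\1/p' "$logfile" | sort -u
}
```

It would be called with the log path, for example: too_active_resources /var/log/messages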

The VG "appsvg00" (shared between the nodes) was being activated during boot, which made the resource active on both nodes. So I modified "/etc/sysconfig/lvm":

LVM_VGS_ACTIVATED_ON_BOOT="vg00"

I only need vg00 activated at boot time (the cluster takes care of the rest). With this change, the group/resources no longer stop and restart when the other node comes back online.
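
In case it helps someone check their own nodes, below is a minimal sketch that tests whether a given VG would be activated at boot according to /etc/sysconfig/lvm. The function name vg_activated_on_boot is hypothetical. It also assumes that an empty LVM_VGS_ACTIVATED_ON_BOOT means all VGs get activated at boot, which is my understanding of the SLES 11 boot scripts but should be verified against the comments in your own sysconfig file.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the named VG would be
# activated at boot according to the sysconfig file, fail (1) otherwise.
vg_activated_on_boot() {
    vg="$1"
    conf="${2:-/etc/sysconfig/lvm}"
    # Source the file in a subshell so the caller's environment is untouched.
    list=$( . "$conf" >/dev/null 2>&1; printf '%s' "$LVM_VGS_ACTIVATED_ON_BOOT" )
    # Assumption: an empty list means every VG is activated at boot.
    [ -z "$list" ] && return 0
    # Otherwise only the VGs named in the space-separated list are activated.
    for name in $list; do
        [ "$name" = "$vg" ] && return 0
    done
    return 1
}
```

With the setting shown above, vg_activated_on_boot appsvg00 should fail on both nodes, which is what you want for a VG that the cluster is supposed to activate itself.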

Thanks!
Chris

jmozdzen
17-Dec-2013, 15:36
Hi Chris,

great you got it solved and thank you for reporting back the details!

Regards,
Jens