stopping pacemaker doesn't move resources to other node



mnshsnghl
26-Jun-2019, 14:12
I have created a 2-node cluster on SLES 12. The configuration is as follows:

msnode1:~ # crm --version
crm 3.0.0

msnode1:~ # corosync -v
Corosync Cluster Engine, version '2.3.6'
Copyright (c) 2006-2009 Red Hat, Inc.

msnode1:~ # crm config show
node 1: msnode1
node 2: msnode2
primitive mspersonal systemd:mspersonal \
op monitor interval=30s
primitive virtip IPaddr \
params ip=10.243.109.103 cidr_netmask=21 \
op monitor interval=30s
location cli-prefer-virtip virtip role=Started inf: msnode1
colocation msconstraint inf: virtip mspersonal
order msorder Mandatory: virtip mspersonal
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.16-4.8-77ea74d \
cluster-infrastructure=corosync \
cluster-name=mscluster \
stonith-enabled=false \
placement-strategy=balanced \
help \
list \
last-lrm-refresh=1561341732
rsc_defaults rsc-options: \
resource-stickiness=100 \
migration-threshold=2
op_defaults op-options: \
timeout=600 \
record-pending=true

msnode1:~ # crm status
Stack: corosync
Current DC: msnode1 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Tue Jun 25 17:43:44 2019
Last change: Tue Jun 25 17:38:21 2019 by hacluster via cibadmin on msnode1

2 nodes configured
2 resources configured

Online: [ msnode1 msnode2 ]

Full list of resources:

virtip (ocf::heartbeat:IPaddr): Started msnode1
mspersonal (systemd:mspersonal): Started msnode1


When I stop the cluster on msnode1 (or reboot msnode1), the resources start on msnode2 but then immediately stop, and the status changes to:

msnode1:~ # systemctl stop pacemaker
msnode2:~ # crm status
Stack: corosync
Current DC: msnode2 (version 1.1.16-4.8-77ea74d) - partition WITHOUT quorum
Last updated: Tue Jun 25 17:44:26 2019
Last change: Tue Jun 25 17:38:20 2019 by hacluster via cibadmin on msnode1

2 nodes configured
2 resources configured

Online: [ msnode2 ]
OFFLINE: [ msnode1 ]

Full list of resources:

virtip (ocf::heartbeat:IPaddr): Stopped
mspersonal (systemd:mspersonal): Stopped

When I restart the pacemaker service on msnode1, the resources start back on msnode1:

msnode1:~ # systemctl start pacemaker
msnode1:~ # crm status
Stack: corosync
Current DC: msnode2 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Tue Jun 25 17:46:09 2019
Last change: Tue Jun 25 17:38:20 2019 by hacluster via cibadmin on msnode1

2 nodes configured
2 resources configured

Online: [ msnode1 msnode2 ]

Full list of resources:

virtip (ocf::heartbeat:IPaddr): Started msnode1
mspersonal (systemd:mspersonal): Started msnode1

But when I redo the same exercise, the resources actually start on msnode2:

msnode1:~ # systemctl stop pacemaker
msnode2:~ # crm status
Stack: corosync
Current DC: msnode2 (version 1.1.16-4.8-77ea74d) - partition WITHOUT quorum
Last updated: Tue Jun 25 17:47:00 2019
Last change: Tue Jun 25 17:38:20 2019 by hacluster via cibadmin on msnode1

2 nodes configured
2 resources configured

Online: [ msnode2 ]
OFFLINE: [ msnode1 ]

Full list of resources:

virtip (ocf::heartbeat:IPaddr): Started msnode2
mspersonal (systemd:mspersonal): Started msnode2

But when I start pacemaker again on msnode1, the resources move back to msnode1, which I didn't expect because resource-stickiness is set to 100 (stickiness does work as expected when one of the resources fails).

I can't figure out what I am missing in this cluster configuration that would make this behave correctly.

strahil-nikolov-dxc
26-Jun-2019, 15:29
location cli-prefer-virtip virtip role=Started inf: msnode1

This means that you have run "crm resource migrate virtip msnode1". When msnode1 dies, the resource will migrate to msnode2, and when msnode1 comes back, it will move back to msnode1.

Remove it with "crm resource unmigrate virtip", which will delete the cli-prefer constraint.
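
For example, as a rough sketch (verify against your own crm version):

msnode1:~ # crm resource unmigrate virtip
msnode1:~ # crm configure show | grep cli-prefer

The second command should return nothing once the constraint is gone.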

Also, consider adding a fencing mechanism - either SBD or another type - in order to guarantee that resources will be failed over.


Another issue:


Current DC: msnode2 (version 1.1.16-4.8-77ea74d) - partition WITHOUT quorum

you need the "two_node:1 " set on both nodes (use csync2 -m /etc/corosync/corosync.conf ; csync2 -x to sync it between nodes)

Note: as two_node is enabled, it automatically enables "wait_for_all", which means that if both nodes die (power outage, double fencing, anything unexpected), both nodes must be started before any resources are taken over.
I would recommend leaving that at its default, as you don't have a fencing mechanism and you could otherwise end up with the IP and the service started on both nodes - which is bad.

mnshsnghl
26-Jun-2019, 16:19
This means that you have run "crm resource migrate virtip msnode1". When msnode1 dies, the resource will migrate to msnode2, and when msnode1 comes back, it will move back to msnode1.


Remove it with "crm resource unmigrate virtip", which will delete the cli-prefer constraint.

Thanks, I think this solved the problem. I am not sure how this constraint got added; I didn't add it myself and didn't notice it either.


Also, consider adding a fencing mechanism - either SBD or another type - in order to guarantee that resources will be failed over.

I am actually interested in adding a fencing mechanism, and would like to find out what other types of fencing can be added and how. Also, what is the best fencing mechanism? From my requirements perspective, I don't want a node to be rebooted as part of fencing - is that possible?


Another issue:



you need the "two_node:1 " set on both nodes (use csync2 -m /etc/corosync/corosync.conf ; csync2 -x to sync it between nodes)

Note: as two_node is enabled, it automatically enables "wait_for_all", which means that if both nodes die (power outage, double fencing, anything unexpected), both nodes must be started before any resources are taken over.
I would recommend leaving that at its default, as you don't have a fencing mechanism and you could otherwise end up with the IP and the service started on both nodes - which is bad.

I probably didn't understand this feature. Do you mean that with two_node=0 there is a greater chance of split brain, and that with two_node=1 both nodes must be up before resources are taken? Well, I don't want either. I actually want the node that comes up first with the cluster to take the resources, and the other to remain as standby.

However, if I set up fencing, will this attribute still be required? Moreover, my cluster can grow and shrink dynamically (nodes added and removed), so I don't want to rely on any such attribute.

Thanks a lot!

strahil-nikolov-dxc
26-Jun-2019, 17:06
By default, quorum is 50% of the votes + 1, so with 2 nodes you need 2 votes to have quorum.
Yet in a two-node cluster you want to have quorum even with 1 node, so there is a special option, "two_node: 1", which tells the cluster that only 1 vote is needed to survive.

So, in your case you need "two_node: 1" plus a proper STONITH device (whether SBD or another type). Otherwise this will never really be a cluster, as it will be susceptible to split-brain.
If you add a 3rd node via "ha-cluster-join", that flag (two_node) is automatically removed, and it should be added back when you use "ha-cluster-remove" to reduce from 3 nodes to 2.
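
Once two_node is active, you can double-check the quorum state with corosync-quorumtool; the flags line should then list something like "2Node" and "WaitForAll" (the exact output wording depends on the corosync version):

msnode1:~ # corosync-quorumtool -s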

singhm16
27-Jun-2019, 10:05
By default, quorum is 50% of the votes + 1, so with 2 nodes you need 2 votes to have quorum.
Yet in a two-node cluster you want to have quorum even with 1 node, so there is a special option, "two_node: 1", which tells the cluster that only 1 vote is needed to survive.
Understood, thanks. However, I noticed that when I enabled this option and the node where the services are running reboots, the services and virtual IP don't migrate to the other node that is still running. Is that how it is supposed to work?


set a proper stonith device (whether SBD or another type).
Can you please suggest a fencing mechanism where there is no need to bring a node down, only to bring down the required services?

mnshsnghl
27-Jun-2019, 10:38
By default, quorum is 50% of the votes + 1, so with 2 nodes you need 2 votes to have quorum.
Yet in a two-node cluster you want to have quorum even with 1 node, so there is a special option, "two_node: 1", which tells the cluster that only 1 vote is needed to survive.
Thanks for the explanation. Understood the point.

So, in your case you need "two_node: 1" plus
a proper STONITH device (whether SBD or another type).

Can you please help me implement a fencing mechanism that doesn't shoot down the node but just stops the services that I want?

strahil-nikolov-dxc
28-Jun-2019, 18:09
Understood, thanks. However, I noticed that when I enabled this option and the node where the services are running reboots, the services and virtual IP don't migrate to the other node that is still running. Is that how it is supposed to work?


Can you please suggest a fencing mechanism where there is no need to bring a node down, only to bring down the required services?

Yes, when you reboot a node (or the cluster service is restarted on it), the cluster will try to minimize downtime and bring the resources up on the other node.
Of course, that can be controlled via location constraints (for example, if you have a 3-node cluster and you don't want the cluster to start the resource/group on a specific node).
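
For example, a crmsh location constraint along these lines (the group name "msgroup" and the node name "msnode3" are just placeholders for your own setup) would keep a resource group off one node entirely:

location loc-msgroup-not-on-msnode3 msgroup -inf: msnode3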

About fencing: the idea behind STONITH (Shoot The Other Node In The Head) is to guarantee that the resource will not be used by the problematic node (for example, the node might freeze and then recover a minute or two later - in a recent case, 5 minutes - just in time to write something to the filesystem), so that the cluster can safely start the resource on the working node.

If your service uses shared storage, you can use fence_scsi (with pcmk_reboot_action="off"), which relies on SCSI persistent reservations (the storage must support them - most does). However, there is no guarantee that the frozen node will release the IP if the resources are not in a single resource group.
So you should have a group ordered as:
1. Filesystem
2. IP
3. APP

Once the node is fenced, it will fail to write to the filesystem, the Filesystem resource will be marked as failed, and everything after it in the group will be stopped.
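
A minimal crm configure sketch of that layout, assuming a shared LUN at /dev/mapper/shared-lun mounted on /srv/ms (device path, mount point, fstype and the msfs/msgroup names are placeholders for your environment, and the group would replace your existing colocation/order constraints):

primitive msfs Filesystem \
    params device="/dev/mapper/shared-lun" directory="/srv/ms" fstype="xfs" \
    op monitor interval=20s timeout=40s
primitive fence-scsi stonith:fence_scsi \
    params devices="/dev/mapper/shared-lun" pcmk_reboot_action="off" \
    meta provides=unfencing
group msgroup msfs virtip mspersonal
property stonith-enabled=true

With something like this in place, a fenced node loses its SCSI registration, the msfs monitor fails there, and virtip and mspersonal (which come after msfs in the group) are stopped on that node.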

Edit:
Of course, putting the cluster into maintenance mode is mandatory when doing a system update (with SBD I would recommend also stopping the cluster software locally after setting maintenance mode) or when doing maintenance on the application itself.
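
One way to do that with crmsh is via the maintenance-mode property (shown here as a sketch; per-node maintenance is also possible):

msnode1:~ # crm configure property maintenance-mode=true
... perform the system update or application maintenance ...
msnode1:~ # crm configure property maintenance-mode=false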

strahil-nikolov-dxc
28-Jun-2019, 18:16
Can you please help me implement a fencing mechanism that doesn't shoot down the node but just stops the services that I want?

That's tricky. If you use shared storage, you can use fence_scsi with pcmk_reboot_action="off", which will deregister the system from the shared storage.

For this to work properly, you need to put the IP after the filesystem resource in a single group. Once the node is fenced, the filesystem will no longer be available and the node will stop everything after the filesystem resource (for example the shared IP and the app).

Edit: Check my previous comment for an example.

strahil-nikolov-dxc
28-Jun-2019, 18:21
For Filesystem monitoring, use "OCF_CHECK_LEVEL=20" in the 'op' section.
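
Something along these lines should work with crm configure (crmsh stores unrecognized op attributes such as OCF_CHECK_LEVEL as instance attributes of that operation; the device, directory and fstype values are placeholders):

primitive msfs Filesystem \
    params device="/dev/mapper/shared-lun" directory="/srv/ms" fstype="xfs" \
    op monitor interval=20s timeout=40s OCF_CHECK_LEVEL=20

With OCF_CHECK_LEVEL=20 the Filesystem agent performs a read/write test on the mount instead of only checking that it is mounted.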