
Thread: HA in sles15 lvmlockd

  1. #1

    HA in sles15 lvmlockd

    Hi

    I have configured a two-node cluster on SLES 15 with LVM in exclusive mode. The problem is that when I fence or reboot the active node, the resources don't move to the secondary node. Failover stops at activation of the LVM volume group:
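
    For context, the storage stack follows the usual SLES 15 HA pattern. A minimal sketch of that kind of configuration is below; the dlm and lvmlockd primitive names, the constraint ids and the volume group name vg01 are placeholders, not copied from my CIB:

    primitive dlm ocf:pacemaker:controld \
        op monitor interval=60 timeout=60
    primitive lvmlockd ocf:heartbeat:lvmlockd \
        op monitor interval=30 timeout=90
    group g-storage dlm lvmlockd
    clone cl-storage g-storage meta interleave=true
    primitive vgcluster ocf:heartbeat:LVM-activate \
        params vgname=vg01 vg_access_mode=lvmlockd activation_mode=exclusive \
        op start timeout=90 op stop timeout=90 op monitor interval=30 timeout=90
    # (primitives for ip-apache, clusterfs and service-apache omitted)
    group apache-group ip-apache vgcluster clusterfs service-apache
    order o-storage-before-apache Mandatory: cl-storage apache-group
    colocation c-apache-with-storage inf: apache-group cl-storage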


    2019-01-25T14:20:33.996488+01:00 sles15cl2 pengine[1828]: notice: Watchdog will be used via SBD if fencing is required
    2019-01-25T14:20:33.997068+01:00 sles15cl2 pengine[1828]: warning: Processing failed op start for vgcluster on sles15cl2: unknown error (1)
    2019-01-25T14:20:33.997296+01:00 sles15cl2 pengine[1828]: warning: Processing failed op start for vgcluster on sles15cl2: unknown error (1)
    2019-01-25T14:20:33.998199+01:00 sles15cl2 pengine[1828]: warning: Forcing vgcluster away from sles15cl2 after 1000000 failures (max=3)


    Failed Actions:
    * lvmlockd_stop_0 on sles15cl1 'not installed' (5): call=56, status=Not installed, exitreason='',
    last-rc-change='Fri Jan 25 14:41:07 2019', queued=1ms, exec=1ms



    sles15cl2:~ # crm status
    Stack: corosync
    Current DC: sles15cl2 (version 1.1.18+20180430.b12c320f5-1.14-b12c320f5) - partition with quorum
    Last updated: Fri Jan 25 14:00:59 2019
    Last change: Fri Jan 25 14:00:55 2019 by root via cibadmin on sles15cl2

    2 nodes configured
    10 resources configured

    Online: [ sles15cl1 sles15cl2 ]

    Full list of resources:

    admin-ip (ocf::heartbeat:IPaddr2): Started sles15cl2
    stonith-sbd (stonith:external/sbd): Started sles15cl2
    Clone Set: cl-storage [g-storage]
    Started: [ sles15cl1 sles15cl2 ]
    Resource Group: apache-group
    ip-apache (ocf::heartbeat:IPaddr2): Started sles15cl1
    vgcluster (ocf::heartbeat:LVM-activate): Stopped
    clusterfs (ocf::heartbeat:Filesystem): Stopped
    service-apache (ocf::heartbeat:apache): Stopped

    Failed Actions:
    * vgcluster_start_0 on sles15cl2 'unknown error' (1): call=82, status=Timed Out, exitreason='',
    last-rc-change='Fri Jan 25 13:49:36 2019', queued=0ms, exec=90003ms
    * vgcluster_start_0 on sles15cl1 'not configured' (6): call=39, status=complete, exitreason='lvmlockd daemon is not running!',
    last-rc-change='Fri Jan 25 13:51:06 2019', queued=0ms, exec=308ms




    It looks like lvmlockd is not running, but it is running:

    sles15cl2:/usr/lib/ocf/resource.d/heartbeat # ps -ef |grep dlm
    root 2714 1 0 14:43 ? 00:00:00 dlm_controld -s 0
    root 2792 1 0 14:43 ? 00:00:00 lvmlockd -p /run/lvmlockd.pid -A 1 -g dlm
    root 4040 2 0 14:45 ? 00:00:00 [dlm_scand]
    root 4041 2 0 14:45 ? 00:00:00 [dlm_recv]
    root 4042 2 0 14:45 ? 00:00:00 [dlm_send]
    root 4043 2 0 14:45 ? 00:00:00 [dlm_recoverd]
    root 4050 2 0 14:45 ? 00:00:00 [dlm_recoverd]
    root 23871 2919 0 15:16 pts/0 00:00:00 grep --color=auto dlm

    sles15cl2:/usr/lib/ocf/resource.d/heartbeat # ps -ef |grep lvm
    root 381 1 0 14:42 ? 00:00:00 /usr/sbin/lvmetad -f
    root 2792 1 0 14:43 ? 00:00:00 lvmlockd -p /run/lvmlockd.pid -A 1 -g dlm
    root 23957 2919 0 15:16 pts/0 00:00:00 grep --color=auto lvm
    sles15cl2:/usr/lib/ocf/resource.d/heartbeat #
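
    Besides ps, the lock managers can also be queried directly; something along these lines should confirm the lockspaces (dlm_tool ships with the dlm package, lvmlockctl with lvm2-lockd):

    sles15cl2:~ # dlm_tool status
    sles15cl2:~ # dlm_tool ls
    sles15cl2:~ # lvmlockctl --info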



    It looks like the bug described here:
    https://github.com/ClusterLabs/resou...b3e34b04af07ca

    Resources can only be started if I run: crm resource cleanup
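
    For example, after each failover I have to run something like this before the group will start again (resource names as shown in the crm status output above):

    sles15cl2:~ # crm resource cleanup vgcluster
    sles15cl2:~ # crm resource cleanup lvmlockd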

    Is there some other workaround? If not, then this is not a cluster...

    Thanks

    Jost

  2. #2

    Re: HA in sles15 lvmlockd

    rakovec,

    It appears that in the past few days you have not received a response to your
    posting. That concerns us, and has triggered this automated reply.

    These forums are peer-to-peer, best-effort, and volunteer-run, so if your issue
    is urgent or not getting a response, you might try one of the following options:

    - Visit http://www.suse.com/support and search the knowledgebase and/or check all
    the other support options available.
    - Open a service request: https://www.suse.com/support
    - You could also try posting your message again. Make sure it is posted in the
    correct newsgroup. (http://forums.suse.com)

    Be sure to read the forum FAQ about what to expect in the way of responses:
    http://forums.suse.com/faq.php

    If this is a reply to a duplicate posting or otherwise posted in error, please
    ignore and accept our apologies and rest assured we will issue a stern reprimand
    to our posting bot.

    Good luck!

    Your SUSE Forums Team
    http://forums.suse.com



  3. #3

    Re: HA in sles15 lvmlockd

    In order to find the reason, you need to debug it.
    Power off one of the nodes and then wait for the failure, but don't clean it up.
    Then run the resource agent in debug mode, e.g. by calling it by hand as sketched below - you can check how to do it here: https://wiki.clusterlabs.org/wiki/De...ource_Failures
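
    A rough sketch of calling the agent by hand with tracing (vg01 is just a placeholder - take the parameter values from your vgcluster resource):

    # export the parameters Pacemaker would normally pass to the agent
    export OCF_ROOT=/usr/lib/ocf
    export OCF_RESKEY_vgname=vg01
    export OCF_RESKEY_vg_access_mode=lvmlockd
    export OCF_RESKEY_activation_mode=exclusive
    # bash -x prints every step of the agent as it runs
    bash -x /usr/lib/ocf/resource.d/heartbeat/LVM-activate start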
