HA Configuration Issue

Hi,

I am facing an issue with my HA cluster. I have two nodes, but each server shows only its own node as online.

NODE 1

SRVPHN85:~ # crm status
Stack: corosync
Current DC: SRVPHN85 (version 1.1.15-19.15-e174ec8) - partition WITHOUT quorum
Last updated: Tue Jul 25 15:39:36 2017
Last change: Tue Jul 25 14:04:02 2017 by root via cibadmin on SRVPHN85

1 node configured
1 resource configured

Online: [ SRVPHN85 ]

Full list of resources:

admin_addr (ocf::heartbeat:IPaddr2): Stopped


NODE 2

SRVPHN87:~ # crm status
Stack: corosync
Current DC: SRVPHN87 (version 1.1.15-21.1-e174ec8) - partition WITHOUT quorum
Last updated: Tue Jul 25 15:35:00 2017
Last change: Tue Jul 25 15:05:57 2017 by root via cibadmin on SRVPHN87

1 node configured
0 resources configured

Online: [ SRVPHN87 ]

Full list of resources:


I need to resolve this issue. Please help.

Comments

  • arunabha_banerjee
    Hi Raju,

    Could you please share the "crm configure show" output? I suspect there is a problem with the network communication (multicast) between the nodes. Please change it to unicast and try to join the second node again.
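
    To confirm whether the corosync communication is really the problem, you can check the ring status and the current membership on each node. A rough sketch with the standard corosync/pacemaker tools:

    corosync-cfgtool -s      # status of ring 0 on this node
    corosync-quorumtool -s   # votes, quorum state and member list
    crm_mon -1               # one-shot cluster status as pacemaker sees it

    If each node's membership list contains only itself, the two nodes are not seeing each other on the network.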

    Thanks
  • raju7258
    Hi,

    Thanks for the reply. I think ports 5404 and 5405 are blocked between the nodes. Will this cause any issue?
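
    (For reference, a rough way to check this locally on each node; which command applies depends on whether firewalld or plain iptables is in use:)

    firewall-cmd --list-all                  # active zone configuration when firewalld is used
    iptables -L -n | grep -E '5404|5405'     # rules mentioning the corosync ports when plain iptables is used
    ss -ulpn | grep corosync                 # confirms corosync is listening on its UDP ports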

    SRVPHN85:~ # crm configure show
    node 184357468: SRVPHN85
    property cib-bootstrap-options: \
            have-watchdog=false \
            dc-version=1.1.15-21.1-e174ec8 \
            cluster-infrastructure=corosync \
            cluster-name=hacluster \
            show \
            stonith-enabled=false


    SRVPHN87:~ # crm configure show
    node 184357468: SRVPHN87
    property cib-bootstrap-options: \
            have-watchdog=false \
            dc-version=1.1.15-21.1-e174ec8 \
            cluster-infrastructure=corosync \
            cluster-name=hacluster \
            show \
            stonith-enabled=false

  • raju7258
    Sorry, my mistake. Here is the correct output:

    SRVPHN85:~ # crm configure show
    node 184357467: SRVPHN85 \
            attributes standby=off
    primitive admin_addr IPaddr2 \
            params ip=xx.xx.xx.xx \
            op monitor interval=10 timeout=20 \
            meta target-role=Started
    property cib-bootstrap-options: \
            have-watchdog=false \
            dc-version=1.1.15-19.15-e174ec8 \
            cluster-infrastructure=corosync \
            cluster-name=hacluster \
            stonith-enabled=false \
            placement-strategy=balanced
    rsc_defaults rsc-options: \
            resource-stickiness=1 \
            migration-threshold=3
    op_defaults op-options: \
            timeout=600 \
            record-pending=true


    SRVPHN87:~ # crm configure show
    node 184357468: SRVPHN87
    property cib-bootstrap-options: \
            have-watchdog=false \
            dc-version=1.1.15-21.1-e174ec8 \
            cluster-infrastructure=corosync \
            cluster-name=hacluster \
            show \
            stonith-enabled=false
  • arunabha_banerjee
    Please share your "/etc/corosync/corosync.conf" file.
  • raju7258
    NODE 1

    # Please read the corosync.conf.5 manual page

    totem {
            version: 2
            secauth: on
            crypto_hash: sha1
            crypto_cipher: aes256
            cluster_name: hacluster
            clear_node_high_bit: yes

            token: 5000
            token_retransmits_before_loss_const: 10
            join: 60
            consensus: 6000
            max_messages: 20

            interface {
                    ringnumber: 0
                    bindnetaddr: xx.xx.xx.xx
                    mcastaddr: 239.108.147.175
                    mcastport: 5405
                    ttl: 1
            }
    }
    logging {
            fileline: off
            to_stderr: no
            to_logfile: no
            logfile: /var/log/cluster/corosync.log
            to_syslog: yes
            debug: off
            timestamp: on
            logger_subsys {
                    subsys: QUORUM
                    debug: off
            }
    }
    quorum {
            # Enable and configure quorum subsystem (default: off)
            # see also corosync.conf.5 and votequorum.5
            provider: corosync_votequorum
            expected_votes: 3
            two_node: 0
    }


    NODE 2

    # Please read the corosync.conf.5 manual page

    totem {
            version: 2
            secauth: on
            crypto_hash: sha1
            crypto_cipher: aes256
            cluster_name: hacluster
            clear_node_high_bit: yes

            token: 5000
            token_retransmits_before_loss_const: 10
            join: 60
            consensus: 6000
            max_messages: 20

            interface {
                    ringnumber: 0
                    bindnetaddr: xx.xx.xx.xx
                    mcastaddr: 239.108.147.175
                    mcastport: 5405
                    ttl: 1
            }
    }
    logging {
            fileline: off
            to_stderr: no
            to_logfile: no
            logfile: /var/log/cluster/corosync.log
            to_syslog: yes
            debug: off
            timestamp: on
            logger_subsys {
                    subsys: QUORUM
                    debug: off
            }
    }
    quorum {
            # Enable and configure quorum subsystem (default: off)
            # see also corosync.conf.5 and votequorum.5
            provider: corosync_votequorum
            expected_votes: 3
            two_node: 0
    }
  • arunabha_banerjee
    You have to change a few things.

    1. Change the network communication to unicast (transport: udpu); see also the nodelist sketch at the end of this comment.
    totem {
            version: 2
            secauth: on
            crypto_hash: sha1
            crypto_cipher: aes256
            cluster_name: hacluster
            clear_node_high_bit: yes
            token: 5000
            token_retransmits_before_loss_const: 10
            join: 60
            consensus: 6000
            max_messages: 20
            interface {
                    ringnumber: 0
                    bindnetaddr: 192.168.220.0
                    mcastport: 5405
                    ttl: 1
            }
    
            transport: udpu
    }
    

    2. Change quorum section (For two nodes)
    quorum {
    
            # Enable and configure quorum subsystem (default: off)
            # see also corosync.conf.5 and votequorum.5
            provider: corosync_votequorum
            expected_votes: 2
            two_node: 1
    }
    
    

    3. Set proper cib-bootstrap options (for two nodes)
    property cib-bootstrap-options: \
            stonith-enabled=true \
            placement-strategy=balanced \
            no-quorum-policy=ignore \
            stonith-action=reboot \
            startup-fencing=false \
            stonith-timeout=150
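
    Note: with transport: udpu, corosync also needs a nodelist section that lists each node's ring0 address explicitly. A minimal sketch (the addresses below are placeholders, adjust them to your nodes):
    nodelist {
            node {
                    ring0_addr: 192.168.220.10    # address of SRVPHN85 (placeholder)
                    nodeid: 1
            }
            node {
                    ring0_addr: 192.168.220.11    # address of SRVPHN87 (placeholder)
                    nodeid: 2
            }
    }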
    
  • raju7258
    How can I remove the cluster configuration from both nodes and start the setup again from scratch?
  • rajeshsamineni
    It seems both nodes are behaving like individual single-node clusters; please use "sleha-join -c <primary node name>" on the second node to resolve the issue.
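
    A rough outline of the rejoin (this assumes the SLES HA bootstrap scripts, named sleha-init/sleha-join or ha-cluster-init/ha-cluster-join depending on the release; adjust the node names to yours):

    # on the node that should be re-added, stop the cluster stack first
    SRVPHN87:~ # systemctl stop pacemaker

    # then join it to the cluster that is already running on the first node
    SRVPHN87:~ # sleha-join -c SRVPHN85

    If you really want to start from scratch instead, you can re-run the init script on the first node (sleha-init / ha-cluster-init) and then join the second node as above, but be aware that this replaces the existing cluster configuration.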



  • nnikalje
    "Please check the servers are syncing time with NTP properly"


    -Nitiratna Nikalje
  • strahil
    Have you checked that the firewall ports are open?
    I have seen this behaviour when the nodes cannot communicate with each other.


    As you haven't mentioned which version of SLES you are using, I assume SLES 15.
    It uses firewalld by default, and there is no firewall service for the HA ports out of the box.

    On my openSUSE 15.1 test system I am using the following:
    # cat /etc/firewalld/services/high-availability.xml
    <?xml version="1.0" encoding="utf-8"?>
    <service>
      <short>Custom High Availability Service</short>
      <description>This allows you to use the High Availability stack. Ports are opened for corosync, pacemaker_remote, dlm, hawk and corosync-qnetd.</description>
      <port protocol="tcp" port="7630"/>
      <port protocol="tcp" port="3121"/>
      <port protocol="tcp" port="5403"/>
      <port protocol="udp" port="5404"/>
      <port protocol="udp" port="5405"/>
      <port protocol="tcp" port="9929"/>
      <port protocol="udp" port="9929"/>
      <port protocol="tcp" port="21064"/>
    </service>
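
    With that service file in place, it can be enabled on both nodes roughly like this (standard firewalld commands):

    firewall-cmd --reload                                       # let firewalld pick up the new service definition
    firewall-cmd --permanent --add-service=high-availability    # open the listed ports permanently
    firewall-cmd --reload                                       # apply the permanent change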
    