Attempting to set up corosync on san1.example.com [192.168.1.1] fails:
Code:
san1:~ # sleha-init 
WARNING: Could not detect IP address for eth0
WARNING: Could not detect network address for eth0
  Enabling sshd service
  /root/.ssh/id_rsa already exists - overwrite? [y/N] y
  Generating ssh key
  Configuring csync2
  Generating csync2 shared key (this may take a while)...ERROR: Can't create csync2 key
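For what it's worth, I intend to double-check eth0's addressing and to try generating the csync2 key by hand on san1. I'm assuming the bootstrap writes its shared key to /etc/csync2/key_hacluster; that path is a guess on my part:
Code:
san1:~ # ip addr show eth0                    # confirm eth0 really carries 192.168.1.1
san1:~ # csync2 -k /etc/csync2/key_hacluster  # generate the shared key manually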
So I set up corosync on san2.example.com [192.168.1.2] instead:
Code:
san2:~ # sleha-init 
WARNING: Could not detect IP address for eth0
WARNING: Could not detect network address for eth0
  Enabling sshd service
  Generating ssh key
  Configuring csync2
  Generating csync2 shared key (this may take a while)...done
  Enabling csync2 service
  Enabling xinetd service
  csync2 checking files
  
Configure Corosync:
  This will configure the cluster messaging layer.  You will need
  to specify a network address over which to communicate (default
  is eth0's network, but you can use the network address of any
  active interface), a multicast address and multicast port.

  Network address to bind to (e.g.: 192.168.1.0) [] 192.168.1.2
  Multicast address (e.g.: 239.x.x.x) [239.129.63.45] 239.1.1.2
  Multicast port [5405] 
  
Configure SBD:
  If you have shared storage, for example a SAN or iSCSI target,
  you can use it to avoid split-brain scenarios by configuring SBD.
  This requires a 1 MB partition, accessible to all nodes in the
  cluster.  The device path must be persistent and consistent
  across all nodes in the cluster, so /dev/disk/by-id/* devices
  are a good choice.  Note that all data on the partition you
  specify here will be destroyed.

  Do you wish to use SBD? [y/N] 
WARNING: Not configuring SBD - STONITH will be disabled.
  Enabling hawk service
    HA Web Konsole is now running, to see cluster status go to:
      https://SERVER:7630/
    Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
  Enabling openais service
  Waiting for cluster........done
  Loading initial configuration
  Done (log saved to /var/log/sleha-bootstrap.log)
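If it helps, I'd expect sleha-init to have written roughly this totem interface section to /etc/corosync/corosync.conf on san2 (a sketch reconstructed from my answers above, not the verbatim file). I also notice that I gave the host address 192.168.1.2 where the prompt suggested a network address like 192.168.1.0; I don't know whether that matters:
Code:
totem {
    ...
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.2
        mcastaddr: 239.1.1.2
        mcastport: 5405
    }
}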
But then san1.example.com [192.168.1.1] complains when trying to join the cluster:
Code:
san1:~ # sleha-join
WARNING: Could not detect IP address for eth0
WARNING: Could not detect network address for eth0
  
Join This Node to Cluster:
  You will be asked for the IP address of an existing node, from which
  configuration will be copied.  If you have not already configured
  passwordless ssh between nodes, you will be prompted for the root
  password of the existing node.

  IP address or hostname of existing node (e.g.: 192.168.1.1) [] 192.168.1.2
  Enabling sshd service
  /root/.ssh/id_rsa already exists - overwrite? [y/N] y
  Retrieving SSH keys from 192.168.1.2
Password: 
  Configuring csync2
  Enabling csync2 service
  Enabling xinetd service
WARNING: csync2 of /etc/csync2/csync2.cfg failed - file may not be in sync on all nodes
WARNING: csync2 run failed - some files may not be sync'd
  Merging known_hosts
WARNING: known_hosts collection may be incomplete
WARNING: known_hosts merge may be incomplete
  Probing for new partitions......ERROR: Failed to probe new partitions
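To find out exactly which files won't sync, I gather csync2 can be run by hand in verbose mode from the node that already holds the configuration (flags per the csync2 man page):
Code:
san2:~ # csync2 -xv   # full sync, verbose; should name the files and peers that fail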
I've verified that the firewall rules on both hosts have been flushed:
Code:
san1:~ # iptables -L -v
Chain INPUT (policy ACCEPT 24M packets, 66G bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 25M packets, 37G bytes)
 pkts bytes target     prot opt in     out     source               destination
Code:
san2:~ # iptables -L -v
Chain INPUT (policy ACCEPT 22M packets, 39G bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 17M packets, 43G bytes)
 pkts bytes target     prot opt in     out     source               destination
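With the firewall out of the picture, my next guess is multicast connectivity between the nodes. Something like this should show whether the totem ring on san2 is healthy, and omping (if it's installed) can test multicast between the two hosts directly:
Code:
san2:~ # corosync-cfgtool -s             # ring status on the existing node
san1:~ # omping 192.168.1.1 192.168.1.2  # run the same command on san2 as well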
There are some entries in /var/log/messages on san2.example.com that seem related to this:
Code:
Dec 17 19:04:51 san2 cib: [3615]: info: cib_stats: Processed 83 operations (1445.00us average, 0% utilization) in the last 10min
Dec 17 19:10:18 san2 crmd: [3620]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
Dec 17 19:10:18 san2 crmd: [3620]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Dec 17 19:10:18 san2 crmd: [3620]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Dec 17 19:10:18 san2 pengine: [3619]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 17 19:10:18 san2 crmd: [3620]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 17 19:10:18 san2 crmd: [3620]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1355800218-15) derived from /var/lib/pengine/pe-input-2.bz2
Dec 17 19:10:18 san2 crmd: [3620]: notice: run_graph: ==== Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-2.bz2): Complete
Dec 17 19:10:18 san2 crmd: [3620]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 17 19:10:18 san2 pengine: [3619]: notice: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-2.bz2
Dec 17 19:14:51 san2 cib: [3615]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min
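Nothing in that log looks fatal to me; it just shows san2 idling as a single-node cluster. For completeness, a one-shot status check on san2 should confirm that san1 never appears as a member:
Code:
san2:~ # crm_mon -1   # one-shot cluster status; I expect only san2 to be listed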
What's going on here? Why can't san1.example.com set up/join the cluster?

Eric Pretorious
Truckee, CA