Hi,

I have two nodes that concurrently access an OCFS2 volume on a SAN. One node (let's call it 20) mounts the volume automatically at boot.
The other (let's call it 10) does not.
This is the error message in /var/log/messages on 10:

Code:
Oct 18 14:20:32 sunhb65277 kernel: [   76.198486] (kworker/u:0,5,0):o2net_connect_expired:1724 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
Oct 18 14:20:32 sunhb65277 kernel: [   76.198506] (mount.ocfs2,6559,1):dlm_request_join:1472 ERROR: Error -107 when sending message 510 (key 0x666c6172) to node 2
Oct 18 14:20:32 sunhb65277 kernel: [   76.198513] (mount.ocfs2,6559,1):dlm_try_to_join_domain:1648 ERROR: status = -107
Oct 18 14:20:32 sunhb65277 kernel: [   76.198518] (mount.ocfs2,6559,1):dlm_join_domain:1948 ERROR: status = -107
Oct 18 14:20:32 sunhb65277 kernel: [   76.198628] (mount.ocfs2,6559,1):dlm_register_domain:2214 ERROR: status = -107
Oct 18 14:20:32 sunhb65277 kernel: [   76.198649] (mount.ocfs2,6559,1):o2cb_cluster_connect:313 ERROR: status = -107
Oct 18 14:20:32 sunhb65277 kernel: [   76.198653] (mount.ocfs2,6559,1):ocfs2_dlm_init:2995 ERROR: status = -107
Oct 18 14:20:32 sunhb65277 kernel: [   76.198668] (mount.ocfs2,6559,1):ocfs2_mount_volume:1881 ERROR: status = -107
Oct 18 14:20:32 sunhb65277 kernel: [   76.198713] ocfs2: Unmounting device (252,5) on (node 0)
Oct 18 14:20:32 sunhb65277 kernel: [   76.198720] (mount.ocfs2,6559,1):ocfs2_fill_super:1236 ERROR: status = -107
Oct 18 14:20:35 sunhb65277 sm-notify[6644]: Version 1.2.3 starting

From /var/log/boot.msg:

Code:
<notice -- Oct 18 14:19:57.711549000> multipathd start
Starting multipathd done
<notice -- Oct 18 14:19:58.775058000> 'multipathd start' exits with status 0
<notice -- Oct 18 14:19:58.776666000> ocfs2 start
Starting Oracle Cluster File System (OCFS2) mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/3600c0ff00012824b04af7a5201000000 on /images. Check 'dmesg' for more information on this error.

failed
<notice -- Oct 18 14:20:34.965416000> 'ocfs2 start' exits with status 1

I don't know whether this is a connection problem between the two nodes or a problem on node 10 itself.
The connection between the hosts seems OK. Each host has a bond with a private address used only for communication between the two.
They are connected directly, without a switch or anything like that, and each can ping the other.
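
For reference, the status -107 that runs through the kernel messages can be decoded with a few lines of Python. This is only a minimal sketch, assuming a Linux host with Python 3; the function name `describe_status` is made up for illustration.

```python
import errno
import os


def describe_status(status):
    """Map a negative kernel status code (e.g. -107) to (errno name, message)."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -107 is the status reported by dlm_request_join and its callers.
    print(describe_status(-107))
```

On Linux this resolves to ENOTCONN, the same "Transport endpoint is not connected" that mount.ocfs2 prints in boot.msg. So the failure is reported as a missing network connection to node 2, which a successful ping alone does not rule out.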

This is my /etc/ocfs2/cluster.conf:

Code:
cluster:
        node_count = 2
        name = idg

node:
        ip_port = 7777
        ip_address = 192.168.100.10
        number = 1
        name = sunhb65277
        cluster = idg

node:
        ip_port = 7777
        ip_address = 192.168.100.20
        number = 2
        name = sunhb58820
        cluster = idg

The file is identical on both nodes (checked with md5sum). Port 7777 is open on both.
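
Since ping only exercises ICMP, a TCP-level check of the interconnect port may be more telling. Below is a minimal sketch, assuming Python 3 is available on the nodes; `tcp_port_reachable` is a hypothetical helper name, and the addresses and port are the ones from cluster.conf. As far as I understand, the port only accepts connections while the o2cb cluster is online on the peer, so a failure right after boot may simply mean the other side is not listening yet.

```python
import socket


def tcp_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Example, run on node 10 against node 20 (values from cluster.conf):
#   tcp_port_reachable("192.168.100.20", 7777)
```

Running the same check in the other direction (from 20 against 192.168.100.10) would show whether the port is reachable both ways.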


I also found entries like these in the logs on 10:

Code:
Oct 18 14:19:58 sunhb65277 kernel: [ 41.212716] alua: release port group 0
Oct 18 14:19:58 sunhb65277 kernel: [ 41.212725] sd 8:0:0:0: alua: Detached
Oct 18 14:19:58 sunhb65277 kernel: [ 41.244688] alua: release port group 1
Oct 18 14:19:58 sunhb65277 kernel: [ 41.244696] sd 7:0:0:0: alua: Detached
Oct 18 14:19:58 sunhb65277 multipathd: 3600c0ff00012824b04af7a5201000000: load table [0 2111328000 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:32 1 service-time 0 1 1 8:16 1]
Oct 18 14:19:58 sunhb65277 multipathd: 3600508b1001cbde9bbc03f7d66f637a7: ignoring map
Oct 18 14:19:58 sunhb65277 multipathd: 3600c0ff00012824b04af7a5201000000: event checker started
Oct 18 14:19:58 sunhb65277 kernel: [ 41.687108] device-mapper: table: 252:6: multipath: error getting device
Oct 18 14:19:58 sunhb65277 kernel: [ 41.687114] device-mapper: ioctl: error adding target to table
Oct 18 14:19:58 sunhb65277 kernel: [ 41.687756] device-mapper: table: 252:6: multipath: error getting device
Oct 18 14:19:58 sunhb65277 kernel: [ 41.687761] device-mapper: ioctl: error adding target to table
Oct 18 14:19:58 sunhb65277 multipathd: path checkers start up

Is the SAN not available?

DLM seems to start fine on 10:

Code:
Oct 18 14:19:56 sunhb65277 kernel: [   40.102779] OCFS2 Node Manager 1.5.0
Oct 18 14:19:56 sunhb65277 kernel: [   40.106011] OCFS2 DLM 1.5.0
Oct 18 14:19:56 sunhb65277 kernel: [   40.106872] ocfs2: Registered cluster interface o2cb
Oct 18 14:19:56 sunhb65277 kernel: [   40.153028] OCFS2 DLMFS 1.5.0
Oct 18 14:19:56 sunhb65277 kernel: [   40.153244] OCFS2 User DLM kernel interface loaded
Oct 18 14:19:57 sunhb65277 kernel: [   40.243627] o2hb: Heartbeat mode set to local
Oct 18 14:19:57 sunhb65277 o2hbmonitor: Starting

/dlm is available.

After boot, without any intervention, everything seems fine:

Code:
sunhb65277:~ # multipathd -k
multipathd> list paths
hcil    dev dev_t pri dm_st  chk_st dev_st  next_check
7:0:0:0 sdb 8:16  10  active ready  running XXXXXXXX.. 17/20
8:0:0:0 sdc 8:32  50  active ready  running XXXXXXXX.. 17/20
0:0:0:0 sda 8:0   1   undef  ready  running orphan

multipathd> list maps
name                              sysfs uuid
3600c0ff00012824b04af7a5201000000 dm-5  3600c0ff00012824b04af7a5201000000
multipathd> list topology
create: 3600c0ff00012824b04af7a5201000000 dm-5 HP,P2000 G3 FC
size=1007G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 8:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 7:0:0:0 sdb 8:16 active ready running
multipathd>

sunhb65277:~ # dmsetup ls --tree
vg1-lv_srv (252:4)
 └─ (8:1)
3600c0ff00012824b04af7a5201000000 (252:5)
 ├─ (8:16)
 └─ (8:32)
vg1-lv_root_snapshot (252:3)
 ├─vg1-lv_root_snapshot-cow (252:2)
 │  └─ (8:1)
 └─vg1-lv_root-real (252:0)
    └─ (8:1)
vg1-lv_root (252:1)
 └─vg1-lv_root-real (252:0)
    └─ (8:1)

Mounting manually after boot works fine:

Code:
sunhb65277:~ # mount -a
sunhb65277:~ # mount
/dev/mapper/vg1-lv_root on / type ext3 (rw,strictatime,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,mode=1777)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda3 on /boot type ext3 (rw,strictatime,acl,user_xattr)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
securityfs on /sys/kernel/security type securityfs (rw)
configfs on /sys/kernel/config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
none on /var/lib/ntp/proc type proc (ro,nosuid,nodev)
/dev/dm-5 on /images type ocfs2 (rw,_netdev,heartbeat=local)

Any ideas?

Bernd