Hello,

I have several SLES 11 SP1 servers that require multiple bonded
interfaces. In this example, eth0 and eth2 form bond0, and eth1 and eth3
form bond1. The hardware is HP DL380 G7, using the onboard ports. There
are two servers: one uses balance-tlb on both bonds, the other uses
balance-tlb on bond0 and active-backup on bond1.

FYI:

# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.20.210.0     0.0.0.0         255.255.255.0   U         0 0          0 bond0
10.20.200.0     0.0.0.0         255.255.255.0   U         0 0          0 bond1
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         10.20.210.1     0.0.0.0         UG        0 0          0 bond0

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: transmit load balancing
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 50
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 44:1e:a1:02:18:f8

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 44:1e:a1:02:18:fc


The problem I'm seeing is that ping over bond1, when it uses balance-tlb,
drops packets, and it's reproducible:

249 packets transmitted, 239 received, 4% packet loss, time 248005ms
rtt min/avg/max/mdev = 0.163/0.567/44.163/2.836 ms

When it occurs, it always drops 10 packets very near the beginning of
the test. It is reproducible IF you wait a while before you test again.
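
In case it helps anyone reproduce this, here is a rough sketch of how the
dropped sequence numbers could be listed from the ping output. The function
name missing_seqs and the peer address 10.20.200.5 are placeholders, not
anything from my actual setup. It assumes Linux iputils-style reply lines
("... bytes from ...: icmp_seq=N ..."), that numbering starts at 1 unless
another start is passed as the first argument, and that replies arrive in
order. Packets lost after the last reply received are not reported.

```shell
#!/bin/sh
# Sketch only: list icmp_seq numbers missing from ping output read on stdin.
# $1 is the first expected sequence number (assumed to be 1 if omitted).
missing_seqs() {
    first=${1:-1}
    # Keep only successful reply lines and extract their sequence number.
    sed -n '/bytes from/s/.*icmp_seq=\([0-9][0-9]*\).*/\1/p' |
    # Print every number skipped between consecutive replies.
    awk -v prev="$((first - 1))" '{
        for (s = prev + 1; s < $1; s++) print s
        if ($1 > prev) prev = $1
    }'
}

# Usage (10.20.200.5 is a placeholder for a peer on the bond1 subnet):
#   ping -c 250 -I bond1 10.20.200.5 | missing_seqs
```

If the loss really is always the same block of packets near the start, the
output should be one consecutive run of low numbers.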

Oddly, I don't see any similar behavior on bond0, which uses
balance-tlb on both servers, nor do I have any issue if I change bond1
to use active-backup bonding.

Any thoughts are greatly appreciated.


--
cjhsa