bond is deleted, but not created



tukoyi
27-Dec-2013, 11:57
Hi, I am using SuSE SP1 with kernel 2.6.16.46-0.12.

I have run into a problem. I created bond0 from eth0 and eth1, and it works.

But when I execute "service network restart", the system's networking breaks down, with the following output:

# ifdown bond0
bond0
(no further output)

/var/log/messages:

Dec 27 17:56:12 MV-MM1 ifdown: bond0
Dec 27 17:56:12 MV-MM1 syslog-ng[6525]: Changing permissions on special file /dev/xconsole
Dec 27 17:56:12 MV-MM1 syslog-ng[6525]: Changing permissions on special file /dev/tty10
Dec 27 17:56:14 MV-MM1 kernel: bonding: bond0: released all slaves
Dec 27 17:56:14 MV-MM1 syslog-ng[6525]: io.c: do_write: write() failed (errno 22), Invalid argument
Dec 27 17:56:14 MV-MM1 kernel: klogd 1.4.1, ---------- state change ----------
Dec 27 17:56:14 MV-MM1 syslog-ng[6525]: Connection broken to AF_INET(10.25.125.10:514), reopening in 60 seconds
Dec 27 17:56:14 MV-MM1 kernel: bonding: bond0 is being deleted...
Dec 27 17:56:14 MV-MM1 kernel: LLT INFO V-14-1-10205 link 1 (bond0) node 1 in trouble
Dec 27 17:56:20 MV-MM1 kernel: LLT INFO V-14-1-10032 link 1 (bond0) node 1 inactive 8 sec (32767) <------VCS message, it is not important
Dec 27 17:56:21 MV-MM1 kernel: LLT INFO V-14-1-10032 link 1 (bond0) node 1 inactive 9 sec (32767)
Dec 27 17:56:22 MV-MM1 kernel: LLT INFO V-14-1-10032 link 1 (bond0) node 1 inactive 10 sec (32767)
Dec 27 17:56:23 MV-MM1 kernel: LLT INFO V-14-1-10032 link 1 (bond0) node 1 inactive 11 sec (32767)
Dec 27 17:56:24 MV-MM1 kernel: LLT INFO V-14-1-10032 link 1 (bond0) node 1 inactive 12 sec (32767)
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1
Dec 27 17:56:24 MV-MM1 kernel: unregister_netdevice: waiting for bond0 to become free. Usage count = 1

I searched and found some workarounds:

1) disable IPv6

2) upgrade the kernel

Unfortunately, the issue persists.

If I remove the bond and just use eth0 or eth1, "service network restart" works fine. And I am sure ifcfg-bond0 is configured correctly.

Here is some strace output from "service network restart" with the bond configured:

...........

8042 close(4) = 0
8042 exit_group(0) = ?
7971 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8042
7971 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
7971 --- SIGCHLD (Child exited) @ 0 (0) ---
7971 wait4(-1, 0x7fffaebe7674, WNOHANG, NULL) = -1 ECHILD (No child processes)
7971 rt_sigreturn(0xffffffffffffffff) = 0
7971 rt_sigaction(SIGINT, {0x43ffbb, [], SA_RESTORER, 0x2b46fc4b7c10}, {0x42e931, [], SA_RESTORER, 0x2b46fc4b7c10}, 8) = 0
7971 open("/sys/class/net/bonding_masters", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
7971 fcntl(1, F_GETFD) = 0
7971 fcntl(1, F_DUPFD, 10) = 10
7971 fcntl(1, F_GETFD) = 0
7971 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
7971 close(1) = 0
7971 dup2(3, 1) = 1
7971 close(3) = 0
7971 write(1, "-bond0\n", 7 <---------------- breakpoint
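
For readers following the trace: the final calls (open of /sys/class/net/bonding_masters, dup2 onto stdout, then a write of "-bond0\n") are what a shell does for an output redirection, so the step that blocks appears to be the request to remove bond0 through the bonding sysfs interface. A minimal sketch of that step, assuming a hypothetical helper name (remove_bond) and an optional path argument that is not part of the original scripts:

```shell
#!/bin/sh
# Hypothetical helper (not from the SUSE network scripts): ask the bonding
# driver to delete a bond by writing "-<name>" to bonding_masters.
# $1 = bond name
# $2 = path to the masters file (assumption: defaults to the real sysfs file;
#      pass a scratch file to try the function without touching the system)
remove_bond() {
    bond="$1"
    masters="${2:-/sys/class/net/bonding_masters}"
    # Equivalent to the traced sequence: redirect stdout to the file,
    # then write "-bond0\n".
    echo "-$bond" > "$masters"
}
```

On the affected host, `remove_bond bond0` would be expected to block in the same write, since the kernel log shows unregister_netdevice still waiting for bond0's usage count to drop.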

Can someone help me?

ab
27-Dec-2013, 13:01
On 12/27/2013 04:04 AM, tukoyi wrote:
>
> Hi, I am using SuSE SP1 with kernel 2.6.16.46-0.12.

I assume you specifically mean SUSE Linux Enterprise Server (SLES) 10 SP1,
since SUSE is a company and the 2.6.16 kernel sounds like SLES 10. That's
really old, and there are three SPs after SP1 that may help you. Is there
a reason you are not at least using SP4, if not SLES 11 which is already
up to SP3 as well?

--
Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below...

tukoyi
29-Dec-2013, 03:39
Hi, ab

Thank you for your response.

But I have updated the kernel to kernel-default-2.6.16.60-0.54.5.x86_64, the newest kernel in SuSE SP4, and the problem persists.

And in my test environment, it works fine with the SP1 kernel, kernel-default-2.6.16.46-0.12.x86_64.

Any other suggestions?

ab
29-Dec-2013, 04:04
So it works in one environment and does not in (presumably, since you did
not state otherwise and implied as much) an identical production
environment. What is different about the two environments in terms of
hardware, software, even if involving other things like the switches
involved? If it works here, but not there, find out what is different.
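
One way to start that comparison is to capture the same bonding-related facts on each machine and diff the results. The sketch below is only an illustration: the function name snapshot_bonding and the choice of files are assumptions, and it covers the kernel version, the SUSE ifcfg file for the bond, and the driver's status under /proc/net/bonding, not switch or hardware differences.

```shell
#!/bin/sh
# Hypothetical helper: print a snapshot of bonding-related facts for one host.
# $1 = bond name (assumption: defaults to bond0)
snapshot_bonding() {
    bond="${1:-bond0}"
    echo "kernel: $(uname -r)"
    for f in "/etc/sysconfig/network/ifcfg-$bond" "/proc/net/bonding/$bond"; do
        echo "== $f"
        if [ -r "$f" ]; then
            cat "$f"
        else
            # Keep a marker so a missing file also shows up in the diff.
            echo "(not readable)"
        fi
    done
}
```

For example, run `snapshot_bonding bond0 > test.txt` on the test host and `snapshot_bonding bond0 > prod.txt` on the production host, copy both files to one machine, and compare them with `diff -u test.txt prod.txt`.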

I'd also go with the latest code which is SLES 11 SP3, but since you have
things working on another setup it may be easier for you to resolve the
differences instead despite SLE 10's age.

--
Good luck.
