Kernel Update Breaks OCFS2



ucba
15-Jul-2013, 05:19
Running SLES 11 SP2 with the latest patches (as of 15/7/2013), SLES HAE SP2, OCFS2 version 1.6_3.0.13, Xen.
The latest kernel update breaks OCFS2 on our 5-node cluster.

All other SLES nodes are running kernel 3.0.58.
I updated one server to kernel 3.0.80, and OCFS2 will no longer mount on it:


ocfs2 Internal logic failure while trying to join the group
Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524201] (mount.ocfs2,21615,11):o2hb_map_slot_data:1638 ERROR: status = -12
Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524208] (mount.ocfs2,21615,11):o2hb_region_dev_write:1768 ERROR: status = -12
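
Editor's note: kernel functions conventionally return negative errno values, so the "status = -12" in the two o2hb lines above can be decoded mechanically. The following is a minimal sketch in Python, assuming it runs on a Linux host; the regular expression and the helper name decode_status are illustrative and not part of any OCFS2 tool. It only names the error code and says nothing about why the allocation failed on this kernel.

```python
import errno
import os
import re

# Matches the "function:line ERROR: status = -N" tail of the o2hb log lines.
# The pattern is an assumption based only on the two lines quoted in this thread.
STATUS_RE = re.compile(r"(\w+):(\d+) ERROR: status = (-?\d+)")


def decode_status(line):
    """Return (function, errno name, message) for one log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    func = match.group(1)
    code = abs(int(match.group(3)))  # the kernel reports errno values negated
    return func, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


log = [
    "Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524201] "
    "(mount.ocfs2,21615,11):o2hb_map_slot_data:1638 ERROR: status = -12",
    "Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524208] "
    "(mount.ocfs2,21615,11):o2hb_region_dev_write:1768 ERROR: status = -12",
]

for entry in log:
    print(decode_status(entry))
```

On Linux, errno 12 is ENOMEM, so both lines report a failed memory allocation while the heartbeat region was being set up.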

Rolling the kernel back to 3.0.58 lets the system start up again with no problems, and the OCFS2 mounts come up correctly.
Version 3.0.74 was not tested because we needed to get a live system back online. No other patches had to be downgraded on the 5th server, just kernel-xen and kernel-xen-base.

This kernel version has been marked protected on our SLES cluster servers until a resolution is found.
Thanks
Eric

jmozdzen
15-Jul-2013, 09:37
Hi Eric,

I just checked the patch announcement, which doesn't mention "rolling updates". I for one would therefore expect that rolling updates are supported, but have asked my contacts at SUSE if I'm reading that correctly. I'll let you know once I've received a response.

Regards,
Jens

jmozdzen
15-Jul-2013, 11:11
Hi Eric,

got my coffee, which made me notice:

- you only mention "3.0.80"
- there's been a 3.0.80-0.5, but also a 3.0.80-0.7.1. The latter was a patch fixing problems with the 0.5 version

Are you in a position to verify 3.0.80-0.7.1? There's no explicit mention of OCFS2 fixes (http://lists.opensuse.org/opensuse-security-announce/2013-07/msg00009.html), but it seems some important internal fixes were included that may very well be worth testing.

Regards,
Jens
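
Editor's note: since the distinction here is between release suffixes of the same upstream version, it can help to order the version strings explicitly. This is a rough sketch, assuming every component is purely numeric; it is not RPM's real comparison algorithm (rpmvercmp), and vr_key is a hypothetical helper name.

```python
def _to_ints(part):
    """Convert '0.7.1' to (0, 7, 1); an empty string becomes ()."""
    return tuple(int(piece) for piece in part.split(".")) if part else ()


def vr_key(version_release):
    """Split 'version-release' and return a sortable pair of integer tuples.

    Simplification: only dotted numeric components are handled.
    """
    version, _, release = version_release.partition("-")
    return _to_ints(version), _to_ints(release)


# Version strings exactly as they are written in the posts above.
kernels = ["3.0.80-0.7.1", "3.0.58", "3.0.80-0.5"]
ordered = sorted(kernels, key=vr_key)
print(ordered)
```

With this ordering, 3.0.80-0.7.1 sorts after 3.0.80-0.5, matching the statement that the former is the follow-up patch.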

ucba
16-Jul-2013, 02:21
Hi Jens,

The full version of the kernel that breaks OCFS2 is
3.0.80-0.5.1-x86_64 from SLES11-SP2

3.0.80-0.7.1 is not on our patch server yet, so no, I'm not in a position to test it. It's not part of SP3, is it?
Regards
Eric.

jmozdzen
16-Jul-2013, 09:22
Hi Eric,

I just double-checked to confirm yesterday's status: on my machines I see 3.0.80-0.7.1. We're running against our own SMT server, which receives the packages from Novell:


S | Repository         | Name                | Current Version | Available Version   | Arch
--+--------------------+---------------------+-----------------+---------------------+-------
v | SLES11-SP2-Updates | kernel-default      | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
v | SLES11-SP2-Updates | kernel-default-base | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
v | SLES11-SP2-Updates | kernel-firmware     | 20110923-0.17.1 | 20110923-0.19.21.10 | noarch
v | SLES11-SP2-Updates | kernel-xen          | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
v | SLES11-SP2-Updates | kernel-xen-base     | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64

As you can see from the "zypper lu" output above, it's from the SLES11-SP2 repository, so no, I didn't confuse it with SP3 - enough coffee this time ;)
Looking at the repository directory on our SMT server, I see that this update is two weeks old:


2190846669 20540 -rw-r--r-- 1 smt www 21031921 Jun 27 17:10 ./sle-11-x86_64/rpm/x86_64/kernel-xen-3.0.80-0.7.1.x86_64.rpm

So you might want to check your update sources.
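
Editor's note: to compare what each node is being offered, the "zypper lu" rows can be parsed into records. This is a small sketch, assuming the six pipe-separated columns are status, repository, name, current version, available version and architecture; the Update tuple and parse_zypper_lu are hypothetical names, and the sample rows are copied from this post.

```python
from collections import namedtuple

# Field names are assumptions chosen to describe the six columns.
Update = namedtuple("Update", "status repository name current available arch")


def parse_zypper_lu(text):
    """Parse pipe-separated 'zypper lu' rows into Update records."""
    updates = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 6 or fields[0] != "v":
            continue  # skip headers, separators and anything unexpected
        updates.append(Update(*fields))
    return updates


sample = """\
v | SLES11-SP2-Updates | kernel-default | 3.0.58-0.6.2.1 | 3.0.80-0.7.1 | x86_64
v | SLES11-SP2-Updates | kernel-default-base | 3.0.58-0.6.2.1 | 3.0.80-0.7.1 | x86_64
v | SLES11-SP2-Updates | kernel-firmware | 20110923-0.17.1 | 20110923-0.19.21.10 | noarch
v | SLES11-SP2-Updates | kernel-xen | 3.0.58-0.6.2.1 | 3.0.80-0.7.1 | x86_64
v | SLES11-SP2-Updates | kernel-xen-base | 3.0.58-0.6.2.1 | 3.0.80-0.7.1 | x86_64
"""

updates = parse_zypper_lu(sample)
xen = [u for u in updates if u.name.startswith("kernel-xen")]
for u in xen:
    print(u.name, u.current, "->", u.available)
```

If a node's output shows 3.0.80-0.5.1 rather than 3.0.80-0.7.1 as the available version for kernel-xen, its update source is probably stale.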

Regards,
Jens

ucba
18-Jul-2013, 05:43
Okay, my SMT server says it's working, but I suspect nobody is home. I've killed the SMT service and started it again, and a whole bunch of updates are starting to show up. It will be a while before I can get back to this, as these are production machines. As soon as the opportunity presents itself, I'll try the later kernel.
Thanks
Eric.

jmozdzen
18-Jul-2013, 10:05
Hi Eric,

if the problem persists after applying the latest updates, please let me know (I'm monitoring this thread) so I can try to get some feedback from SUSE.

Regards,
Jens

ucba
02-Aug-2013, 01:45
Hi Jens,

After resolving my SMT issue (for some reason a file was missing write permission), I now have the latest kernel patches. However, these are production machines, and now that other hardware issues have been resolved, they are very stable. It's unlikely that I'm going to get the opportunity to test the later kernel. We are also preparing to move to SP3, which will mean that our whole virtual infrastructure will have to be updated at the same time. This is likely to be the next time the machines are taken down.

I can confirm from our SP3 test environment that there are no problems with OCFS2 under SLES SP3 and HAE SP3.
If an opportunity arises to test the later SP2 kernel, I'll add another post. Otherwise, I'll have to leave this issue as resolved.
Thanks
Eric.

jmozdzen
02-Aug-2013, 11:15
Hi Eric,

thanks for reporting back on the status. I'll let my back-end support contacts off the hook then ;)

> If an opportunity arises to test the later SP2 kernel, I'll add an additional post. Otherwise, I'll have to leave this issue as resolved.

If you find any problem, just let us know, we'll do our best to get things rolling again.

Regards,
Jens