SLES 11 SP1 and OCFS2 problems



Andre Massena
24-Nov-2011, 09:56
Hello all,

we are running SLES 11 (SP1) for zSeries on a z9 running under z/VM
using a mixture of CKD disks (OS only) and SAN (FC - zfcp) disk for the
OCFS2 database(s).

Our OCFS2 environment is as follows:


node1:~ # rpm -qa|grep ocfs2
ocfs2-kmp-xen-1.4_2.6.32.12_0.6-4.10.14
ocfs2-tools-1.4.3-0.11.20
ocfs2-tools-debuginfo-1.4.3-0.11.20
ocfs2-kmp-pae-1.4_2.6.32.12_0.6-4.10.14
ocfs2-tools-debugsource-1.4.3-0.11.20
ocfs2console-1.4.3-0.11.20
ocfs2-tools-o2cb-1.4.3-0.11.20
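
One thing that may be worth ruling out: the list shows only the xen and pae
flavors of ocfs2-kmp, while the trace further down reports kernel
2.6.32.12-0.7-default and a module build path containing ocfs2-1.4/default.
The list may simply be incomplete or taken from another machine. A minimal
sketch for checking the flavor on a node follows; kmp_matches is a made-up
helper name, and it compares only the flavor, not the kernel version.

```shell
#!/bin/sh
# Sketch: is an ocfs2 KMP for the running kernel flavor in the package list?
# Usage: <package list on stdin> | kmp_matches KERNEL_RELEASE
# kmp_matches is a hypothetical helper, not part of ocfs2-tools.
kmp_matches() {
    flavor=${1##*-}                    # "2.6.32.12-0.7-default" -> "default"
    grep -q "^ocfs2-kmp-${flavor}-"    # succeeds only if such a package is listed
}

# On a live node one would run: rpm -qa | kmp_matches "$(uname -r)"
# Here the values quoted in this post are fed in instead:
if printf '%s\n' \
    ocfs2-kmp-xen-1.4_2.6.32.12_0.6-4.10.14 \
    ocfs2-kmp-pae-1.4_2.6.32.12_0.6-4.10.14 |
    kmp_matches 2.6.32.12-0.7-default
then
    echo "matching KMP listed"
else
    echo "no ocfs2-kmp-default in the list"
fi
```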

The OCFS2 filesystem was created using "mkfs.ocfs2 -N 8 <shared disk
device name>" without any other options, i.e. with the defaults.

The mount options we are using are all defaults:

type ocfs2 (rw,_netdev,cluster_stack=pcmk)

We are trying to copy (cp) a 100GB file from a local ext3 file system
to a shared file system created on OCFS2.

Immediately after starting the copy, we see this:


kernel BUG at /usr/src/packages/BUILD/ocfs2-1.4/default/ocfs2/heartbeat.c:68!
illegal operation: 0001 [#1] SMP
Modules linked in: iptable_filter ip_tables x_tables sr_mod cdrom af_packet ocf
Supported: Yes
CPU: 3 Not tainted 2.6.32.12-0.7-default #1
Process ocfs2_controld. (pid: 5783, task: 000000007424e438, ksp: 00000000760e38
Krnl PSW : 0704000180000000 000003c005899f0c (ocfs2_do_node_down+0xc0/0xc4 [ocf
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000038 000003c005899e4c 0000000000000001 000000007e448000
0000000000000000 000000000000001f 00000000760e3e1e 000000000000002f
000000007327d240 0000000000000001 0000000000000001 000000007e448000
000003c005854000 000003c0058f81f8 00000000760e3d70 00000000760e3d08
Krnl Code: 000003c005899efe: c0e5fffdd15f brasl %r14,3c0058541bc
000003c005899f04: a7f4ffce brc 15,3c005899ea0
000003c005899f08: a7f40001 brc 15,3c005899f0a
>000003c005899f0c: a7f40000 brc 15,3c005899f0c
000003c005899f10: a7180000 lhi %r1,0
000003c005899f14: 501020a8 st %r1,168(%r2)
000003c005899f18: a7180100 lhi %r1,256
000003c005899f1c: 40102850 sth %r1,2128(%r2)
Call Trace:
(<00000000760e3d00> 0x760e3d00)
<000003c004fa5ed8> ocfs2_control_write+0x3b0/0x4dc [ocfs2_stack_user]
<000000000020c160> vfs_write+0xac/0x1a4
<000000000020c354> SyS_write+0x58/0xb4
<0000000000117e7e> sysc_noemu+0x10/0x16
<000002000024ea50> 0x2000024ea50
Last Breaking-Event-Address:
<000003c005899f08> ocfs2_do_node_down+0xbc/0xc4 [ocfs2]
---[ end trace b0ce64c3e38cc8f4 ]---
Unable to handle kernel pointer dereference at virtual kernel address 000000060
Oops: 003b [#2] SMP
Modules linked in: iptable_filter ip_tables x_tables sr_mod cdrom af_packet ocf
Supported: Yes
CPU: 3 Tainted: G D 2.6.32.12-0.7-default #1
Process ocfs2_controld. (pid: 5783, task: 000000007424e438, ksp: 00000000760e38
Krnl PSW : 0704200180000000 000000000030aba4 (kref_put+0x3c/0x88)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 000003e000000003 0000000000000001 0000000600000008 00000000002d71f0
0000000600000008 0000000000000003 0000000000000000 000000000000002f
000000007e9e7400 0000000074974d40 00000000740bc108 000000007cf2e300
000000007cca0e20 000000000047a428 00000000760e3870 00000000760e3850
Krnl Code: 000000000030ab96: e330d0000020 cg %r3,0(%r13)
000000000030ab9c: a7840022 brc 8,30abe0
000000000030aba0: a7180001 lhi %r1,1
>000000000030aba4: 58504000 l %r5,0(%r4)
000000000030aba8: 1825 lr %r2,%r5
000000000030abaa: 1b21 sr %r2,%r1
000000000030abac: ba524000 cs %r5,%r2,0(%r4)
000000000030abb0: a744fffc brc 4,30aba8
Call Trace:
(<00000000760e3898> 0x760e3898)
<00000000002da89c> apparmor_file_free_security+0x40/0x58
<000000000020cf6a> __fput+0x112/0x240
<00000000001e2690> remove_vma+0x60/0xa0
<00000000001e286c> exit_mmap+0x19c/0x2d8
<000000000013f2e2> mmput+0x62/0x150
<0000000000144e4a> exit_mm+0x196/0x1bc
<00000000001470a8> do_exit+0x12c/0x364
<0000000000105e2e> die+0x17a/0x17c
<000000000010768a> illegal_op+0x1f2/0x1f8
<0000000000117e84> sysc_return+0x0/0x8
<000003c005899f0c> ocfs2_do_node_down+0xc0/0xc4 [ocfs2]
(<00000000760e3d00> 0x760e3d00)
<000003c004fa5ed8> ocfs2_control_write+0x3b0/0x4dc [ocfs2_stack_user]
<000000000020c160> vfs_write+0xac/0x1a4
<000000000020c354> SyS_write+0x58/0xb4
<0000000000117e7e> sysc_noemu+0x10/0x16
<000002000024ea50> 0x2000024ea50
Last Breaking-Event-Address:
<00000000002da896> apparmor_file_free_security+0x3a/0x58
---[ end trace b0ce64c3e38cc8f5 ]---
Fixing recursive fault but reboot is needed!
Nov 10 10:38:25 LXCPOA ocfs2_controld[5783]: this node is not in the ocfs2_cont
Nov 10 10:38:25 LXCPOA kernel: illegal operation: 0001 [#1] SMP
Nov 10 10:38:25 LXCPOA kernel: Modules linked in: iptable_filter ip_tables x_ta
Nov 10 10:38:25 LXCPOA kernel: Supported: Yes
Nov 10 10:38:25 LXCPOA kernel: CPU: 3 Not tainted 2.6.32.12-0.7-default #1
Nov 10 10:38:25 LXCPOA kernel: Process ocfs2_controld. (pid: 5783, task: 000000
Nov 10 10:38:25 LXCPOA kernel: Krnl PSW : 0704000180000000 000003c005899f0c (oc
Nov 10 10:38:25 LXCPOA kernel: R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 A
Nov 10 10:38:25 LXCPOA kernel: Krnl GPRS: 0000000000000038 000003c005899e4c 000
Nov 10 10:38:25 LXCPOA kernel: 0000000000000000 000000000000001f 000
Nov 10 10:38:25 LXCPOA kernel: 000000007327d240 0000000000000001 000
Nov 10 10:38:25 LXCPOA kernel: 000003c005854000 000003c0058f81f8 000
Nov 10 10:38:25 LXCPOA kernel: Krnl Code: 000003c005899efe: c0e5fffdd15f br
Nov 10 10:38:25 LXCPOA kernel: 000003c005899f04: a7f4ffce b
Nov 10 10:38:25 LXCPOA kernel: 000003c005899f08: a7f40001
Nov 10 10:38:25 LXCPOA kernel: >000003c005899f0c: a7f40000
Nov 10 10:38:25 LXCPOA kernel: 000003c005899f10: a7180000
Nov 10 10:38:25 LXCPOA kernel: 000003c005899f14: 501020a8
Nov 10 10:38:25 LXCPOA kernel: 000003c005899f18: a7180100
Nov 10 10:38:25 LXCPOA kernel: 000003c005899f1c: 40102850
Nov 10 10:38:25 LXCPOA kernel: Call Trace:
Nov 10 10:38:25 LXCPOA kernel: (<00000000760e3d00> 0x760e3d00)
Nov 10 10:38:25 LXCPOA kernel: <000003c004fa5ed8> ocfs2_control_write+0x3b0/
Nov 10 10:38:25 LXCPOA kernel: <000000000020c160> vfs_write+0xac/0x1a4
Nov 10 10:38:25 LXCPOA kernel: <000000000020c354> SyS_write+0x58/0xb4
Nov 10 10:38:25 LXCPOA kernel: <0000000000117e7e> sysc_noemu+0x10/0x16
Nov 10 10:38:25 LXCPOA kernel: <000002000024ea50> 0x2000024ea50
Nov 10 10:38:25 LXCPOA kernel: Last Breaking-Event-Address:
Nov 10 10:38:25 LXCPOA kernel: <000003c005899f08> ocfs2_do_node_down+0xbc/0x
Nov 10 10:38:26 LXCPOA kernel:
Nov 10 10:38:26 LXCPOA kernel: ---[ end trace b0ce64c3e38cc8f4 ]---
Nov 10 10:38:26 LXCPOA kernel: Oops: 003b [#2] SMP
Nov 10 10:38:26 LXCPOA kernel: Modules linked in: iptable_filter ip_tables x_ta
Nov 10 10:38:26 LXCPOA kernel: Supported: Yes
Nov 10 10:38:26 LXCPOA kernel: CPU: 3 Tainted: G D 2.6.32.12-0.7-defa
Nov 10 10:38:26 LXCPOA kernel: Process ocfs2_controld. (pid: 5783, task: 000000
Nov 10 10:38:26 LXCPOA kernel: Krnl PSW : 0704200180000000 000000000030aba4 (kr
Nov 10 10:38:26 LXCPOA kernel: R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 A
Nov 10 10:38:26 LXCPOA kernel: Krnl GPRS: 000003e000000003 0000000000000001 000
Nov 10 10:38:26 LXCPOA kernel: 0000000600000008 0000000000000003 000
Nov 10 10:38:26 LXCPOA kernel: 000000007e9e7400 0000000074974d40 000
Nov 10 10:38:26 LXCPOA kernel: 000000007cca0e20 000000000047a428 000
Nov 10 10:38:26 LXCPOA kernel: Krnl Code: 000000000030ab96: e330d0000020 cg
Nov 10 10:38:26 LXCPOA kernel: 000000000030ab9c: a7840022
Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined retry
Nov 10 10:38:26 LXCPOA kernel: 000000000030aba0: a7180001
Nov 10 10:38:26 LXCPOA kernel: >000000000030aba4: 58504000
Nov 10 10:38:26 LXCPOA kernel: 000000000030aba8: 1825 lr
Nov 10 10:38:26 LXCPOA kernel: 000000000030abaa: 1b21
Nov 10 10:38:26 LXCPOA kernel: 000000000030abac: ba524000
Nov 10 10:38:26 LXCPOA kernel: 000000000030abb0: a744fffc
Nov 10 10:38:26 LXCPOA kernel: Call Trace:
Nov 10 10:38:26 LXCPOA kernel: (<00000000760e3898> 0x760e3898)
Nov 10 10:38:26 LXCPOA kernel: <00000000002da89c> apparmor_file_free_securit
Nov 10 10:38:26 LXCPOA kernel: <000000000020cf6a> __fput+0x112/0x240
Nov 10 10:38:26 LXCPOA kernel: <00000000001e2690> remove_vma+0x60/0xa0
Nov 10 10:38:26 LXCPOA kernel: <00000000001e286c> exit_mmap+0x19c/0x2d8
Nov 10 10:38:26 LXCPOA kernel: <000000000013f2e2> mmput+0x62/0x150
Nov 10 10:38:26 LXCPOA kernel: <0000000000144e4a> exit_mm+0x196/0x1bc
Nov 10 10:38:26 LXCPOA kernel: <00000000001470a8> do_exit+0x12c/0x364
Nov 10 10:38:26 LXCPOA kernel: <0000000000105e2e> die+0x17a/0x17c
Nov 10 10:38:26 LXCPOA kernel: <000000000010768a> illegal_op+0x1f2/0x1f8
Nov 10 10:38:26 LXCPOA kernel: <0000000000117e84> sysc_return+0x0/0x8
Nov 10 10:38:26 LXCPOA kernel: <000003c005899f0c> ocfs2_do_node_down+0xc0/0x
Nov 10 10:38:26 LXCPOA kernel: (<00000000760e3d00> 0x760e3d00)
Nov 10 10:38:26 LXCPOA kernel: <000003c004fa5ed8> ocfs2_control_write+0x3b0/
Nov 10 10:38:26 LXCPOA kernel: <000000000020c160> vfs_write+0xac/0x1a4
Nov 10 10:38:26 LXCPOA kernel: <000000000020c354> SyS_write+0x58/0xb4
Nov 10 10:38:26 LXCPOA kernel: <0000000000117e7e> sysc_noemu+0x10/0x16
Nov 10 10:38:26 LXCPOA kernel: <000002000024ea50> 0x2000024ea50
Nov 10 10:38:26 LXCPOA kernel: Last Breaking-Event-Address:
Nov 10 10:38:26 LXCPOA kernel: <00000000002da896> apparmor_file_free_securit
Nov 10 10:38:26 LXCPOA kernel:
Nov 10 10:38:26 LXCPOA kernel: ---[ end trace b0ce64c3e38cc8f5 ]---
Nov 10 10:38:26 LXCPOA stonith: external/sbd device OK.
Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined retry
Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message: cpg_mcast_joined error
Nov 10 10:38:42 LXCPOA stonith: external/sbd device OK.
Nov 10 10:38:58 LXCPOA stonith: external/sbd device OK.
Nov 10 10:39:14 LXCPOA stonith: external/sbd device OK.
Nov 10 10:39:31 LXCPOA stonith: external/sbd device OK.
Nov 10 10:39:47 LXCPOA stonith: external/sbd device OK.
Nov 10 10:40:03 LXCPOA stonith: external/sbd device OK.
Nov 10 10:40:20 LXCPOA stonith: external/sbd device OK.
Nov 10 10:40:36 LXCPOA stonith: external/sbd device OK.
Nov 10 10:40:52 LXCPOA stonith: external/sbd device OK.
Nov 10 10:41:08 LXCPOA stonith: external/sbd device OK.
Nov 10 10:41:24 LXCPOA stonith: external/sbd device OK.
Nov 10 10:41:41 LXCPOA stonith: external/sbd device OK.
Nov 10 10:41:57 LXCPOA stonith: external/sbd device OK.
Nov 10 10:42:13 LXCPOA stonith: external/sbd device OK.
Nov 10 10:42:29 LXCPOA stonith: external/sbd device OK.
Nov 10 10:42:46 LXCPOA stonith: external/sbd device OK.
Nov 10 10:43:02 LXCPOA stonith: external/sbd device OK.
Nov 10 10:43:18 LXCPOA stonith: external/sbd device OK.
Nov 10 10:43:34 LXCPOA stonith: external/sbd device OK.
Nov 10 10:43:50 LXCPOA stonith: external/sbd device OK.
Nov 10 10:44:06 LXCPOA stonith: external/sbd device OK.
Nov 10 10:44:22 LXCPOA stonith: external/sbd device OK.
Nov 10 10:44:39 LXCPOA stonith: external/sbd device OK.
Nov 10 10:44:55 LXCPOA stonith: external/sbd device OK.
Nov 10 10:45:11 LXCPOA stonith: external/sbd device OK.
Nov 10 10:45:27 LXCPOA stonith: external/sbd device OK.
Nov 10 10:45:43 LXCPOA stonith: external/sbd device OK.
Nov 10 10:46:00 LXCPOA stonith: external/sbd device OK.
Nov 10 10:46:16 LXCPOA stonith: external/sbd device OK.
Nov 10 10:46:32 LXCPOA stonith: external/sbd device OK.
Nov 10 10:46:48 LXCPOA stonith: external/sbd device OK.
Nov 10 10:47:04 LXCPOA stonith: external/sbd device OK.
Nov 10 10:47:20 LXCPOA stonith: external/sbd device OK.
Nov 10 10:47:37 LXCPOA stonith: external/sbd device OK.
Nov 10 10:47:53 LXCPOA stonith: external/sbd device OK.
Nov 10 10:48:09 LXCPOA stonith: external/sbd device OK.
Nov 10 10:48:25 LXCPOA stonith: external/sbd device OK.
Nov 10 10:48:41 LXCPOA stonith: external/sbd device OK.
Nov 10 10:48:57 LXCPOA stonith: external/sbd device OK.
Nov 10 10:50:02 LXCPOA crmd: [5683]: ERROR: process_lrm_event: LRM operation sb
Nov 10 10:51:08 LXCPOA crmd: [5683]: ERROR: process_lrm_event: LRM operation sb
Nov 10 10:51:48 LXCPOA crmd: [5683]: ERROR: process_lrm_event: LRM operation sb
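
To make the frames easier to scan, the call-trace lines can be reduced to
"symbol [module]". The sketch below does that with sed; trace_symbols is a
made-up helper name, and the sample input is a subset of frames copied from
the second trace ([#2]). In that trace the die and do_exit frames sit between
apparmor_file_free_security and ocfs2_do_node_down, which suggests the second
fault happened while the process was being torn down after the first BUG.
That reading is an interpretation of the dump, not something confirmed here.

```shell
#!/bin/sh
# Sketch: reduce s390 call-trace lines to "symbol [module]".
# trace_symbols is a hypothetical helper; it reads trace text on stdin.
trace_symbols() {
    # keep only "<address> symbol+0xOFF/0xLEN [module]" lines, drop the numbers
    sed -n 's/^.*<[0-9a-f]*> \([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*\(.*\)$/\1\2/p'
}

# Subset of frames from the second trace ([#2]), innermost first
trace_symbols <<'EOF'
<00000000002da89c> apparmor_file_free_security+0x40/0x58
<000000000020cf6a> __fput+0x112/0x240
<00000000001470a8> do_exit+0x12c/0x364
<0000000000105e2e> die+0x17a/0x17c
<000000000010768a> illegal_op+0x1f2/0x1f8
<000003c005899f0c> ocfs2_do_node_down+0xc0/0xc4 [ocfs2]
(<00000000760e3d00> 0x760e3d00)
<000003c004fa5ed8> ocfs2_control_write+0x3b0/0x4dc [ocfs2_stack_user]
EOF
```

Frames without a symbol, such as the bare stack address in parentheses, are
dropped. Lines that were cut off or wrapped in the syslog copy will not match
either, so the console copy of the trace is the better input.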

Any helpful tips or ideas would be much appreciated.


J.



magic31
30-Nov-2011, 19:56
Andre_Massena;2156165 Wrote:
> We are trying to copy (cp) a 100GB file from a local ext3 file system
> to a shared file system created on OCFS2.
>
> Immediately on starting the copy, we are seeing this -
>

Not really an idea (as I don't work with OCFS2), but looking at the
output you posted (apparmor_file_free_security shows up in the second
call trace), try turning off AppArmor on your cluster servers and
retry the copy. It has been known to cause issues with clustering.
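
A rough sketch of how that could be done on a SLES 11 node. The names
rcapparmor and boot.apparmor are what I would expect on SLES 11, so treat
them as assumptions and check them on your system first. The run and
disable_apparmor helpers are made up for this example. The script only
prints the commands unless APPLY=1 is set, so it can be reviewed before
anything is changed.

```shell
#!/bin/sh
# Sketch: take AppArmor out of the picture on one node before retrying the copy.
# Assumes SLES 11 init script names (rcapparmor, boot.apparmor); verify locally.
# Dry run by default: set APPLY=1 and run as root to execute the commands.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

disable_apparmor() {
    run rcapparmor status            # show whether profiles are currently loaded
    run rcapparmor stop              # unload the profiles on the running system
    run chkconfig boot.apparmor off  # do not start AppArmor at the next boot
}

disable_apparmor
```

Stopping the service unloads the profiles but leaves the AppArmor code in the
running kernel. As far as I know, booting with apparmor=0 on the kernel
command line is the way to switch it off completely, if stopping the service
makes no difference.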

-Willem


--
Novell Knowledge Partner (voluntary sysop)

It ain't anything like Harry Potter.. but you gotta love the magic IT
can bring to this world