
Thread: SLES 11 SP1 and OCFS2 problems


  1. #1
    Andre Massena NNTP User

    SLES 11 SP1 and OCFS2 problems


    Hello all,

    We are running SLES 11 (SP1) for zSeries on a z9 under z/VM, using a
    mixture of CKD disks (OS only) and SAN (FC - zfcp) disks for the
    OCFS2 database(s).

    Our OCFS2 environment is as follows:


    node1:~ # rpm -qa|grep ocfs2
    ocfs2-kmp-xen-1.4_2.6.32.12_0.6-4.10.14
    ocfs2-tools-1.4.3-0.11.20
    ocfs2-tools-debuginfo-1.4.3-0.11.20
    ocfs2-kmp-pae-1.4_2.6.32.12_0.6-4.10.14
    ocfs2-tools-debugsource-1.4.3-0.11.20
    ocfs2console-1.4.3-0.11.20
    ocfs2-tools-o2cb-1.4.3-0.11.20

    The OCFS2 filesystem was created using "mkfs.ocfs2 -N 8 <share disk
    device name>" without any other specific options, i.e. with the defaults.

    The mount options we are using are all defaults:

    type ocfs2 (rw,_netdev,cluster_stack=pcmk)

    We are trying to copy (cp) a 100GB file from a local ext3 file system
    to a shared file system created on OCFS2.

    Immediately after starting the copy, we see this:


    kernel BUG at
    /usr/src/packages/BUILD/ocfs2-1.4/default/ocfs2/heartbeat.c:68!
    illegal operation: 0001 [#1] SMP
    Modules linked in: iptable_filter ip_tables x_tables sr_mod cdrom
    af_packet ocf
    Supported: Yes
    CPU: 3 Not tainted 2.6.32.12-0.7-default #1
    Process ocfs2_controld. (pid: 5783, task: 000000007424e438, ksp:
    00000000760e38
    Krnl PSW : 0704000180000000 000003c005899f0c
    (ocfs2_do_node_down+0xc0/0xc4 [ocf
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
    Krnl GPRS: 0000000000000038 000003c005899e4c 0000000000000001
    000000007e448000
    0000000000000000 000000000000001f 00000000760e3e1e 000000000000002f
    000000007327d240 0000000000000001 0000000000000001 000000007e448000
    000003c005854000 000003c0058f81f8 00000000760e3d70 00000000760e3d08
    Krnl Code: 000003c005899efe: c0e5fffdd15f brasl %r14,3c0058541bc
    000003c005899f04: a7f4ffce brc 15,3c005899ea0
    000003c005899f08: a7f40001 brc 15,3c005899f0a
    >000003c005899f0c: a7f40000 brc 15,3c005899f0c

    000003c005899f10: a7180000 lhi %r1,0
    000003c005899f14: 501020a8 st %r1,168(%r2)
    000003c005899f18: a7180100 lhi %r1,256
    000003c005899f1c: 40102850 sth %r1,2128(%r2)
    Call Trace:
    (<00000000760e3d00> 0x760e3d00)
    <000003c004fa5ed8> ocfs2_control_write+0x3b0/0x4dc [ocfs2_stack_user]
    <000000000020c160> vfs_write+0xac/0x1a4
    <000000000020c354> SyS_write+0x58/0xb4
    <0000000000117e7e> sysc_noemu+0x10/0x16
    <000002000024ea50> 0x2000024ea50
    Last Breaking-Event-Address:
    <000003c005899f08> ocfs2_do_node_down+0xbc/0xc4 [ocfs2]
    ---[ end trace b0ce64c3e38cc8f4 ]---
    Unable to handle kernel pointer dereference at virtual kernel address
    000000060
    Oops: 003b [#2] SMP
    Modules linked in: iptable_filter ip_tables x_tables sr_mod cdrom
    af_packet ocf
    Supported: Yes
    CPU: 3 Tainted: G D 2.6.32.12-0.7-default #1
    Process ocfs2_controld. (pid: 5783, task: 000000007424e438, ksp:
    00000000760e38
    Krnl PSW : 0704200180000000 000000000030aba4 (kref_put+0x3c/0x88)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
    Krnl GPRS: 000003e000000003 0000000000000001 0000000600000008
    00000000002d71f0
    0000000600000008 0000000000000003 0000000000000000 000000000000002f
    000000007e9e7400 0000000074974d40 00000000740bc108 000000007cf2e300
    000000007cca0e20 000000000047a428 00000000760e3870 00000000760e3850
    Krnl Code: 000000000030ab96: e330d0000020 cg %r3,0(%r13)
    000000000030ab9c: a7840022 brc 8,30abe0
    000000000030aba0: a7180001 lhi %r1,1
    >000000000030aba4: 58504000 l %r5,0(%r4)

    000000000030aba8: 1825 lr %r2,%r5
    000000000030abaa: 1b21 sr %r2,%r1
    000000000030abac: ba524000 cs %r5,%r2,0(%r4)
    000000000030abb0: a744fffc brc 4,30aba8
    Call Trace:
    (<00000000760e3898> 0x760e3898)
    <00000000002da89c> apparmor_file_free_security+0x40/0x58
    <000000000020cf6a> __fput+0x112/0x240
    <00000000001e2690> remove_vma+0x60/0xa0
    <00000000001e286c> exit_mmap+0x19c/0x2d8
    <000000000013f2e2> mmput+0x62/0x150
    <0000000000144e4a> exit_mm+0x196/0x1bc
    <00000000001470a8> do_exit+0x12c/0x364
    <0000000000105e2e> die+0x17a/0x17c
    <000000000010768a> illegal_op+0x1f2/0x1f8
    <0000000000117e84> sysc_return+0x0/0x8
    <000003c005899f0c> ocfs2_do_node_down+0xc0/0xc4 [ocfs2]
    (<00000000760e3d00> 0x760e3d00)
    <000003c004fa5ed8> ocfs2_control_write+0x3b0/0x4dc [ocfs2_stack_user]
    <000000000020c160> vfs_write+0xac/0x1a4
    <000000000020c354> SyS_write+0x58/0xb4
    <0000000000117e7e> sysc_noemu+0x10/0x16
    <000002000024ea50> 0x2000024ea50
    Last Breaking-Event-Address:
    <00000000002da896> apparmor_file_free_security+0x3a/0x58
    ---[ end trace b0ce64c3e38cc8f5 ]---
    Fixing recursive fault but reboot is needed!
    Nov 10 10:38:25 LXCPOA ocfs2_controld[5783]: this node is not in the
    ocfs2_cont
    Nov 10 10:38:25 LXCPOA kernel: illegal operation: 0001 [#1] SMP
    Nov 10 10:38:25 LXCPOA kernel: Modules linked in: iptable_filter
    ip_tables x_ta
    Nov 10 10:38:25 LXCPOA kernel: Supported: Yes
    Nov 10 10:38:25 LXCPOA kernel: CPU: 3 Not tainted 2.6.32.12-0.7-default
    #1
    Nov 10 10:38:25 LXCPOA kernel: Process ocfs2_controld. (pid: 5783,
    task: 000000
    Nov 10 10:38:25 LXCPOA kernel: Krnl PSW : 0704000180000000
    000003c005899f0c (oc
    Nov 10 10:38:25 LXCPOA kernel: R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 A
    Nov 10 10:38:25 LXCPOA kernel: Krnl GPRS: 0000000000000038
    000003c005899e4c 000
    Nov 10 10:38:25 LXCPOA kernel: 0000000000000000 000000000000001f 000
    Nov 10 10:38:25 LXCPOA kernel: 000000007327d240 0000000000000001 000
    Nov 10 10:38:25 LXCPOA kernel: 000003c005854000 000003c0058f81f8 000
    Nov 10 10:38:25 LXCPOA kernel: Krnl Code: 000003c005899efe:
    c0e5fffdd15f br
    Nov 10 10:38:25 LXCPOA kernel: 000003c005899f04: a7f4ffce b
    Nov 10 10:38:25 LXCPOA kernel: 000003c005899f08: a7f40001
    Nov 10 10:38:25 LXCPOA kernel: >000003c005899f0c: a7f40000
    Nov 10 10:38:25 LXCPOA kernel: 000003c005899f10: a7180000
    Nov 10 10:38:25 LXCPOA kernel: 000003c005899f14: 501020a8
    Nov 10 10:38:25 LXCPOA kernel: 000003c005899f18: a7180100
    Nov 10 10:38:25 LXCPOA kernel: 000003c005899f1c: 40102850
    Nov 10 10:38:25 LXCPOA kernel: Call Trace:
    Nov 10 10:38:25 LXCPOA kernel: (<00000000760e3d00> 0x760e3d00)
    Nov 10 10:38:25 LXCPOA kernel: <000003c004fa5ed8>
    ocfs2_control_write+0x3b0/
    Nov 10 10:38:25 LXCPOA kernel: <000000000020c160> vfs_write+0xac/0x1a4
    Nov 10 10:38:25 LXCPOA kernel: <000000000020c354> SyS_write+0x58/0xb4
    Nov 10 10:38:25 LXCPOA kernel: <0000000000117e7e> sysc_noemu+0x10/0x16
    Nov 10 10:38:25 LXCPOA kernel: <000002000024ea50> 0x2000024ea50
    Nov 10 10:38:25 LXCPOA kernel: Last Breaking-Event-Address:
    Nov 10 10:38:25 LXCPOA kernel: <000003c005899f08>
    ocfs2_do_node_down+0xbc/0x
    Nov 10 10:38:26 LXCPOA kernel:
    Nov 10 10:38:26 LXCPOA kernel: ---[ end trace b0ce64c3e38cc8f4 ]---
    Nov 10 10:38:26 LXCPOA kernel: Oops: 003b [#2] SMP
    Nov 10 10:38:26 LXCPOA kernel: Modules linked in: iptable_filter
    ip_tables x_ta
    Nov 10 10:38:26 LXCPOA kernel: Supported: Yes
    Nov 10 10:38:26 LXCPOA kernel: CPU: 3 Tainted: G D 2.6.32.12-0.7-defa
    Nov 10 10:38:26 LXCPOA kernel: Process ocfs2_controld. (pid: 5783,
    task: 000000
    Nov 10 10:38:26 LXCPOA kernel: Krnl PSW : 0704200180000000
    000000000030aba4 (kr
    Nov 10 10:38:26 LXCPOA kernel: R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 A
    Nov 10 10:38:26 LXCPOA kernel: Krnl GPRS: 000003e000000003
    0000000000000001 000
    Nov 10 10:38:26 LXCPOA kernel: 0000000600000008 0000000000000003 000
    Nov 10 10:38:26 LXCPOA kernel: 000000007e9e7400 0000000074974d40 000
    Nov 10 10:38:26 LXCPOA kernel: 000000007cca0e20 000000000047a428 000
    Nov 10 10:38:26 LXCPOA kernel: Krnl Code: 000000000030ab96:
    e330d0000020 cg
    Nov 10 10:38:26 LXCPOA kernel: 000000000030ab9c: a7840022
    Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined retry
    Nov 10 10:38:26 LXCPOA kernel: 000000000030aba0: a7180001
    Nov 10 10:38:26 LXCPOA kernel: >000000000030aba4: 58504000
    Nov 10 10:38:26 LXCPOA kernel: 000000000030aba8: 1825 lr
    Nov 10 10:38:26 LXCPOA kernel: 000000000030abaa: 1b21
    Nov 10 10:38:26 LXCPOA kernel: 000000000030abac: ba524000
    Nov 10 10:38:26 LXCPOA kernel: 000000000030abb0: a744fffc
    Nov 10 10:38:26 LXCPOA kernel: Call Trace:
    Nov 10 10:38:26 LXCPOA kernel: (<00000000760e3898> 0x760e3898)
    Nov 10 10:38:26 LXCPOA kernel: <00000000002da89c>
    apparmor_file_free_securit
    Nov 10 10:38:26 LXCPOA kernel: <000000000020cf6a> __fput+0x112/0x240
    Nov 10 10:38:26 LXCPOA kernel: <00000000001e2690> remove_vma+0x60/0xa0
    Nov 10 10:38:26 LXCPOA kernel: <00000000001e286c>
    exit_mmap+0x19c/0x2d8
    Nov 10 10:38:26 LXCPOA kernel: <000000000013f2e2> mmput+0x62/0x150
    Nov 10 10:38:26 LXCPOA kernel: <0000000000144e4a> exit_mm+0x196/0x1bc
    Nov 10 10:38:26 LXCPOA kernel: <00000000001470a8> do_exit+0x12c/0x364
    Nov 10 10:38:26 LXCPOA kernel: <0000000000105e2e> die+0x17a/0x17c
    Nov 10 10:38:26 LXCPOA kernel: <000000000010768a>
    illegal_op+0x1f2/0x1f8
    Nov 10 10:38:26 LXCPOA kernel: <0000000000117e84> sysc_return+0x0/0x8
    Nov 10 10:38:26 LXCPOA kernel: <000003c005899f0c>
    ocfs2_do_node_down+0xc0/0x
    Nov 10 10:38:26 LXCPOA kernel: (<00000000760e3d00> 0x760e3d00)
    Nov 10 10:38:26 LXCPOA kernel: <000003c004fa5ed8>
    ocfs2_control_write+0x3b0/
    Nov 10 10:38:26 LXCPOA kernel: <000000000020c160> vfs_write+0xac/0x1a4
    Nov 10 10:38:26 LXCPOA kernel: <000000000020c354> SyS_write+0x58/0xb4
    Nov 10 10:38:26 LXCPOA kernel: <0000000000117e7e> sysc_noemu+0x10/0x16
    Nov 10 10:38:26 LXCPOA kernel: <000002000024ea50> 0x2000024ea50
    Nov 10 10:38:26 LXCPOA kernel: Last Breaking-Event-Address:
    Nov 10 10:38:26 LXCPOA kernel: <00000000002da896>
    apparmor_file_free_securit
    Nov 10 10:38:26 LXCPOA kernel:
    Nov 10 10:38:26 LXCPOA kernel: ---[ end trace b0ce64c3e38cc8f5 ]---
    Nov 10 10:38:26 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined retry
    Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:26 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:27 LXCPOA cluster-dlm[5740]: _send_message:
    cpg_mcast_joined error
    Nov 10 10:38:42 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:38:58 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:39:14 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:39:31 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:39:47 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:40:03 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:40:20 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:40:36 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:40:52 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:41:08 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:41:24 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:41:41 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:41:57 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:42:13 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:42:29 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:42:46 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:43:02 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:43:18 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:43:34 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:43:50 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:44:06 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:44:22 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:44:39 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:44:55 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:45:11 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:45:27 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:45:43 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:46:00 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:46:16 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:46:32 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:46:48 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:47:04 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:47:20 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:47:37 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:47:53 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:48:09 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:48:25 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:48:41 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:48:57 LXCPOA stonith: external/sbd device OK.
    Nov 10 10:50:02 LXCPOA crmd: [5683]: ERROR: process_lrm_event: LRM
    operation sb
    Nov 10 10:51:08 LXCPOA crmd: [5683]: ERROR: process_lrm_event: LRM
    operation sb
    Nov 10 10:51:48 LXCPOA crmd: [5683]: ERROR: process_lrm_event: LRM
    operation sb

    Any helpful tip or idea would be most appreciated.


    J.


    --
    Andre_Massena
    ------------------------------------------------------------------------
    Andre_Massena's Profile: http://forums.novell.com/member.php?userid=120228
    View this thread: http://forums.novell.com/showthread.php?t=448661


  2. #2
    Automatic Reply NNTP User

    Re: SLES 11 SP1 and OCFS2 problems

    Andre,

    It appears that in the past few days you have not received a response to your
    posting. That concerns us, and has triggered this automated reply.

    Has your problem been resolved? If not, you might try one of the following options:

    - Visit http://support.novell.com and search the knowledgebase and/or check all
    the other self support options and support programs available.
    - You could also try posting your message again. Make sure it is posted in the
    correct newsgroup. (http://forums.novell.com)

    Be sure to read the forum FAQ about what to expect in the way of responses:
    http://forums.novell.com/faq.php

    If this is a reply to a duplicate posting, please ignore and accept our apologies
    and rest assured we will issue a stern reprimand to our posting bot.

    Good luck!

    Your Novell Product Support Forums Team
    http://forums.novell.com/


  3. #3
    magic31 NNTP User

    Re: SLES 11 SP1 and OCFS2 problems


    Andre_Massena;2156165 Wrote:
    > Hello all,
    >
    > we are running SLES 11 (SP1) for zSeries on a z9 running under z/VM
    > using a mixture of CKD disks (OS only) and SAN (FC - zfcp) disk for the
    > OCFS2 database(s).
    >
    > Our OCFS2 environment is as follows:
    >
    >
    > node1:~ # rpm -qa|grep ocfs2
    > ocfs2-kmp-xen-1.4_2.6.32.12_0.6-4.10.14
    > ocfs2-tools-1.4.3-0.11.20
    > ocfs2-tools-debuginfo-1.4.3-0.11.20
    > ocfs2-kmp-pae-1.4_2.6.32.12_0.6-4.10.14
    > ocfs2-tools-debugsource-1.4.3-0.11.20
    > ocfs2console-1.4.3-0.11.20
    > ocfs2-tools-o2cb-1.4.3-0.11.20
    >
    > The OCFS2 filesystem was created using "mkfs.ocfs2 -N 8 <share disk
    > device name>" without any other specific options, i.e. with the defaults.
    >
    > The mount options we are using are all default :
    >
    > type ocfs2 (rw,_netdev,cluster_stack=pcmk)
    >
    > We are trying to copy (cp) a 100GB file from a local ext3 file system
    > to a shared file system created on OCFS2.
    >
    > Immediately on starting the copy, we are seeing this -
    >


    Not really an idea (as I don't work with OCFS2)... but looking at the
    output you posted (the second oops passes through
    apparmor_file_free_security), try turning off AppArmor on your cluster
    servers and retry the copy. It has been known to cause issues with
    clustering.
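
As a rough sketch of what "turning off AppArmor" could look like on a SLES 11 node: the script below assumes the init script is reachable as `rcapparmor` and that the boot-time service is named `boot.apparmor`. Neither name is stated in this thread, so please verify them on your own system first. The `run` helper and the `DRY_RUN` switch are made up for illustration; with the default `DRY_RUN=1` the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: stop AppArmor on one cluster node before retrying the copy.
# Assumptions (not confirmed in this thread): "rcapparmor" and the
# "boot.apparmor" service name, as commonly found on SLES 11.

# Default to a dry run so nothing changes unless explicitly requested.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_apparmor() {
    run rcapparmor stop               # stop AppArmor on the running system
    run chkconfig boot.apparmor off   # keep it off across reboots while testing
}

disable_apparmor
```

Run it with `DRY_RUN=0` as root on every cluster node to apply the change, then retry the cp. If the BUG still shows up with AppArmor stopped, AppArmor is probably not the trigger and it can be switched back on.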

    -Willem


    --
    Novell Knowledge Partner (voluntary sysop)

    It ain't anything like Harry Potter.. but you gotta love the magic IT
    can bring to this world
    ------------------------------------------------------------------------
    magic31's Profile: http://forums.novell.com/member.php?userid=2303
    View this thread: http://forums.novell.com/showthread.php?t=448661

