General protection fault

ArtRet
23-Mar-2013, 21:05
Hello everyone! Some time ago our server hung, and I don't really understand what the problem is. Memtest and an HDD benchmark (run from the Inquisitor hardware testing platform) didn't show any errors.
Here is part of the log file:


Feb 21 03:00:10 node501 kernel: [1657361.355785] general protection fault: 0000 [#5] SMP
Feb 21 03:00:10 node501 kernel: [1657361.355801] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host8/rport-8:0-1/target8:0:1/8:0:1:4/state
Feb 21 03:00:10 node501 kernel: [1657361.355809] CPU 25
Feb 21 03:00:10 node501 kernel: [1657361.355813] Modules linked in: nls_utf8 oracleacfs(PX) oracleadvm(PX) oracleoks(PX) af_packet oracleasm(X) nfs lockd fscache nfs_acl auth_rpcgss sunrpc bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq ib_ipoib ib_cm ipv6 ib_usa(N) ib_sa ib_uverbs ib_umad iw_nes crc32c libcrc32c iw_cxgb3 cxgb3 kcopy(N) mlx4_ib mlx4_core ib_mthca microcode fuse loop rds_tcp(N) rds(N) sr_mod cdrom ib_qib(N) tpm_tis qla2xxx ses shpchp ib_mad tpm tpm_bios serio_raw igb pcspkr ib_core pci_hotplug dca enclosure scsi_transport_fc usb_storage joydev sg scsi_tgt rtc_cmos rtc_core rtc_lib container button usbhid hid scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh ehci_hcd usbcore sd_mod crc_t10dif dm_snapshot dm_mod edd ext3 mbcache jbd fan processor aacraid(N) ahci libata scsi_mod thermal thermal_sys hwmon
Feb 21 03:00:10 node501 kernel: [1657361.355921] Supported: No, Unsupported modules are loaded
Feb 21 03:00:10 node501 kernel: [1657361.355929] Pid: 17729, comm: emagent Tainted: P M D NX 2.6.32.12-0.7-default #1 X9DRW
Feb 21 03:00:10 node501 kernel: [1657361.355936] RIP: 0010:[<ffffffff81313548>] [<ffffffff81313548>] netlink_autobind+0x78/0xf0
Feb 21 03:00:10 node501 kernel: [1657361.355954] RSP: 0018:ffff88042391be68 EFLAGS: 00010086
Feb 21 03:00:10 node501 kernel: [1657361.355959] RAX: 90ff882039401000 RBX: ffffffff81cd2940 RCX: 00000000767248b7
Feb 21 03:00:10 node501 kernel: [1657361.355965] RDX: 00000000fc0ec4e4 RSI: 00000000f97134e7 RDI: ffff8820396a3000
Feb 21 03:00:10 node501 kernel: [1657361.355971] RBP: 0000000000002b49 R08: 0000000062ef2729 R09: ffff882035d3bcf0
Feb 21 03:00:10 node501 kernel: [1657361.355977] R10: 0000000000000a84 R11: ffffffff811a51c0 R12: ffff8820396a3000
Feb 21 03:00:10 node501 kernel: [1657361.355983] R13: ffff882034584800 R14: 00000000ffffefff R15: 0000000000000000
Feb 21 03:00:10 node501 kernel: [1657361.355989] FS: 00007fe20636e820(0000) GS:ffff8810b8920000(0000) knlGS:0000000000000000
Feb 21 03:00:10 node501 kernel: [1657361.355995] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 03:00:10 node501 kernel: [1657361.356001] CR2: 00007f8c84006718 CR3: 00000014cb1e6000 CR4: 00000000000406e0
Feb 21 03:00:10 node501 kernel: [1657361.356007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 21 03:00:10 node501 kernel: [1657361.356013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 21 03:00:10 node501 kernel: [1657361.356019] Process emagent (pid: 17729, threadinfo ffff88042391a000, task ffff88042b7e6280)
Feb 21 03:00:10 node501 kernel: [1657361.356025] Stack:
Feb 21 03:00:10 node501 kernel: [1657361.356028] ffff8814a014bc00 ffff882034584800 ffffffff81cd2940 ffff88042391bec8
Feb 21 03:00:10 node501 kernel: [1657361.356035] <0> 00007fe2085a05e0 ffffffff813139fd 0000000000000000 ffff8814a014bc00
Feb 21 03:00:10 node501 kernel: [1657361.356043] <0> 000000000000000c 00007fe20636d640 00007fe2080026cd ffffffff812e236f
Feb 21 03:00:10 node501 kernel: [1657361.356053] Call Trace:
Feb 21 03:00:10 node501 kernel: [1657361.356071] [<ffffffff813139fd>] netlink_bind+0x7d/0x1f0
Feb 21 03:00:10 node501 kernel: [1657361.356083] [<ffffffff812e236f>] sys_bind+0xdf/0xf0
Feb 21 03:00:10 node501 kernel: [1657361.356097] [<ffffffff81002f7b>] system_call_fastpath+0x16/0x1b
Feb 21 03:00:10 node501 kernel: [1657361.356109] [<00007fe20cba0c07>] 0x7fe20cba0c07
Feb 21 03:00:10 node501 kernel: [1657361.356114] Code: e8 7e e7 ff ff 89 ee 4c 89 e7 e8 c4 d2 ff ff 48 8b 00 48 85 c0 75 14 eb 4a 66 2e 0f 1f 84 00 00 00 00 00 48 85 d2 74 3b 48 89 d0 <48> 3b 58 38 48 8b 10 0f 18 0a 75 ec 3b a8 50 02 00 00 75 e4 8b
Feb 21 03:00:10 node501 kernel: [1657361.356161] RIP [<ffffffff81313548>] netlink_autobind+0x78/0xf0

Could anybody tell me what the problem is? Thanks in advance.
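If the complete report is needed (for example, for a support request), it can be cut out of the system log instead of being copied by hand. This is a minimal sketch, assuming kernel messages are written to /var/log/messages (the usual syslog file on SLES) and that each report runs from the "general protection fault:" line to the closing "RIP [<...>]" line, as in the excerpt above. The function name extract_oops is hypothetical.

```shell
# Hypothetical helper: print every kernel report that starts with a
# "general protection fault:" line and ends with the closing "RIP [<...>]" line.
# Assumption: syslog writes kernel messages to /var/log/messages;
# pass a different file as $1 to read that instead.
extract_oops() {
    awk '
        /general protection fault:/ { inside = 1 }  # start of a report
        inside { print }                            # copy lines while inside one
        # The opening "RIP: 0010:[<...>]" line has a colon, so only the
        # final "RIP [<...>]" line matches this pattern and ends the report.
        inside && /RIP +\[</ { inside = 0 }
    ' "${1:-/var/log/messages}"
}
```

For example, `extract_oops > oops.txt` saves every such report found in the log; reading /var/log/messages usually requires root.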

smflood
24-Mar-2013, 00:57
Which version of SUSE Linux Enterprise Server are you using? What does "cat /etc/*release" produce?
How up-to-date are you with patches?
Which kernel version are you using? "rpm -qa | grep kernel"

Is this server hosting Oracle? If so, which version?

HTH.
--
Simon
SUSE Knowledge Partner
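The information requested above can be collected in one pass. This is a minimal sketch, assuming a SLES 11 host with rpm and zypper installed; the function name collect_info is hypothetical. Besides the two commands from the reply, it prints the running kernel with uname -r (the kernel packages listed by rpm are not necessarily the one that is booted) and lists pending patches with zypper list-patches to answer the patch-level question.

```shell
# Hypothetical helper: gather release, kernel and patch information into one report.
# Each section is skipped with a note if the tool it needs is not installed.
collect_info() {
    echo "== /etc/*release =="
    cat /etc/*release 2>/dev/null || echo "(no release file found)"

    echo "== uname -r (running kernel) =="
    uname -r

    echo "== rpm -qa | grep kernel (installed kernel packages) =="
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa | grep kernel | sort
    else
        echo "(rpm not available)"
    fi

    echo "== zypper list-patches (pending patches) =="
    if command -v zypper >/dev/null 2>&1; then
        zypper --non-interactive list-patches
    else
        echo "(zypper not available)"
    fi
}
```

Running `collect_info > info.txt` as root produces a file that can be pasted into a reply.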

ArtRet
25-Mar-2013, 14:00
Sorry for the delay.
cat /etc/*release:

LSB_VERSION="core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64"
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1
rpm -qa | grep kernel:

kernel-default-devel-2.6.32.12-0.7.1
kernel-mft-2.7.1-2.6.32.12_0.7_default
kernel-source-2.6.32.12-0.7.1
kernel-ib-1.5.4.1-2.6.32.12_0.7_default
kernel-default-base-2.6.32.12-0.7.1
kernel-ib-devel-1.5.4.1-2.6.32.12_0.7_default
linux-kernel-headers-2.6.32-1.4.13
kernel-default-2.6.32.12-0.7.1
Yes, this server is hosting Oracle Database Standard Edition 11.2.0.3.

I don't really know whether the patches are up to date.

ArtRet
02-Apr-2013, 07:53
It seems the problem was the number of huge pages.
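The thread does not say what the huge page count was, what it was changed to, or how it led to the fault. For anyone checking the same setting, the sketch below shows the current huge page configuration and estimates how many pages an SGA of a given size would need. It assumes the common rule that the whole SGA should fit in huge pages (pages = SGA size / huge page size, rounded up); the function names and the 4096 MB SGA in the usage line are hypothetical, not values from this server.

```shell
# Hypothetical helpers for checking huge page settings on an Oracle host.
# Assumption: the whole SGA should fit in huge pages, so the page count is
# the SGA size divided by the huge page size, rounded up.

# Print the huge page size in kB (read from /proc/meminfo, or from a file given as $1).
hugepage_size_kb() {
    awk '/^Hugepagesize:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print the number of huge pages needed for an SGA of $1 MB with pages of $2 kB.
hugepages_for_sga() {
    sga_kb=$(( $1 * 1024 ))
    echo $(( (sga_kb + $2 - 1) / $2 ))   # integer ceiling division
}

# Show the current configuration next to the estimate for an SGA of $1 MB.
check_hugepages() {
    sga_mb=$1
    page_kb=$(hugepage_size_kb)
    [ -n "$page_kb" ] || { echo "no Hugepagesize in /proc/meminfo" >&2; return 1; }
    echo "vm.nr_hugepages currently: $(cat /proc/sys/vm/nr_hugepages)"
    grep '^HugePages_' /proc/meminfo
    echo "Estimated pages for a ${sga_mb} MB SGA: $(hugepages_for_sga "$sga_mb" "$page_kb")"
}
```

For example, `check_hugepages 4096` compares the current vm.nr_hugepages value with the estimate for a 4 GB SGA, which is 2048 pages when Hugepagesize is 2048 kB.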