SLES 12 SP1 when using nfsd - kernel BUG in 3.12.67-60.64.18-default



horiba
29-Nov-2016, 11:46
Looks like there's a crash bug in the most recent kernel, most probably related to NFS.

Rolling back to 3.12.62-60.64.8-default fixed the problem, though users are grumpy now.
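If you manage several SLES 12 SP1 hosts, it can help to check which ones are on the build reported here. The following is a minimal Python sketch, not a SUSE tool: `check_release` and `parse_release` are hypothetical helpers, and the comparison assumes SUSE's `X.Y.Z-A.B.C-flavor` release layout. It only knows the two builds named in this post, so anything newer is reported as unknown.

```python
import platform
from typing import Tuple

# Builds named in this thread; nothing else is known to be good or bad from it.
REPORTED_BAD = "3.12.67-60.64.18-default"  # nfsd hit "kernel BUG at ../fs/dcache.c:268!"
ROLLBACK_OK = "3.12.62-60.64.8-default"    # rolling back to this build avoided the crash

Version = Tuple[Tuple[int, ...], Tuple[int, ...]]


def parse_release(release: str) -> Version:
    """Split a SUSE-style 'X.Y.Z-A.B.C-flavor' string into numeric tuples.

    Raises ValueError for strings that do not follow that layout.
    """
    upstream, suse_rel, _flavor = release.split("-", 2)
    return (tuple(int(p) for p in upstream.split(".")),
            tuple(int(p) for p in suse_rel.split(".")))


def check_release(release: str) -> str:
    """Classify a release string relative to the two builds reported in this thread."""
    if release == REPORTED_BAD:
        return "reported-bad"
    if release == ROLLBACK_OK:
        return "rollback-ok"
    try:
        if parse_release(release) < parse_release(REPORTED_BAD):
            return "older"  # older than the crashing build; not proof it is safe
        return "newer-unknown"
    except ValueError:
        return "unrecognised"


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string (as `uname -r`).
    print(check_release(platform.release()))
```

Comparing the upstream and SUSE release parts as integer tuples avoids the string-ordering trap where "60.64.8" would sort after "60.64.18".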

So if you use NFS, be aware.

Unfortunately I don't know if SuSE is already aware of this bug - I couldn't find its bugzilla and SRs are scarce...



2016-11-29T11:22:20.320676+01:00 plato kernel: [ 88.574649] kernel BUG at ../fs/dcache.c:268!
2016-11-29T11:22:20.320677+01:00 plato kernel: [ 88.575635] invalid opcode: 0000 [#1] SMP
2016-11-29T11:22:20.320678+01:00 plato kernel: [ 88.576622] Modules linked in: binfmt_misc mptctl mptbase tcp_diag inet_diag xt_pktty
2016-11-29T11:22:20.320680+01:00 plato kernel: [ 88.586386] Supported: Yes
2016-11-29T11:22:20.320693+01:00 plato kernel: [ 88.587491] CPU: 10 PID: 3863 Comm: nfsd Not tainted 3.12.67-60.64.18-default #1
2016-11-29T11:22:20.320694+01:00 plato kernel: [ 88.588597] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
2016-11-29T11:22:20.320695+01:00 plato kernel: [ 88.589698] task: ffff88042b449700 ti: ffff8800b7012000 task.ti: ffff8800b7012000
2016-11-29T11:22:20.320695+01:00 plato kernel: [ 88.590793] RIP: 0010:[<ffffffff8151e480>] [<ffffffff8151e480>] dentry_rcuwalk_barri
2016-11-29T11:22:20.320696+01:00 plato kernel: [ 88.591911] RSP: 0018:ffff8800b7013b10 EFLAGS: 00010246
2016-11-29T11:22:20.320697+01:00 plato kernel: [ 88.592993] RAX: 0000000000030003 RBX: ffff8803ade58e40 RCX: ffff8804146cf8c0
2016-11-29T11:22:20.320698+01:00 plato kernel: [ 88.594102] RDX: 0000000000000003 RSI: ffff8803adf05eb8 RDI: ffffffff82170434
2016-11-29T11:22:20.320699+01:00 plato kernel: [ 88.595219] RBP: ffff8804146cf8c0 R08: ffff8800b7013a98 R09: 66bc1adcb35c9ebb
2016-11-29T11:22:20.320699+01:00 plato kernel: [ 88.596351] R10: ffff880414705000 R11: 0000000000000000 R12: ffff880398896600
2016-11-29T11:22:20.320700+01:00 plato kernel: [ 88.597478] R13: ffff8803ade58e40 R14: ffff8804295be720 R15: ffff8803ade58e40
2016-11-29T11:22:20.320701+01:00 plato kernel: [ 88.598601] FS: 0000000000000000(0000) GS:ffff88043f540000(0000) knlGS:0000000000000
2016-11-29T11:22:20.320702+01:00 plato kernel: [ 88.599712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2016-11-29T11:22:20.320702+01:00 plato kernel: [ 88.600793] CR2: 000000000069c530 CR3: 0000000001c0c000 CR4: 00000000000407e0
2016-11-29T11:22:20.320703+01:00 plato kernel: [ 88.601883] Stack:
2016-11-29T11:22:20.320704+01:00 plato kernel: [ 88.602952] ffffffff811be9ce ffff8803adf05da0 ffff8803ade58e40 ffffffff811c02db
2016-11-29T11:22:20.320704+01:00 plato kernel: [ 88.604067] ffff8804130e6b40 ffff880398896600 ffff8804130e6b40 ffff8804130e6b40
2016-11-29T11:22:20.320705+01:00 plato kernel: [ 88.605175] ffff8803ade58e40 ffff8804295be720 ffff8803ade58e40 ffffffff811b2549
2016-11-29T11:22:20.320706+01:00 plato kernel: [ 88.606292] Call Trace:
2016-11-29T11:22:20.320706+01:00 plato kernel: [ 88.607403] [<ffffffff811be9ce>] __d_drop+0xee/0xf0
2016-11-29T11:22:20.320707+01:00 plato kernel: [ 88.608510] [<ffffffff811c02db>] d_materialise_unique+0x25b/0x3d0
2016-11-29T11:22:20.320708+01:00 plato kernel: [ 88.609605] [<ffffffff811b2549>] lookup_real+0x19/0x50
2016-11-29T11:22:20.320709+01:00 plato kernel: [ 88.610696] [<ffffffff811b2e0f>] __lookup_hash+0x2f/0x40
2016-11-29T11:22:20.320709+01:00 plato kernel: [ 88.611802] [<ffffffff811b3c1d>] lookup_one_len+0xcd/0x120
2016-11-29T11:22:20.320710+01:00 plato kernel: [ 88.612926] [<ffffffff8121d7d2>] reconnect_path+0x1c2/0x2e0
2016-11-29T11:22:20.320711+01:00 plato kernel: [ 88.614058] [<ffffffff8121dc1f>] exportfs_decode_fh+0xef/0x2c0
2016-11-29T11:22:20.320711+01:00 plato kernel: [ 88.615194] [<ffffffffa05e5485>] fh_verify+0x2f5/0x5e0 [nfsd]
2016-11-29T11:22:20.320712+01:00 plato kernel: [ 88.616346] [<ffffffffa05f4cad>] nfsd4_proc_compound+0x55d/0x7b0 [nfsd]
2016-11-29T11:22:20.320713+01:00 plato kernel: [ 88.617522] [<ffffffffa05e1d22>] nfsd_dispatch+0xb2/0x200 [nfsd]
2016-11-29T11:22:20.320714+01:00 plato kernel: [ 88.618703] [<ffffffffa033df46>] svc_process_common+0x476/0x6e0 [sunrpc]
2016-11-29T11:22:20.320715+01:00 plato kernel: [ 88.619911] [<ffffffffa033e2bc>] svc_process+0x10c/0x160 [sunrpc]
2016-11-29T11:22:20.320715+01:00 plato kernel: [ 88.621113] [<ffffffffa05e16df>] nfsd+0xaf/0x120 [nfsd]
2016-11-29T11:22:20.320716+01:00 plato kernel: [ 88.622329] [<ffffffff8107b4c4>] kthread+0xb4/0xc0
2016-11-29T11:22:20.320717+01:00 plato kernel: [ 88.624019] [<ffffffff8152f158>] ret_from_fork+0x58/0x90
2016-11-29T11:22:20.320718+01:00 plato kernel: [ 88.625237] Code: 66 90 89 d0 49 89 c8 8b 56 24 48 8b 4e 28 3b 46 04 74 08 f3 90 b8 0
2016-11-29T11:22:20.320718+01:00 plato kernel: [ 88.627824] RIP [<ffffffff8151e480>] dentry_rcuwalk_barrier.part.10+0x0/0x2
2016-11-29T11:22:20.320719+01:00 plato kernel: [ 88.629071] RSP <ffff8800b7013b10>
2016-11-29T11:22:20.320720+01:00 plato kernel: [ 88.630344] ---[ end trace 1fa9279381ad17fa ]---
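When filing an SR or comparing against a bug report, the useful parts of an oops like the one above are the `kernel BUG at` location, the faulting function on the final `RIP` line, and the call-trace frames. A minimal Python sketch for pulling those out of saved log lines follows; `summarize_oops`, `Frame` and the regexes are hypothetical names, and the patterns assume the x86-64 oops layout shown in this post.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# One call-trace frame, e.g. "[<ffffffffa05e5485>] fh_verify+0x2f5/0x5e0 [nfsd]"
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?"
)
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# The trailing "RIP [<addr>] func+off/size" line, not the "RIP: 0010:" register line
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")


@dataclass
class Frame:
    address: int
    function: str
    offset: int
    size: int
    module: Optional[str]  # None for frames in the core kernel image


def summarize_oops(lines: Iterable[str]) -> Tuple[Optional[str], Optional[str], List[Frame]]:
    """Return (BUG location, faulting function, call-trace frames) for one oops."""
    bug_at: Optional[str] = None
    rip_func: Optional[str] = None
    frames: List[Frame] = []
    in_trace = False
    for line in lines:
        if m := BUG_RE.search(line):
            bug_at = f"{m.group(1)}:{m.group(2)}"
        elif "Call Trace:" in line:
            in_trace = True
        elif "Code:" in line:
            in_trace = False  # the opcode dump follows the last frame
        elif m := RIP_RE.search(line):
            rip_func = m.group(1)
        elif in_trace and (m := FRAME_RE.search(line)):
            frames.append(Frame(int(m.group(1), 16), m.group(2),
                                int(m.group(3), 16), int(m.group(4), 16),
                                m.group(5)))
    return bug_at, rip_func, frames
```

Pass it any iterable of lines, such as an open file holding the excerpt above. On this trace it would report `../fs/dcache.c:268`, `dentry_rcuwalk_barrier.part.10`, and the frames from `__d_drop` down to `ret_from_fork`, which makes the nfsd → `fh_verify` → `exportfs_decode_fh` → `reconnect_path` path easy to quote. Truncated lines (such as the register-dump `RIP:` line) are simply skipped.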

Automatic Reply
05-Dec-2016, 06:30
horiba,

It appears that in the past few days you have not received a response to your
posting. That concerns us, and has triggered this automated reply.

These forums are peer-to-peer, best effort, and volunteer run; if your issue
is urgent or not getting a response, you might try one of the following options:

- Visit http://www.suse.com/support and search the knowledgebase and/or check all
the other support options available.
- Open a service request: https://www.suse.com/support
- You could also try posting your message again. Make sure it is posted in the
correct newsgroup. (http://forums.suse.com)

Be sure to read the forum FAQ about what to expect in the way of responses:
http://forums.suse.com/faq.php

If this is a reply to a duplicate posting or otherwise posted in error, please
ignore it and accept our apologies, and rest assured we will issue a stern
reprimand to our posting bot.

Good luck!

Your SUSE Forums Team
http://forums.suse.com

ab
05-Dec-2016, 13:37
I've asked contacts within SUSE to see what they think about this; I'll
let you know if they confirm either way.

In the meantime, could you share some details on your SLES version/patch
level, what you are doing with NFS in particular (perhaps the contents of
/etc/exports or /etc/fstab), etc.?
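If /etc/exports is long, a per-client summary can make the export options easier to compare when posting them. A minimal Python sketch follows; `parse_exports` is a hypothetical helper, not part of nfs-utils. It handles `#` comments, backslash-continued lines and `host(opt,opt)` entries as described in exports(5), but assumes export paths contain no spaces and does not cover every form that file allows.

```python
from typing import Dict, List, Tuple


def parse_exports(text: str) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Parse simple /etc/exports text into [(export_path, {client: [options]})].

    Assumes export paths contain no spaces (quoted paths are not handled).
    """
    text = text.replace("\\\n", " ")  # join backslash-continued lines
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        hosts: Dict[str, List[str]] = {}
        for client in clients:
            # "host(opt1,opt2)" -> "host", ["opt1", "opt2"]; a bare "host" has no options
            host, _, opts = client.partition("(")
            hosts[host] = opts.rstrip(")").split(",") if opts else []
        entries.append((path, hosts))
    return entries
```

For example, `parse_exports(open("/etc/exports").read())` returns one `(path, {client: options})` pair per export line.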

--
Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below...

ab
06-Dec-2016, 13:37
Could you open a Service Request (SR) and cite Bug# 984194?

Are you able to get a core dump for this?

Apparently you are not the first to see a core that may be related, and
the bug above MAY have the same root cause, but it is still under
investigation. Since it is a kernel bug, it is probably best to work
directly with SUSE on getting a fix for you.

ab
06-Dec-2016, 13:41
Ignore me; unless I'm very mistaken, this is your bug report already. I
was thrown off since it is not very new.

If you have some details of your environment that you can share to try to
reproduce this, I have a SLES 12 SP1 box with the same kernel version
where I could do some NFS testing.