SLES 11 SP4 with samba 4.4.3 / kernel bug triggered by samba



kbraun
10-May-2016, 09:02
Dear SUSE,

on my server running SLES 11 SP4 with kernel 3.0.101-71-default I tried to compile and use
the latest Samba release, version 4.4.3. Compiling was not a problem, but running
the software makes the system unstable and produces kernel messages like these:

May 9 15:27:18 geo kernel: [81924.433944] BUG: soft lockup - CPU#0 stuck for 23s! [smbd-notifyd:6073]
May 9 15:27:18 geo kernel: [81924.433948] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit binfmt_misc edd mpt3sas mpt2sas raid_class mptctl mptbase cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables fuse xfs loop dm_mod sr_mod cdrom ipv6 ipv6_lib igb dca ipmi_si ses iTCO_wdt i2c_i801 mei ptp usb_storage ipmi_msghandler joydev pcspkr enclosure iTCO_vendor_support sg pps_core rtc_cmos button container shpchp pci_hotplug ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea isci(X) libsas processor ehci_hcd scsi_transport_sas thermal_sys sd_mod crc_t10dif hwmon usbcore usb_common scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_d
May 9 15:27:18 geo kernel: h ahci libahci libata megaraid_sas scsi_mod
May 9 15:27:18 geo kernel: [81924.434048] Supported: Yes, External
May 9 15:27:18 geo kernel: [81924.434050] CPU 0
May 9 15:27:18 geo kernel: [81924.434052] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit binfmt_misc edd mpt3sas mpt2sas raid_class mptctl mptbase cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables fuse xfs loop dm_mod sr_mod cdrom ipv6 ipv6_lib igb dca ipmi_si ses iTCO_wdt i2c_i801 mei ptp usb_storage ipmi_msghandler joydev pcspkr enclosure iTCO_vendor_support sg pps_core rtc_cmos button container shpchp pci_hotplug ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea isci(X) libsas processor ehci_hcd scsi_transport_sas thermal_sys sd_mod crc_t10dif hwmon usbcore usb_common scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_d
May 9 15:27:18 geo kernel: h ahci libahci libata megaraid_sas scsi_mod
May 9 15:27:18 geo kernel: [81924.434131] Supported: Yes, External
May 9 15:27:18 geo kernel: [81924.434134]
May 9 15:27:18 geo kernel: [81924.434137] Pid: 6073, comm: smbd-notifyd Tainted: G X 3.0.101-71-default #1 Supermicro X9DR3-F/X9DR3-F
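
For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that pulls the CPU number, stall duration, process name and PID out of soft-lockup lines like the first one above. The function name find_soft_lockups and the regular expression are illustrative assumptions derived only from the line format shown in this post; other kernels may word the message differently.

```python
import re

# Pattern derived from the "BUG: soft lockup" line quoted in this post.
# Assumption: the kernel prints "CPU#<n> stuck for <s>s! [<comm>:<pid>]".
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict (cpu, secs, comm, pid) per soft-lockup line found."""
    hits = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            hits.append({
                "cpu": int(m.group("cpu")),
                "secs": int(m.group("secs")),
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
            })
    return hits


if __name__ == "__main__":
    sample = [
        "May 9 15:27:18 geo kernel: [81924.433944] BUG: soft lockup - "
        "CPU#0 stuck for 23s! [smbd-notifyd:6073]",
        "May 9 15:27:18 geo kernel: [81924.434050] CPU 0",
    ]
    for hit in find_soft_lockups(sample):
        print(hit)
```

To scan a real log, pass an open file object instead of the sample list; which file holds kernel messages depends on the syslog configuration.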

I saw some discussions concerning this problem where most people say this is a kernel
bug which can be triggered by samba, and Volker Lendecke from SerNet recommends downgrading
Samba to 4.2 and waiting for the next kernel where the issue is fixed, see:

http://samba.2283325.n4.nabble.com/AW-Centos-6-kernel-soft-lockup-CPU-20-stuck-for-67s-smbd-notifyd-after-upgrade-form-4-2-to-4-4-td4702013.html
http://comments.gmane.org/gmane.linux.suse.opensuse.evergreen/153
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980

My question is whether kernel 3.0.101-71-default in SLES 11 SP4 also suffers from this bug
and, if so, when it will be fixed.

Thanks for any answers,

Klaus

smflood
10-May-2016, 11:09
Can I ask why you want Samba 4.4.3 on SLES11 SP4? Is it for a particular
security fix or feature?

The latest supported version of Samba available from SUSE for SLES11 SP4
is 3.6.3. Given that SLES is SUSE Linux _Enterprise_ Server, installing
your own compiled version of Samba would be unsupported by SUSE.

If you want Samba 4.x then I suggest you look at SLES12 SP1 as the
latest supported version of Samba there is 4.2.4.

HTH.
--
Simon
SUSE Knowledge Partner

------------------------------------------------------------------------
If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below. Thanks.
------------------------------------------------------------------------

kbraun
10-May-2016, 13:18
Simon,

the reason is that after the last Samba update to version 3.6.3-76.1-3640-SUSE-CODE11-x86_64
(badlock patches) I ran into some problems with Samba's functionality.

One of the problems is that wbinfo -c no longer works, and that the automatic password
change on the Microsoft AD, which Samba usually triggers every 7 days, is broken. In the
Samba log files this leads to messages like

[2016/04/15 19:16:44.095713, 0] rpc_client/cli_netlogon.c:693(rpccli_netlogon_set_trust_password)
credentials chain check failed

once per minute.

I asked SUSE about this problem, but unfortunately our university only has a low-level
subscription that does not include support, so I cannot open a call with SUSE. Nevertheless
I got a friendly answer from Kirk Penrose <kpenrose@suse.com>, who opened bug 976657 for me, but
so far I have no solution.

Furthermore, I saw that there are three additional patches addressing regressions in the Samba 3.6
backports introduced by the last security releases. As far as I can see, one of them should solve
the password change problem. The related information can be found at

http://samba.2283325.n4.nabble.com/Badlock-regression-fixes-tt4701810.html

and I also sent this information to Kirk, but still I have no answer.

So if you ask me why I want Samba 4.4.3, this is the reason. Indeed, I would much prefer to use
the Samba version shipped with SLES 11 SP4 if the fixes mentioned above can be included. Otherwise
it is a bit problematic to deal with software that has known limitations.

As for newer Samba 4 releases that I compile myself, it is obvious that
they will not be supported by SUSE. On the other hand, if it is possible to run software
that triggers a kernel bug and causes the system to die slowly, the kernel bug should be fixed, because
you cannot expect every piece of software to be written so that it works around this kernel problem.


Best Regards,

Klaus

smflood
22-Jun-2016, 15:44
Checking bug 976657 referenced above I see there is an unofficial (thus
unsupported) Samba version 3.6.3-101.1 available for testing with SLES11
SP4 at
http://download.opensuse.org/repositories/network:/samba:/MAINTAINED:/SLE_11/SLE_11/x86_64/

HTH.
--
Simon
SUSE Knowledge Partner


swadm
24-Jun-2016, 14:40
For another issue with this newest patch-level version of Samba for SLES11 SP4 (3.6.3-76.1), support gave us access to a PTF (program temporary fix) version:

Version 3.6.3-76.1.10865.1.PTF.978898-3660-SUSE-CODE11-x86_64

Maybe you could request this version and see if it addresses your issue, too.

HTH, Tom

smflood
24-Jun-2016, 14:52
On 24/06/16 14:44, swadm wrote:

> for another issue with this newest patch level version of samba for
> SLES11 SP4 (3.6.3-76.1) support gave us access to a PTF (program
> temporary fix) version:
>
> Version 3.6.3-76.1.10865.1.PTF.978898-3660-SUSE-CODE11-x86_64

Could you possibly send me your SR and/or bug (if you know it) number
via private message?

> Maybe you could request this version, and see if it addresses your issue,
> too.

Except Klaus isn't able to create a support ticket ...

Thanks.
--
Simon
SUSE Knowledge Partner


kbraun
28-Jun-2016, 13:37
Thank you for your suggestions. Because the file server I am talking about
is in production use, I will not be able to test whether a special Samba version
works as expected.

To be honest, stability was one of the main reasons why we chose
SUSE Linux Enterprise Server for our department, and I was really upset to
find that a Samba patch from SUSE caused annoying
problems - and there is still no "official" solution.

For now I am using Samba 4.2, which seems to run stably. Maybe I will
upgrade the server to SLES 12 in the future to solve the problem for good,
but that is another topic.

For those interested in the question I asked at the beginning of the thread,
namely whether a specific kernel bug will be fixed so that more recent
Samba releases such as the 4.4 series can be used: the kernel update
kernel-default-3.0.101-77.1 seems to fix the problem:

The following non-security bugs were fixed:
...
- af_unix: Guard against other == sk in unix_dgram_sendmsg (bsc#973570).
...
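
For readers who want to check a machine against that update, here is a minimal sketch comparing a kernel release string with 3.0.101-77. It assumes SLES 11 style release strings such as "3.0.101-71-default" and is only meaningful for that kernel series; the names parse_release and has_af_unix_fix are hypothetical helpers, not part of any SUSE tool.

```python
import platform
import re

# First SLES 11 SP4 kernel build that, per the changelog quoted above,
# contains the af_unix fix (bsc#973570).
FIXED = (3, 0, 101, 77)


def parse_release(release):
    """Turn '3.0.101-71-default' or '3.0.101-77.1' into (3, 0, 101, 71/77)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in m.groups())


def has_af_unix_fix(release):
    """True if the release is at or above the build named in the changelog."""
    return parse_release(release) >= FIXED


if __name__ == "__main__":
    # Compare the two releases mentioned in this thread.
    for rel in ("3.0.101-71-default", "3.0.101-77-default"):
        print(rel, has_af_unix_fix(rel))
```

On the server itself, has_af_unix_fix(platform.release()) would check the running kernel, provided its release string follows the same pattern.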


Hopefully my samba problems are sorted out for now.

Thanks again

Klaus

smflood
29-Jun-2016, 12:46
On 28/06/16 13:44, kbraun wrote:

> To be honest, stability was one of the main reasons why we chose
> SUSE Linux Enterprise Server for our department, and I was really upset
> to find that a Samba patch from SUSE caused annoying
> problems - and there is still no "official" solution.
>
> For now I am using Samba 4.2, which seems to run stably. Maybe I will
> upgrade the server to SLES 12 in the future to solve the problem
> for good, but that is another topic.

So you're now using Samba 4.2.x on SLES11 SP4? To be supported I'd
definitely look at upgrading the server to SLES12 SP1 with official
Samba 4.2.x packages.

> For those interested in the question I asked at the beginning of
> the thread, namely whether a specific kernel bug will be fixed so that
> more recent Samba releases such as the 4.4 series can be used: the
> kernel update kernel-default-3.0.101-77.1
> seems to fix the problem:
>
> The following non-security bugs were fixed:
> ...
> - af_unix: Guard against other == sk in unix_dgram_sendmsg
> (bsc#973570).
> ...

Thanks for the report back.
--
Simon
SUSE Knowledge Partner
