SLES 11 SP4 FC HBA not working anymore after upgrade to SLES 11 SP4



berndgsflinux
27-Oct-2015, 16:22
Hi,

I have an HP ProLiant ML 350 G6. With SLES 11 SP3 everything was running fine. I have an FC HBA (Emulex LPe11000 PCIe) which connects the server to an FC SAN, and this worked without any problems. After upgrading to SLES 11 SP4 I constantly get these errors in /var/log/messages:

Oct 27 15:42:35 sunhb58820 kernel: [ 2957.988711] lpfc 0000:0e:00.1: 1:1305 Link Down Event x20a received Data: x20a x20 x80011 x0 x0
Oct 27 15:42:37 sunhb58820 kernel: [ 2959.780863] lpfc 0000:0e:00.1: 1:1303 Link Up Event x20b received Data: x20b x0 x10 x0 x0 x0 0
Oct 27 15:42:40 sunhb58820 kernel: [ 2963.079327] lpfc 0000:0e:00.0: 0:1305 Link Down Event x20a received Data: x20a x20 x80011 x0 x0
Oct 27 15:42:42 sunhb58820 kernel: [ 2964.866422] lpfc 0000:0e:00.0: 0:1303 Link Up Event x20b received Data: x20b x0 x10 x0 x0 x0 0
Oct 27 15:42:46 sunhb58820 kernel: [ 2968.985676] lpfc 0000:0e:00.1: 1:1305 Link Down Event x20c received Data: x20c x20 x80011 x0 x0

I don't have a connection to the SAN anymore. Installing the newest kernel from SuSE did not solve it either.
I can reproduce the error with a clean new installation of SP3 followed by an update to SP4.
Swapping the FC HBA for an HP AJ763 did not solve the problem either.
When I booted the server from a Knoppix DVD, the connection to the SAN was OK.
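
To put a number on the link flapping shown in the log excerpt, the events can be tallied per PCI function. The following is only a sketch: the function name count_lpfc_link_events is made up for illustration, and it assumes the syslog format shown above, where message 1305 is a Link Down event and 1303 is a Link Up event.

```shell
#!/bin/sh
# Count lpfc link events per PCI function in a syslog-style file.
# Hypothetical helper; assumes lines like:
#   ... lpfc 0000:0e:00.1: 1:1305 Link Down Event x20a received Data: ...
count_lpfc_link_events() {
    # $1: log file to scan (e.g. /var/log/messages)
    awk '
        / lpfc / && /Link (Down|Up) Event/ {
            for (i = 1; i <= NF; i++) {
                # the field after "lpfc" is the PCI address with a trailing colon
                if ($i == "lpfc") { dev = $(i + 1); sub(/:$/, "", dev) }
                # the field after "Link" is the direction (Down or Up)
                if ($i == "Link") { kind = $(i + 1) }
            }
            n[dev " " kind]++
        }
        END { for (k in n) print k, n[k] }
    ' "$1" | LC_ALL=C sort
}
```

Called as `count_lpfc_link_events /var/log/messages`, it prints one line per PCI function and direction with the number of events seen, which makes it easy to compare the flapping rate before and after a driver change.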

Thanks for any help.


Bernd

jmozdzen
27-Oct-2015, 16:38
Hi Bernd,

sounds like a clean case to open a service request... can you do so?

Regards,
Jens

berndgsflinux
27-Oct-2015, 17:09
Hi Jens,

thanks for the quick reply. Unfortunately we don't have a support contract with SuSE. I tried to open a bug in Bugzilla (https://bugzilla.suse.com/index.cgi), but it seems that I'm too stupid for it; I didn't succeed (I didn't find SLES 11, just SLES 11 RT). Finally I entered the bug here: https://www.suse.com/support/report-a-bug/ Is this the same as Bugzilla?

Bernd

jmozdzen
27-Oct-2015, 17:34
Hi Bernd,


> Hi Jens,
>
> thanks for the quick reply. Unfortunately we don't have a support contract with SuSE.

I had already asked my SUSE contacts if more information on this is available. Interestingly, my personal search didn't reveal any "storm of reports" that I'd expect if this were a more general problem.


> I tried to open a bug in Bugzilla (https://bugzilla.suse.com/index.cgi), but it seems that I'm too stupid for it; I didn't succeed (I didn't find SLES 11, just SLES 11 RT).

No, you're not. There's no general access to such bugs and even less to opening bugs for the professional products.


> Finally I entered the bug here: https://www.suse.com/support/report-a-bug/ Is this the same as Bugzilla?

I have had success that way, but it is a "write-only" way of reporting bugs. Even SRs do not necessarily result in Bugzilla entries (the latter are for the developer team to track issues), and even if your web form report is handled internally, you cannot reliably tell. After all, that's what the support contracts are for.

I have not heard back yet from my internal contact, but until then, some more insight may be helpful:

- what version of the driver are you using in SLES11SP4 ("modinfo lpfc")?
- what version of the driver were you using in SLES11SP3 ("modinfo lpfc")?
- what driver messages do you see during boot (you seem to only have quoted the actual error messages, I'm after those sent during adapter initialization, too)?
- do you have any "special" adapter configuration (whatever "special" may be ;) )

Is that Emulex LPe11000 PCIe a more recent adapter model?

Using that info, maybe we can find some way to get it to work, even without the SUSE team ;)
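
The information asked for above could be gathered in one go roughly as follows. This is a sketch, not an official procedure: the function names lpfc_version_from_modinfo and collect_lpfc_info are invented for illustration, and the port state files under /sys/class/fc_host are assumed to be present only once the driver has registered its ports.

```shell
#!/bin/sh
# Extract the "version:" field from modinfo-style text on stdin.
# (Hypothetical helper; "modinfo -F version lpfc" gives the same result directly.)
lpfc_version_from_modinfo() {
    awk '$1 == "version:" { print $2; exit }'
}

# Print driver version, driver kernel messages and FC port state.
collect_lpfc_info() {
    echo "== driver version =="
    modinfo lpfc | lpfc_version_from_modinfo
    echo "== kernel messages =="
    dmesg | grep -i -e lpfc -e emulex
    echo "== FC port state =="
    for f in /sys/class/fc_host/host*/port_state; do
        # the glob stays unexpanded if no FC host is registered, so test first
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        fi
    done
}
```

Running `collect_lpfc_info` once on the SP3 system and once on the SP4 system gives two outputs that can be compared side by side.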

Regards,
Jens

berndgsflinux
27-Oct-2015, 19:16
> Hi Bernd,
>
> I had already asked my SUSE contacts if more information on this is available. Interestingly, my personal search didn't reveal any "storm of reports" that I'd expect if this were a more general problem.

I also expected that.



> No, you're not. There's no general access to such bugs and even less to opening bugs for the professional products.

Hmm.

> I have had success that way, but it is a "write-only" way of reporting bugs. Even SRs do not necessarily result in Bugzilla entries (the latter are for the developer team to track issues), and even if your web form report is handled internally, you cannot reliably tell. After all, that's what the support contracts are for.
>
> I have not heard back yet from my internal contact, but until then, some more insight may be helpful:
>
> - what version of the driver are you using in SLES11SP4 ("modinfo lpfc")?
sunhb58820:~ # modinfo lpfc
filename: /lib/modules/3.0.101-65-default/kernel/drivers/scsi/lpfc/lpfc.ko
version: 0:10.4.8000.0.
author: Emulex Corporation - tech.support@emulex.com
description: Emulex LightPulse Fibre Channel SCSI driver 10.4.8000.0.
license: GPL
srcversion: 5B16B2A835A7BE8F801D80B
...
Strangely, the release notes for SP4 say: "Updated lpfc driver to version 8.3.5.7". (https://www.suse.com/releasenotes/x86_64/SUSE-SLES/11-SP4/#Drivers.Storage ) ???

> - what version of the driver were you using in SLES11SP3 ("modinfo lpfc")?
pc63164:~ # modinfo lpfc
filename: /lib/modules/3.0.101-0.35-default/kernel/drivers/scsi/lpfc/lpfc.ko
version: 0:8.3.7.10.7p
author: Emulex Corporation - tech.support@emulex.com
description: Emulex LightPulse Fibre Channel SCSI driver 8.3.7.10.7p
license: GPL
srcversion: 558BEA7EE9EE5EB52574D78
...

> - what driver messages do you see during boot (you seem to only have quoted the actual error messages, I'm after those sent during adapter initialization, too)?
/var/log/boot.msg:
...
<4>[ 1.734676] Emulex LightPulse Fibre Channel SCSI driver 10.4.8000.0.
<4>[ 1.734773] Copyright(c) 2004-2014 Emulex. All rights reserved.
<6>[ 1.734987] lpfc 0000:0e:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
<7>[ 1.735083] lpfc 0000:0e:00.0: setting latency timer to 64
<6>[ 1.736047] scsi0 : Emulex LPe11000 PCIe Fibre Channel Adapter on PCI bus 0e device 00 irq 30
<6>[ 1.868988] Refined TSC clocksource calibration: 1999.431 MHz.
<6>[ 1.869083] Switching to clocksource tsc
<7>[ 2.204968] lpfc 0000:0e:00.0: irq 65 for MSI/MSI-X
<6>[ 3.368864] lpfc 0000:0e:00.1: PCI INT B -> GSI 37 (level, low) -> IRQ 37
<7>[ 3.368961] lpfc 0000:0e:00.1: setting latency timer to 64
<6>[ 3.370203] scsi1 : Emulex LPe11000 PCIe Fibre Channel Adapter on PCI bus 0e device 01 irq 37
<7>[ 3.836486] lpfc 0000:0e:00.1: irq 66 for MSI/MSI-X
<3>[ 4.617417] lpfc 0000:0e:00.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0 x0 x0 0
...
<3>[ 6.093965] lpfc 0000:0e:00.1: 1:1303 Link Up Event x1 received Data: x1 xf7 x10 x0 x0 x0 0
...

> - do you have any "special" adapter configuration (whatever "special" may be ;) )
No, I don't believe so. The HBA is connected to an FC SAN (HP) using two fibre-optic cables.


> Is that Emulex LPe11000 PCIe a more recent adapter model?
It's a bit old. But I have the same trouble with another FC HBA (HP AJ763).

> Using that info, maybe we can find some way to get it to work, even without the SUSE team ;)
>
> Regards,
> Jens

Thanks for your help.

Bernd

jmozdzen
28-Oct-2015, 13:15
Hi Bernd,

I've had a look at the Emulex driver site and noticed that the LPe11000 is *not* listed in the compatibility list of the v10.6.144.21 driver available there (http://www.emulex.com/downloads/emulex/drivers/linux/sles-11/drivers/). It may well be that the SUSE-supplied driver is too new for the card; you might want to ask Emulex for a definite answer (I've only dealt with their cards once, so my interpretation of the compatibility list may be wrong).

I've already extended my question toward SUSE by asking if they can provide a legacy package for the old driver, in case v10 doesn't support your card, although SP3 did.

Regards,
Jens

berndgsflinux
28-Oct-2015, 13:45
Hi Jens,

But we are talking about driver version 0:10.4.8000.0., which is included in SP4, and not 10.6.144.21. I will have a look there anyway. The other FC HBA I tried (HP AJ763B, which is an Emulex) also does not work with SP4, but does with SP3.

Bernd

jmozdzen
28-Oct-2015, 15:06
Hi Bernd,

> The other FC HBA (HP AJ763B, which is an Emulex) i tried also does not work with SP4, but with SP3.

if I see it right, the HP AJ763B is the re-branded version of the Emulex LPe12002 - and the LPe12000 series is mentioned as supported by at least the v10 drivers as distributed by Emulex. Have you tried running the driver provided by Emulex for that machine/adapter (http://www-dl.emulex.com/support/elx/rt10.6.1/10.6.144.24/Linux/RPM_pkg/elx-lpfc-dd-sles11sp-10.6.144.21-1.tar.gz)?

Regards,
Jens

berndgsflinux
29-Oct-2015, 14:23
Hi,

it's running now. I'm using this module:
filename: /lib/modules/3.0.101-65-default/weak-updates/updates/lpfc.ko
version: 0:10.6.144.21
author: Emulex Corporation - tech.support@emulex.com
description: Emulex LightPulse Fibre Channel SCSI driver 10.6.144.21
license: GPL

But the module only works with the 12000 series, not with the 11000 series. Either SuSE provides a module for the older one or we have to buy a new HBA.
What happens if I install a kernel patch/update? Will the module still work afterwards?

Jens, thanks for your help. If you hear something from SuSE, I would be glad if you forwarded it to me.

Bernd

jmozdzen
29-Oct-2015, 14:37
Hi Bernd,

> it's running now
> But the module just runs with the 12000 series, not with the 11000 series. Either SuSE provides a module for the older one or we have to buy a new HBA.

just to get a clear picture - it runs with an LPe12k card that you inserted for a test? Or does it run with your LPe11k (so that shipping 10.6.144.21 would fix the situation), which would contradict the Emulex release notes? I need to know so that I can forward this to SUSE.

> If you hear something from SuSE i would be glad if you forward it to me.

SUSE is looking into the report - SP4 is not supposed to break things that worked with SP3.

Regards,
Jens

berndgsflinux
29-Oct-2015, 14:54
Hi,

The recent module (from the Emulex web page), version 0:10.6.144.21, runs with the 12000 series, not with the 11000 series. So using the new module from Emulex did not fix the problem with the 11000 card. OK?

Bernd

jmozdzen
29-Oct-2015, 15:31
Hi Bernd,


> Hi,
>
> the recent module (from the Emulex webpage) version 0:10.6.144.21 runs with the 12000 series, not with the 11000 series. So using the new module from Emulex did not fix the problem with the 11000 card.

thanks for confirming... I got misled by the initial "it's running now".

Regards,
Jens

jmozdzen
29-Oct-2015, 16:34
Hi Bernd,

my bad - I overlooked that you indeed do have an LPe12k card - the rebranded HP... it's been a rough day ;) So it means that the SUSE-shipped driver works for neither card, while the Emulex-distributed driver at least works for the HP/LPe12k card. I've added that info to my report.

Regards,
Jens

berndgsflinux
29-Oct-2015, 17:08
Hi,

Yes. What about a kernel patch/update? The README from the Emulex driver says:

"4. Errata Kernel Upgrade Support

If a user has installed this lpfc package on a supported Generally
Available distribution kernel release, any errata kernel upgrades of that
distribution release will be handled automatically."

Does that mean that I don't have to do anything when I apply a kernel patch/update?

Bernd

jmozdzen
29-Oct-2015, 17:18
Hi Bernd,

> Does that mean that i don't have to do anything when i make a kernel patch/update ?

that's how I'd read it, too. But OTOH, once a future SLES-supplied version solves your issues, having that one active once it is available as a kernel patch would actually be desirable.
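
After a kernel update, one way to see whether the vendor module is still in place for the new kernel is to list every lpfc module found under /lib/modules. This is a minimal sketch; the function name list_lpfc_modules is made up for illustration, and it only shows which files exist, not which one modprobe will prefer.

```shell
#!/bin/sh
# List every lpfc kernel module found below a modules root, one path per line.
# (Hypothetical helper; the root defaults to /lib/modules.)
list_lpfc_modules() {
    root=${1:-/lib/modules}
    find "$root" -name 'lpfc.ko*' 2>/dev/null | LC_ALL=C sort
}
```

A path under kernel/drivers/scsi/lpfc is the module shipped with that kernel, while a path under updates or weak-updates (as in the modinfo output earlier in the thread) points to a separately installed driver.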

Regards,
Jens

berndgsflinux
17-Nov-2015, 14:21
> Hi Bernd,
>
> that's how I'd read it, too. But OTOH, once a future SLES-supplied version solves your issues, having that one active once it is available as a kernel patch would actually be desirable.
>
> Regards,
> Jens

Hi Jens,

I tried some of the older drivers from Emulex for my LPe11000, but they didn't work because the kernel version did not match. Now my idea is to build the driver from source. If I build it from source, the binary should match the running kernel, right? I downloaded the source code and extracted it:

total 2828
drwxr-xr-x 2 root root 4096 Oct 31 2014 ./
drwxrwxrwt 3 root root 4096 Nov 17 14:11 ../
-rw-r--r-- 1 root root 18011 Mar 9 2004 COPYING
-rw-r--r-- 1 root root 63292 Oct 23 2013 ChangeLog
-rw-r--r-- 1 root root 2313 Jan 31 2011 GNUmakefile
-rw-r--r-- 1 root root 5739 Sep 15 2014 Makefile
-rw-r--r-- 1 root root 36529 Sep 15 2014 lpfc.h
-rw-r--r-- 1 root root 180494 Jul 22 2014 lpfc_attr.c
-rw-r--r-- 1 root root 168748 Jul 22 2014 lpfc_bsg.c
-rw-r--r-- 1 root root 10123 Apr 4 2014 lpfc_bsg.h
-rw-r--r-- 1 root root 3111 Oct 26 2011 lpfc_compat.h
-rw-r--r-- 1 root root 24805 Oct 3 2014 lpfc_crtn.h
-rw-r--r-- 1 root root 54208 Jul 22 2014 lpfc_ct.c
-rw-r--r-- 1 root root 137930 Jul 22 2014 lpfc_debugfs.c
-rw-r--r-- 1 root root 19048 Jul 24 2012 lpfc_debugfs.h
-rw-r--r-- 1 root root 13413 Aug 1 2014 lpfc_disc.h
-rw-r--r-- 1 root root 268112 Aug 1 2014 lpfc_els.c
-rw-r--r-- 1 root root 190401 Aug 1 2014 lpfc_hbadisc.c
-rw-r--r-- 1 root root 113134 Apr 4 2014 lpfc_hw.h
-rw-r--r-- 1 root root 126284 Apr 4 2014 lpfc_hw4.h
-rw-r--r-- 1 root root 343521 Oct 28 2014 lpfc_init.c
-rw-r--r-- 1 root root 3042 Feb 1 2013 lpfc_logmsg.h
-rw-r--r-- 1 root root 79106 Jun 25 2013 lpfc_mbox.c
-rw-r--r-- 1 root root 17147 Apr 4 2014 lpfc_mem.c
-rw-r--r-- 1 root root 5524 Oct 14 2011 lpfc_nl.h
-rw-r--r-- 1 root root 77886 Jul 14 2014 lpfc_nportdisc.c
-rw-r--r-- 1 root root 188851 Oct 3 2014 lpfc_scsi.c
-rw-r--r-- 1 root root 6392 Apr 4 2014 lpfc_scsi.h
-rw-r--r-- 1 root root 533704 Oct 28 2014 lpfc_sli.c
-rw-r--r-- 1 root root 12822 Jul 31 2014 lpfc_sli.h
-rw-r--r-- 1 root root 23508 Oct 28 2014 lpfc_sli4.h
-rw-r--r-- 1 root root 1740 Oct 31 2014 lpfc_version.h
-rw-r--r-- 1 root root 26220 Aug 7 2013 lpfc_vport.c
-rw-r--r-- 1 root root 3892 Jan 31 2013 lpfc_vport.h


I'm a bit confused. I have installed software from source before, but there was always a configure script in the package. Here it is missing.
How can I install the driver?


Bernd

jmozdzen
17-Nov-2015, 15:59
Hi Bernd,

if everything else fails (and since iirc the driver is part of the regular kernel), fetch the kernel source RPM for your kernel and replace the contents of /usr/src/linux/drivers/scsi/lpfc with the older driver files.

It might be worth a try to fetch the sources of the latest SP3 kernel and take the driver source code from there, to replace the sources of the lpfc driver included in the SP4 sources.

Regards,
Jens

berndgsflinux
18-Nov-2015, 15:06
Hi Jens,

I tried to install driver 10.2.469.0. The installation succeeded, but the driver failed to work. Again there are constant error messages in /var/log/messages:

...
Nov 18 14:43:23 pc62878 kernel: [155476.826200] lpfc 0000:06:00.1: 1:1303 Link Up Event x1 received Data: x1 xf7 x10 x0 x0 x0 0
Nov 18 14:43:31 pc62878 kernel: [155484.888439] lpfc 0000:06:00.0: 0:1305 Link Down Event x2 received Data: x2 x20 x80011 x0 x0
Nov 18 14:43:32 pc62878 kernel: [155485.964203] lpfc 0000:06:00.1: 1:1305 Link Down Event x2 received Data: x2 x20 x80011 x0 x0
Nov 18 14:43:32 pc62878 kernel: [155486.563134] lpfc 0000:06:00.0: 0:1303 Link Up Event x3 received Data: x3 x0 x10 x0 x0 x0 0
Nov 18 14:43:33 pc62878 kernel: [155487.725822] lpfc 0000:06:00.1: 1:1303 Link Up Event x3 received Data: x3 x0 x10 x0 x0 x0 0
Nov 18 14:43:42 pc62878 kernel: [155495.875955] lpfc 0000:06:00.0: 0:1305 Link Down Event x4 received Data: x4 x20 x80011 x0 x0
Nov 18 14:43:43 pc62878 kernel: [155496.951729] lpfc 0000:06:00.1: 1:1305 Link Down Event x4 received Data: x4 x20 x80011 x0 x0
Nov 18 14:43:44 pc62878 kernel: [155498.391717] lpfc 0000:06:00.0: 0:1303 Link Up Event x5 received Data: x5 x0 x10 x0 x0 x0 0
Nov 18 14:43:44 pc62878 kernel: [155498.574492] lpfc 0000:06:00.1: 1:1303 Link Up Event x5 received Data: x5 x0 x10 x0 x0 x0 0
Nov 18 14:43:54 pc62878 kernel: [155507.862346] lpfc 0000:06:00.0: 0:1305 Link Down Event x6 received Data: x6 x20 x80011 x0 x0
Nov 18 14:43:54 pc62878 kernel: [155507.939253] lpfc 0000:06:00.1: 1:1305 Link Down Event x6 received Data: x6 x20 x80011 x0 x0
Nov 18 14:43:55 pc62878 kernel: [155509.615968] lpfc 0000:06:00.0: 0:1303 Link Up Event x7 received Data: x7 x0 x10 x0 x0 x0 0
Nov 18 14:43:56 pc62878 kernel: [155510.431028] lpfc 0000:06:00.1: 1:1303 Link Up Event x7 received Data: x7 x0 x10 x0 x0 x0 0
...

I didn't need a configure script, just "make" and "make install" sufficed.

I don't understand your last posting. I will try to take the sources from SP3 and build the driver from scratch for SP4.
But what about a kernel patch/update? Do I have to compile again?

Bernd

jmozdzen
18-Nov-2015, 15:29
Hi Bernd,

> I don't understand your last posting. I will try to take the sources from SP3 and install the driver from scratch for SP4.
> But what about a kernel patch/update? Do I have to compile again?

I was thinking along the following lines:

- the SLES11SP3 driver version works
- the source code is part of the SLES kernel source package (rather than a separate package)
- you're running SLES11SP4

So a procedure worth a try could be
- "Install" the kernel source for your current SLES11SP4 kernel
- somewhere else, extract the source for the SLES11SP3 kernel with the latest working driver
- copy over the driver source from the SP3 extract to the corresponding SP4 source code directory (after moving the SP4 driver sources out of the way, of course)
- recompile the modules
- test the lpfc module (SP3 version, compiled for the SP4 kernel)
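
The procedure above could also be tried without rebuilding the whole kernel, by compiling only the lpfc directory as an external module. This is a sketch under assumptions, not a tested recipe: the helper names run and build_lpfc_from and the source path are hypothetical, it assumes the kernel development packages for the running SP4 kernel are installed so that /lib/modules/$(uname -r)/build exists, and it assumes the SP3 driver sources compile against the SP4 kernel headers at all. By default it only prints the commands.

```shell
#!/bin/sh
# Print a command (default) or execute it when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build the lpfc sources in $1 as an external module for the running kernel.
build_lpfc_from() {
    src=$1                                      # hypothetical: copy of the SP3 drivers/scsi/lpfc directory
    kdir=${2:-/lib/modules/$(uname -r)/build}   # kernel build tree of the target kernel
    run make -C "$kdir" M="$src" modules
    # install below "updates" so the shipped module under kernel/ stays untouched
    run make -C "$kdir" M="$src" INSTALL_MOD_DIR=updates modules_install
    run depmod -a
    # unloading fails while the HBA is in use (e.g. mounted SAN volumes)
    run modprobe -r lpfc
    run modprobe lpfc
}
```

For example, `build_lpfc_from /usr/local/src/lpfc-sp3` (path made up) prints the five commands for review; running it again with `DRY_RUN=0` in the environment executes them as root.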

Regards,
Jens

berndgsflinux
18-Nov-2015, 15:56
Hi Jens,

I did it that way. Unfortunately it didn't succeed. The last thing I will do is check whether HP provides a driver; otherwise I will buy a new HBA.


Bernd

jmozdzen
18-Nov-2015, 16:59
Hi Bernd,

> otherwise i will buy a new HBA

while there is money involved (and I despise shredding hardware just because some new software won't work any longer), this actually may be the most cost-effective solution.

Sorry I couldn't be of more help.

Regards,
Jens

PS: What makes me wonder is why you see the same errors with the old driver: Might it be that the original SP4 driver is still used? (nah, here it goes again... I just cannot let go ;) )
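
Whether the intended driver build is really the one running can be checked by reading the version of the loaded module from sysfs. The following is only a sketch: the function names loaded_lpfc_version and check_lpfc_version are hypothetical, and it assumes the module exports its version under /sys/module/lpfc/version, which is where the kernel places the value that modinfo shows as "version:".

```shell
#!/bin/sh
# Print the version of the lpfc module that is currently loaded.
loaded_lpfc_version() {
    # $1: sysfs root (defaults to /sys; can be overridden for testing)
    f="${1:-/sys}/module/lpfc/version"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "lpfc is not loaded" >&2
        return 1
    fi
}

# Compare the loaded version with the one the new build is expected to report.
check_lpfc_version() {
    # $1: expected version string, $2: optional sysfs root
    loaded=$(loaded_lpfc_version "$2") || return 1
    if [ "$loaded" = "$1" ]; then
        echo "OK: running lpfc $loaded"
    else
        echo "MISMATCH: running lpfc $loaded, expected $1"
        return 1
    fi
}
```

For instance, `check_lpfc_version 0:8.3.7.10.7p` would report a mismatch if the SP4 build were still the one in memory. `modinfo -n lpfc` additionally shows which file modprobe would pick; if the driver is loaded from the initrd, the copy in there may differ from the one on disk.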

berndgsflinux
18-Nov-2015, 17:15
Hi Jens,

i gave it one more try. On the HP website i found a driver which seemed to match.
http://h20566.www2.hpe.com/hpsc/swd/public/readIndex?sp4ts.oid=5246578&swLangOid=8&swEnvOid=4049

But I still didn't succeed, although this driver made me very hopeful. I get the impression that the HBA may be broken.
I'm pretty sure that the old SP4 driver was no longer involved. Before testing a new driver I always unloaded the previous module, and I always renamed the directories before inserting new files/directories. I will put the HBA in another server. Maybe ...

Bernd

berndgsflinux
18-Nov-2015, 18:54
Hi,

The test in another server did not succeed either, although I'm pretty sure the driver should match. The HBA is labeled "A8003A", which according to this guide (http://www8.hp.com/h20195/v2/getpdf.aspx/c04164498.pdf?ver=11) is the model HP FC2242SR DualChannel 4Gb PCIe HBA.
And according to this page
(http://h20566.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=5246578&swItemId=MTX_3ce7d44a327c4c7597e40c32fb&swEnvOid=4049#tab4)
this driver (10.5.158.0) supports that HBA. But it didn't work.

Bernd

berndgsflinux
23-Nov-2015, 16:38
Hi,

I finally got it working. The HBA seems to have been broken. I got another one from HP (we still had 1 month of warranty left!), and this one is working. I used the driver from the HP website.


Bernd

jmozdzen
23-Nov-2015, 18:34
Hi Bernd,

are you willing to give the SP4 driver a try as well? I'd like to verify my "call for help" towards SUSE is still valid and wasn't caused solely by a bad card...

Regards,
Jens

berndgsflinux
23-Nov-2015, 18:46
Hi Jens,

no problem. I will try it and give you feedback.


Bernd

berndgsflinux
24-Nov-2015, 11:45
Hi Jens,

I'm very surprised. The driver from SP4 is running fine. But I'm sure that previously, with SP3, the HBA was running fine, and immediately after the installation of SP4 it stopped working and threw errors to syslog. So the card must have failed at the same time as the installation of SP4. Wow. What a coincidence.
Sorry, you spent a lot of time helping me. But I didn't believe that such a coincidence was possible.


Bernd

jmozdzen
24-Nov-2015, 12:45
Hi Bernd,

from the message history, it looks as if the SP3 driver was more tolerant to this specific error:

> I can reproduce the error with a new clean installation of SP3 and updating to sp4.

So it seems that you even tested going back to SP3, which worked, and then upgrading again, which failed.

But no matter what, it's good to know that the SP4 driver works - which also explains why SUSE didn't see more reports about this type of failure :)

I'll update my SUSE contact on this. Case closed :)

Best regards,
Jens