Replacing failed hard drive in RAID5 array changed UUID?



Cragdoo
24-Jun-2014, 12:19
I have an IBM x3650-M3 with a hardware RAID 5 array (5 x 600GB disks). Recently the server was rebooted and one of the drives failed on reboot. The drive was replaced and the RAID array was rebuilt. However, when I booted the server up I kept getting this error message:


Want me to fall back to /dev/disk/by-id/scsi-.... (y/n):

A little more investigating and it was discovered that the by-id had changed from what was in the /boot/grub/menu.lst and /etc/fstab

I'll admit my Linux skills are very basic, coming from a predominantly Windows background, but I have been working with servers with RAID arrays for years and this is the first time I've encountered a problem like this. Typically the RAID adapter presents only the logical disk to the OS and hides the individual disks from it. That does not appear to be the case here.

Has anyone come across a similar issue before: replacing a failed drive in a hardware RAID array, and SUSE generating a new UUID?

If so, what was the workaround?

Thanks in advance.

jmozdzen
24-Jun-2014, 17:38
Hi Cragdoo,

The IDs in the /dev/disk/by-id/ entries are generated from various device characteristics. Which ID type were you using? If it was "dm-uuid-*" then no, that should not have changed. If it was scsi-*, then it depends entirely on what the controller reports back, which is beyond the kernel's control.
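
To see which ID types your system actually offers, something along these lines might help. It is only a rough sketch, assuming Python 3 is available on the server; the function name group_by_id_entries is made up for illustration. It groups the symlinks in /dev/disk/by-id by their name prefix and shows which device node each one points to.

```python
import os
from collections import defaultdict


def group_by_id_entries(by_id_dir="/dev/disk/by-id"):
    """Group the symlinks in by_id_dir by name prefix and resolve their targets."""
    if not os.path.isdir(by_id_dir):
        return {}
    groups = defaultdict(list)
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if not os.path.islink(path):
            continue
        # "dm-uuid-..." has a two-part prefix; for everything else
        # the prefix is the text before the first "-".
        if name.startswith("dm-uuid-"):
            prefix = "dm-uuid"
        else:
            prefix = name.split("-", 1)[0]
        groups[prefix].append((name, os.path.realpath(path)))
    return dict(groups)


if __name__ == "__main__":
    for prefix, entries in group_by_id_entries().items():
        print(prefix)
        for name, target in entries:
            print(f"  {name} -> {target}")
```

Running it before and after a drive replacement and comparing the "scsi" group would show whether the controller reported a different identifier.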

And for the sake of further debugging - which level of SLES are you running on the server (and are current patches applied)?

Regards,
Jens

Cragdoo
24-Jun-2014, 18:03
thanks for the reply

It is scsi-*. Which is the preferred method, dm-uuid-* or scsi-*? Although I can only see by-id, by-path and by-uuid in /dev/disk.

I'll get the SLES version to you when I can

jmozdzen
24-Jun-2014, 18:28
Hi Cragdoo,

IIRC subdirectories are created as required - i.e., if none of your file systems has a label, then no "/dev/disk/by-label" is created ;)

Which one to use... well, as always, it depends:

* for mounting file systems, I mostly use the "by-label" entries, but that turns tricky when you have to deal with external media (or other dynamically attached media), which may carry file systems with the same label as an existing one.

* dm-uuid is a good choice, unless you constantly change your file systems (I used to reformat certain file systems quite often on a development server, and by-uuid is no good option then :D ). It is also hard to memorize.

* If you want to refer to the exact hardware for some reason (see the previous point), use a matching "by-id" entry (though it seems that some controllers change that info unexpectedly, like your RAID card).

* not every device leads to an entry in every section - no label, no by-label entry. Hardware devices don't get by-id/dm-uuid* entries and so on

So you pick what describes the wanted device best and changes the least ;)
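
To compare the candidates for one particular device, a small helper like the following could list every name under /dev/disk that points to it. Again only a sketch, assuming Python 3; aliases_for is a made-up name and /dev/sda5 in the usage part is just a placeholder for your own partition.

```python
import os


def aliases_for(device, disk_dir="/dev/disk"):
    """Return {subdirectory: [link names]} for all links resolving to device."""
    wanted = os.path.realpath(device)
    found = {}
    if not os.path.isdir(disk_dir):
        return found
    for sub in sorted(os.listdir(disk_dir)):
        sub_path = os.path.join(disk_dir, sub)
        if not os.path.isdir(sub_path):
            continue
        # Keep only the links in this subdirectory that end at the wanted device.
        names = [
            name
            for name in sorted(os.listdir(sub_path))
            if os.path.realpath(os.path.join(sub_path, name)) == wanted
        ]
        if names:
            found[sub] = names
    return found


if __name__ == "__main__":
    # Placeholder device; substitute the partition you are interested in.
    for sub, names in aliases_for("/dev/sda5").items():
        print(sub, names)
```

A subdirectory that is missing from the result simply has no entry for that device, for example no by-label name when the file system carries no label.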

Regards,
Jens

PS: Since you referenced scsi-*, that ID seems to have changed. That implies that the RAID controller changed the SCSI information for the RAID set after the new disk was added, which is strange indeed.

jmozdzen
24-Jun-2014, 18:30
Oops, it seems I mixed up dm-uuid and disk/by-uuid in my response. My comment concerns the latter (disk/by-uuid may change when creating a new file system; I haven't used disk/by-id/dm-uuid yet, so I would have to test).

Sorry for the confusion.

Cragdoo
25-Jun-2014, 10:10
"So you pick what describes the wanted device best and changes the least"

I didn't set up the server, but I would have expected the system to be set up to use the logical drive ID from the hardware RAID configuration. Is that what the ID in root=/dev/disk/by-id/scsi-3600605b002393460ff00005b058e24d8-part5 is?

jmozdzen
25-Jun-2014, 11:25
Hi Cragdoo,

That seems to be a reference to partition 5 of a SCSI device identified by the value determined by "scsi_id" (from the man page: "queries a SCSI device via the SCSI INQUIRY vital product data (VPD) page 0x80 or 0x83 and uses the resulting data to generate a value that is unique across all SCSI devices that properly support page 0x80 or page 0x83.").
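
To illustrate how such a name breaks down, here is a small sketch, assuming Python 3. The pattern and the function name split_by_id_name are my own and only reflect the naming seen in this thread, not an official specification of the by-id format.

```python
import re

# Assumed layout: <prefix>-<identifier>[-part<N>]
BY_ID_PATTERN = re.compile(
    r"^(?P<prefix>[a-z]+)-(?P<ident>.+?)(?:-part(?P<part>\d+))?$"
)


def split_by_id_name(name):
    """Split a by-id link name into (prefix, identifier, partition or None)."""
    match = BY_ID_PATTERN.match(name)
    if match is None:
        return None
    part = match.group("part")
    return match.group("prefix"), match.group("ident"), int(part) if part else None


if __name__ == "__main__":
    print(split_by_id_name("scsi-3600605b002393460ff00005b058e24d8-part5"))
    # -> ('scsi', '3600605b002393460ff00005b058e24d8', 5)
```

The middle part is the value that came from the controller, so that is the piece to compare between the old entries in /boot/grub/menu.lst and /etc/fstab and the current link names.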

It is rather surprising that this value changed after a RAID recovery, but there is probably nothing you can do about it (controller-specific behavior). If it bothers you and you have no dynamically attached devices, you might consider changing this to a "by-label" reference, after giving your file systems appropriate labels.
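
As a rough idea of what that change could look like for /etc/fstab, here is a sketch, assuming Python 3. The function name relabel_fstab, the shortened device name and the label "root" are placeholders; the labels themselves would have to be set beforehand with the tool for your file system type. It only transforms text and writes nothing, so you can inspect the result before touching the real file.

```python
def relabel_fstab(fstab_text, labels):
    """Return fstab_text with mapped device fields replaced by by-label paths.

    labels maps an existing device field (e.g. a by-id path) to a label.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#") and fields[0] in labels:
            # Replace only the first field and keep the rest of the line as-is.
            new_device = "/dev/disk/by-label/" + labels[fields[0]]
            line = line.replace(fields[0], new_device, 1)
        out.append(line)
    trailing = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(out) + trailing


if __name__ == "__main__":
    # Placeholder entry; the real identifier and options will differ.
    sample = "/dev/disk/by-id/scsi-3600abc-part5 / ext3 acl,user_xattr 1 1\n"
    print(relabel_fstab(sample, {"/dev/disk/by-id/scsi-3600abc-part5": "root"}), end="")
```

The root= entry in /boot/grub/menu.lst would need the matching change by hand, and keeping a backup of both files before editing is a good idea.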

Regards,
Jens