SLES 10 Filesystem raid errors



Todddhunter
25-May-2014, 14:30
I have an older SLES 10 server that has 2 hard drives in a software RAID1. One of the drives went bad (bad sectors), and I am now getting errors in the OS: the GNOME desktop is crashing and apps won't load. So I can boot the server, but I cannot do anything once logged in.

The file system is reiserfs. The 1st thing I did was use Clonezilla to make a copy of the good hd and am working from that so that if I make any mistakes they are not unrecoverable.

If I remove the bad hd the system will not boot to the desktop failing with an error. (Sorry for the size of the image, didn't know how to change it here)

http://www.smart-it-services.com/support/documents/linux.jpg

I have tried using the original install disk to do a rescue on the good hd but it tells me there are no Linux file systems on the disk and fails.

Sorry if this is not all detailed enough, I know my way around the basics but am by no means an expert.

Thanks,

Todd

Todddhunter
25-May-2014, 17:50
I have made some progress but am still having a problem.

Using 2 cloned disks I ran the following command from safe mode
cat /proc/mdstat

It showed /dev/md1 was missing /dev/sda2. I added it with
mdadm /dev/md1 -a /dev/sda2

I can now boot with the two cloned hds but gnome still crashes when I log on.

I am searching for info on repairing the filesystem but have not come up with an answer yet. How do I run a file system check on a Linux software Raid1 volume? Or do I run it on the individual hds?

Any help is appreciated.

KBOYLE
26-May-2014, 18:36
Todddhunter wrote:

> How do I run a file system check on a Linux
> software Raid1 volume? Or do I run it on the individual hds?

This site is the Linux-raid kernel list community-managed reference for
Linux software RAID as implemented in recent version 3 series and 2.6
kernels.
https://raid.wiki.kernel.org/index.php/Linux_Raid

This should be a good place to start.


Last year I had a similar issue and, like you, cloned my one remaining
good drive. Keep in mind that there are unique IDs on a drive that
sometimes can cause issues when using a cloned drive.

/etc/fstab shows what devices and filesystems you have mounted. If a
filesystem is installed on an md device, that is what you check.
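
As a rough sketch of that lookup (not something from the original post): the helper below, with the made-up name list_md_mounts, prints the fstab entries whose device is /dev/mdN. It assumes the arrays are referenced by plain /dev/mdN names; entries that use UUIDs, labels or /dev/disk/by-id paths would not be caught.

```shell
# Hypothetical helper: list filesystems in an fstab-style file that live on
# md (software RAID) devices. Prints: <device> <mount point> <fs type>
# Assumption: the arrays appear in fstab as plain /dev/mdN names.
list_md_mounts() {
    fstab="${1:-/etc/fstab}"
    # Skip comment lines; keep only entries whose first field is /dev/mdN.
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2, $3 }' "$fstab"
}
```

Running "list_md_mounts" with no argument reads /etc/fstab; a different file can be passed as the first argument.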

--
Kevin Boyle - Knowledge Partner

jmozdzen
26-May-2014, 19:01
Hi Toddhunter,

Recovering a RAID1 set after a single disk has failed is rather simple: you replace the disk and add the new disk to the RAID set. In your specific case it looks like you have multiple RAIDs, created from identical partitions on the two disks, which is nothing unusual either.

The standard procedure would have been

- replace the failed disk by a new one
- create the partitions as required - I assume those were identically sized disks, so you could "clone" the partition table from the remaining disk, e.g. manually by looking at the partition sizes
- add the newly created partitions to the RAID sets via mdadm - this will trigger the RAID rebuild
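
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function name plan_raid1_rebuild and the "array:partition number" argument format are made up, the disks are assumed to carry MBR partition tables that "sfdisk -d" can dump, and the md0/partition 1, md1/partition 2 pairing in the usage note is a guess at the layout. The function only prints the commands, so nothing is changed until you have checked them and run them yourself.

```shell
# Hypothetical helper: print (do not run) the commands for rebuilding
# two-disk RAID1 sets after the failed disk has been replaced.
# Usage: plan_raid1_rebuild GOOD_DISK NEW_DISK ARRAY:PARTNUM...
plan_raid1_rebuild() {
    good="$1"; new="$2"; shift 2
    # Copy the partition layout from the surviving disk to the new one.
    # Double-check the direction: the second sfdisk overwrites its target.
    echo "sfdisk -d $good | sfdisk $new"
    for pair in "$@"; do
        md="${pair%%:*}"      # array device, e.g. /dev/md1
        part="${pair##*:}"    # partition number on the new disk, e.g. 2
        # Adding the new partition to a degraded array starts the rebuild.
        echo "mdadm $md --add ${new}${part}"
    done
    # The rebuild progress can be followed here.
    echo "cat /proc/mdstat"
}
```

For example, "plan_raid1_rebuild /dev/sdb /dev/sda /dev/md0:1 /dev/md1:2" prints one sfdisk line, one mdadm line per array, and the final mdstat check.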

In general, mdadm is your friend when handling your RAID sets. "mdadm --detail <said device>" will show you, among other things, the devices that make up your RAID set (or that devices are missing) and whether the RAID set is consistent or rebuilding:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Sun Apr 4 15:51:07 2010
     Raid Level : raid1
     Array Size : 96376 (94.13 MiB 98.69 MB)
  Used Dev Size : 96376 (94.13 MiB 98.69 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed May 21 09:05:00 2014
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : linux:0
           UUID : xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
         Events : 52

    Number   Major   Minor   RaidDevice   State
       0       8       1         0        active sync   /dev/sda1
       1       8      17         1        active sync   /dev/sdb1
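
The same "is a member missing?" question can also be answered for all arrays at once from /proc/mdstat. The sketch below is an assumption-laden illustration, not part of mdadm: the function name degraded_arrays is made up, and it relies on the usual mdstat layout where each array has a status token such as [UU] (all members present) or [_U] (one member missing).

```shell
# Hypothetical helper: print the md devices that are missing a member,
# based on the [UU] / [_U] status token in mdstat-style text.
degraded_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        # Remember the array name from lines like "md1 : active raid1 sdb2[1]".
        /^md[0-9]+ :/ { name = $1 }
        # A status token made only of U and _ that contains a _ means
        # at least one member of the current array is missing.
        name != "" {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\[[U_]+\]$/ && $i ~ /_/)
                    print "/dev/" name
        }
    ' "$mdstat"
}
```

With no argument it reads /proc/mdstat; no output means no array was found in a degraded state.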

"mdadm /dev/md0 --add <newDevice>" will add a new device to a RAID set. If that device has never been part of the RAID set, it is added as a hot spare... and if the RAID set is degraded, md will immediately use this new hot spare to replace the failed device.

I'm not so fond of adding a cloned disk: there are markers on the RAID set volumes, and if you simply clone them you might get into trouble, since the RAID software may have a hard time telling that it's a newly added volume. It's better to have it "fresh" if it's to be treated as "fresh"...

> How do I run a file system check on a Linux software Raid1 volume? Or do I run it on the individual hds?

As a rule of thumb, you'll have to run the FS check on the same device from which you mount the FS. In your case, this is the *RAID device*. Never ever, no, no, no, work on the individual disk partitions... unless you know exactly what you're doing and are able to handle the wrath of file system hell all by yourself ;)
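
A minimal sketch of that rule, assuming the file system is reiserfs (as stated in the first post) and that fstab refers to the array as /dev/mdN: the made-up helper plan_fs_check looks up the device behind a mount point and only prints a read-only "reiserfsck --check" command if that device is an md device. It does not run the check; that should be done with the file system unmounted, for example from a rescue system.

```shell
# Hypothetical helper: print (do not run) a read-only reiserfs check for the
# device behind a mount point, refusing anything that is not an md device.
plan_fs_check() {
    mnt="$1"; fstab="${2:-/etc/fstab}"
    # Find the device that fstab mounts at the given mount point.
    dev=$(awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { print $1; exit }' "$fstab")
    case "$dev" in
        /dev/md[0-9]*)
            # --check only reports problems; it does not modify the FS.
            echo "reiserfsck --check $dev" ;;
        "")
            echo "no fstab entry for $mnt" >&2; return 1 ;;
        *)
            echo "$dev is not an md device, so this sketch does not apply" >&2
            return 1 ;;
    esac
}
```

For example, "plan_fs_check /" prints the check command for the root file system if fstab mounts it from an md device.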

Regards,
Jens