Tuesday 19 January 2010

Recovering a RAID disk back into a RAID device

Okay, so you have been clever! You figured that with Linux you can build a RAID using nice cheap IDE disks. Linux's fantastic software RAID feature lets you do this, saving loads of money on hardware RAID and expensive SCSI disks. Maybe you did the easy thing and used a distribution like Mandrake Linux that makes it oh so easy to set up.

Then disaster happened! Maybe you did a forced reboot, maybe something else happened, but when the reboot had finished you did

dmesg | less

and you saw something like this in the log:
hdf7's event counter: 00000006
hde5's event counter: 00000003
md: superblock update time inconsistency -- using the most recent one
freshest: hdf7
md: kicking non-fresh hde5 from array!
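
If the log is long, the kicked disks can be fished out of it directly. This is only a sketch: the function name kicked_members is made up for this example, and it assumes the kernel words the message exactly as in the log above.

```shell
# Print each array member that md reported as kicked for being stale.
# Reads kernel log text on stdin, for example:
#   dmesg | kicked_members
# kicked_members is a hypothetical helper name, not a standard tool.
kicked_members() {
    # Keep only the device name from lines like
    # "md: kicking non-fresh hde5 from array!"
    sed -n 's/.*md: kicking non-fresh \([^ ]*\) from array.*/\1/p'
}
```

For the log shown above this would print hde5.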

Oh boy! Quick as a flash you look into the status of the array:
cat /proc/mdstat

and it looks bad:
# cat /proc/mdstat
Personalities : [raid0] [raid1]
read_ahead 1024 sectors

md2 : active raid1 hdf7[1]
39262720 blocks [2/1] [_U]
md1 : active raid0 hde2[0] hdf6[1]
497792 blocks 64k chunks
md0 : active raid1 hde1[0] hdf5[1]
505920 blocks [2/2] [UU]

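Now, in the above, /dev/md2 is the root partition on your machine (of course this is only an example and it may NOT be this device but some other /dev/md* device). It should be a RAID level 1 (mirrored) but there is now only one disk in that array! The [2/1] [_U] on its status line tells you so: the array wants 2 members, only 1 is up, and the underscore marks the missing one.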
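
On a box with many arrays it can help to let the shell find the degraded ones. A minimal sketch, assuming the /proc/mdstat layout shown above (array name on one line, status map such as [_U] on a later line); degraded_arrays is a name invented for this example.

```shell
# Print the name of every md array whose status map (e.g. [_U])
# contains an underscore, meaning a member is missing.
# Takes an mdstat-format file; defaults to the live /proc/mdstat.
# degraded_arrays is a hypothetical helper name.
degraded_arrays() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # A status map made only of U and _ with at least one _ is degraded.
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Run against the listing above, this would report md2 and stay silent about md0 and md1.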

What to do?

Well, you need to reinstate the kicked out disk (in this case, /dev/hde5). There is a useful command to do this:
raidhotadd /dev/md2 /dev/hde5
(NOTE: you need to substitute your own correct devices. The above is an example only.)
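
raidhotadd comes from the older raidtools package. On systems that ship mdadm instead, the equivalent is mdadm /dev/md2 --add /dev/hde5. Here is a small sketch that prints whichever of the two applies on the machine it runs on, without actually running it; the function name readd_command is invented for this example.

```shell
# Print (do not run) the command that would re-add a member to an array,
# depending on which tool is installed.
# Usage: readd_command /dev/md2 /dev/hde5
# readd_command is a hypothetical helper name.
readd_command() {
    array=$1
    member=$2
    if command -v raidhotadd >/dev/null 2>&1; then
        echo "raidhotadd $array $member"
    elif command -v mdadm >/dev/null 2>&1; then
        echo "mdadm $array --add $member"
    else
        echo "neither raidhotadd nor mdadm found" >&2
        return 1
    fi
}
```

Printing rather than executing is deliberate: it gives you a chance to double-check the device names before touching the array.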

That will rebuild the dirty mirror disk from the main mirror disk. It will bring the RAID back to a fully flying 2-disk mirrored setup provided, of course, that the disk doesn't have a fault making it fail. While the rebuild is happening, you can monitor it with:
cat /proc/mdstat
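
Rather than re-typing that command, you could let the shell poll for you. This is a sketch under one assumption: that the kernel shows the word "recovery" or "resync" in /proc/mdstat while a rebuild is under way. Both function names and the 10-second interval are arbitrary choices made for this example.

```shell
# Succeed (exit status 0) while an mdstat-format file shows a rebuild.
# Defaults to the live /proc/mdstat.
# Assumption: a running rebuild is marked with "recovery" or "resync".
rebuild_running() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Block until no rebuild is reported any more.
wait_for_rebuild() {
    while rebuild_running "$@"; do
        sleep 10    # polling interval, chosen arbitrarily
    done
}
```

Something like wait_for_rebuild && cat /proc/mdstat would then show the final state once the rebuild has stopped.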

It may be that your disk fails to join the array and after raidhotadd completes, you see something like this:

# cat /proc/mdstat
Personalities : [raid0] [raid1]
read_ahead 1024 sectors
md2 : active raid1 hde5[0](F) hdf7[1]
39262720 blocks [2/1] [_U]
Note the (F), which means that the disk failed. Now, hard drives are extremely reliable and it is unlikely that your disk is toasted (although, to be safe, you can always assume that it is). There is a great Linux command, badblocks, that will scan your disk and report any bad blocks on it. You can then try adding it back into the array. Please note though:
Only run this on unmounted disks

It takes a LONG time to run.
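
The first caveat can be checked from the shell before starting the scan. A minimal sketch with an invented function name, is_mounted, which looks the device up in /proc/mounts. Note that it only catches ordinary mounts: it will not notice a partition that is still an active member of an array (check /proc/mdstat for that), and a root filesystem may be listed under another name such as /dev/root.

```shell
# Succeed (exit status 0) if the given device is listed as a mount source.
# Usage: is_mounted /dev/hde5 [mounts-table]
# The second argument defaults to /proc/mounts.
# is_mounted is a hypothetical helper name.
is_mounted() {
    awk -v dev="$1" '
        $1 == dev { found = 1 }
        END { exit !found }     # exit 0 only when a match was seen
    ' "${2:-/proc/mounts}"
}
```

Used as is_mounted /dev/hde5 && echo "still mounted, do not scan", with /dev/hde5 swapped for your own device.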

Simply run:
badblocks -f /dev/hd*
where /dev/hd* stands for the device name of your drive (do not type the * literally, since the shell would expand it to every matching device). In the example above this would be /dev/hde5. After badblocks has run, try to raidhotadd the disk back into the array again.
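
Since the scan takes so long, it may be worth saving its findings. badblocks can write the list of bad blocks to a file with -o, one block number per line, and -s and -v make it show progress and be more talkative. The sketch below counts the entries in such a file; the function name bad_block_count and the path /tmp/hde5.bad are made up for this example.

```shell
# Count the bad blocks listed in a file written by `badblocks -o FILE`
# (one block number per line). Prints 0 for an empty list.
# Example, with a hypothetical output path:
#   badblocks -sv -o /tmp/hde5.bad /dev/hde5
#   bad_block_count /tmp/hde5.bad
# bad_block_count is a hypothetical helper name.
bad_block_count() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}
```

A count of 0 means the scan found nothing; anything else tells you how much trouble the disk is really in before you try to add it back.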

You have to admit it: Linux is HOT!
