Tuesday 19 January 2010

Cron job to check for RAID disk failure

Linux's software RAID handling is fantastic, but how do you know if one of the disks has failed? RAID is designed to have no single point of failure, which means that if one of your disks goes west, the array carries on running and you won't know about it unless you look. In a busy server environment you probably don't have the time to keep checking your Linux kit. Linux has a habit of running reliably for years until the hardware fails or you need to upgrade the system.


Well, fear not! Kieser.net to the rescue! We have this neat little script that does an elementary check for disk failure and emails you if it detects one. You should install it on your server as root and chmod 500 it, so that root can read and execute it when cron runs it. Of course, you also need to use crontab -e to make cron run it at a sensible frequency.
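
As a rough sketch of the install step, the fragment below shows one way to do it. The file name raid_check.sh, the path /usr/local/sbin and the 15-minute interval are all assumptions for illustration, so pick whatever suits your server.

```shell
# Install the script owned by root with mode 500
# (the name and path are assumptions, not fixed by the script itself):
#   install -o root -g root -m 500 raid_check.sh /usr/local/sbin/raid_check.sh
#
# Then, as root, run "crontab -e" and add a line like this one
# to run the check every 15 minutes:
*/15 * * * * /usr/local/sbin/raid_check.sh
```

Since the script sends an email each time it sees a failure, the interval also decides how often you get nagged until the disk is replaced.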

Here is the script for you to cut and paste into a suitable file:


#!/bin/bash
# Check /proc/mdstat for software RAID members marked as failed, "(F)",
# and email the details if any are found.

SYSTEM=$(uname -n)
MAILTO='root@kieser.net'

# Use mktemp rather than a predictable file name in /tmp.
LOG_FILE=$(mktemp /tmp/raid_check.XXXXXX) || exit 1

{
    echo "The $SYSTEM system has RAID failures on it."
    echo "Below is the output from /proc/mdstat"
    echo "==========================================="
} >> "$LOG_FILE"

# A failed member shows up on the array line,
# e.g. "md0 : active raid1 sdb1[1](F) sda1[0]".
if grep -E 'md.*raid' /proc/mdstat | grep -F -i '(f)' >> "$LOG_FILE"
then
    cat /proc/mdstat >> "$LOG_FILE"
    echo "===========================================" >> "$LOG_FILE"
    mail -s 'URGENT: RAID disk failure detected' "$MAILTO" < "$LOG_FILE"
fi

rm -f "$LOG_FILE"
exit 0
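
You probably don't want to wait for a real disk to die to find out whether the check works. Here is a minimal sketch that wraps the same test in a function taking the file to inspect, so you can point it at a saved sample instead of /proc/mdstat. The function name has_failed_disk and the sample contents are made up for illustration; they are not part of the script above.

```shell
#!/bin/bash
# Sketch: the same "(F)" check as a function that accepts the file to read.
# has_failed_disk is a hypothetical name used only for this example.
has_failed_disk() {
    local mdstat_file="${1:-/proc/mdstat}"
    # Keep only the array lines, then look for a member flagged (F).
    grep -E 'md.*raid' "$mdstat_file" | grep -F -i -q '(f)'
}

# Build a sample file with one member marked as failed.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
EOF

if has_failed_disk "$sample"; then
    echo "failure detected"
else
    echo "all members healthy"
fi

rm -f "$sample"
```

Run against the sample, this takes the "failure detected" branch. Called with no argument, the function falls back to the real /proc/mdstat.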



