Handy Linux notes: Cron job to check for RAID disk failure

Linux's software RAID handling is fantastic, but how do you know if one of the disks has failed? RAID is designed to have no single point of failure, which means that if one of your disks goes west, the system carries on running and you won't know about it. In a busy server environment you probably don't have the time to keep checking your Linux kit, and Linux has a habit of running reliably for years until the hardware fails or you need to upgrade the system.

Well, fear not! Kieser.net to the rescue! We have this neat little script that does an elementary check for disk failure and emails you if it detects one. Install it on your server as root and chmod 500 it so that only root can read and execute it, which is all cron needs when the job runs from root's crontab. Of course, you also need to use crontab -e to make cron run it at a sensible frequency.
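As a rough illustration of the cron side, the entry below would run the check once an hour. The path /root/bin/raid_check.sh and the hourly interval are assumptions made for this example, so substitute wherever you saved the script and whatever frequency suits your server. Add the line with crontab -e while logged in as root:

```
# m  h  dom mon dow  command
0    *  *   *   *    /root/bin/raid_check.sh
```

Because the script only sends mail when it finds a failure, a fairly frequent schedule should not flood your inbox while the array is healthy.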

Here is the script for you to cut and paste into a suitable file:

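#!/bin/bash
# Check /proc/mdstat for failed RAID members and email a report if any are found.

LOG_FILE=/tmp/raid_check_$$
SYSTEM=$(uname --nodename)
MAILTO='root@kieser.net'

# Start the report; it is only mailed if a failure is found below.
echo "The $SYSTEM system has RAID failures on it." >> "$LOG_FILE"
echo "Below is the output from /proc/mdstat" >> "$LOG_FILE"
echo "===========================================" >> "$LOG_FILE"

# A failed member is flagged with (F) on the array's line in /proc/mdstat.
grep -E 'md.*raid' /proc/mdstat | grep -F -i '(f)' >> "$LOG_FILE"

if [ $? -eq 0 ]
then
    cat /proc/mdstat >> "$LOG_FILE"
    echo "===========================================" >> "$LOG_FILE"
    mail -s 'URGENT: RAID disk failure detected' "$MAILTO" < "$LOG_FILE"
fi

# Remove the temporary report whether or not it was sent.
rm -f "$LOG_FILE"
exit 0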
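Since you hopefully don't have a failed disk to hand, it can be useful to try the detection pattern on sample data before trusting it. The sketch below wraps the same two-stage grep in a function, here called has_failed_member (a name made up for this example), and runs it against two hand-written files in /proc/mdstat format. The sample contents are illustrative and were not captured from a real machine.

```shell
#!/bin/bash
# Sketch: exercise the (F) detection pattern on sample data
# instead of the live /proc/mdstat. Sample text is illustrative only.

# Return 0 if the given mdstat-format file has an array line
# with a member flagged (F), non-zero otherwise.
has_failed_member() {
    grep -E 'md.*raid' "$1" | grep -F -i -q '(f)'
}

# Scratch directory for the sample files, removed when the shell exits.
SAMPLE_DIR=$(mktemp -d)
trap 'rm -rf "$SAMPLE_DIR"' EXIT

cat > "$SAMPLE_DIR/healthy" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
EOF

cat > "$SAMPLE_DIR/degraded" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]
EOF

for f in healthy degraded
do
    if has_failed_member "$SAMPLE_DIR/$f"
    then
        echo "$f: failure detected"
    else
        echo "$f: ok"
    fi
done
```

Running this should report the healthy sample as ok and the degraded one as a failure. Note that the check keys only on the (F) flag, just as the script above does.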

Tuesday, 19 January 2010