Solaris 9 x86 v20/v40z systems - Monitoring RAID
Hello,
I'm wondering if anyone has suggestions on how to monitor the status of RAID arrays in Solaris 9 on the Sun Fire V20z/V40z using the built-in controllers on these machines.
I'm finding that every 3 months or so I lose a drive in one of my systems. At present I just run raidctl on every system on a regular basis and check the results by hand. This doesn't work all that well.
Several things I've noticed with this particular issue: the drives never turn on any fault lights, and the syslog output sometimes helps you find out when the problem occurred and sometimes doesn't.
Some ways I've been looking at doing this:
Running a script every hour to scan the messages file for "degraded" and email the results to my mailbox.
Running a script every hour that scans the output of raidctl and initiates an SNMP trap that gets sent to my monitoring system.
Best option: a solution that already exists in the OS or platform but for some reason is not enabled, and that I'm just not aware of?
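For what it's worth, the first idea in the list above could be sketched roughly like this. It is only a sketch under several assumptions: that a Python 3 interpreter is available on the box (Solaris 9 does not ship one by default), that the messages file lives at the usual /var/adm/messages, and that a local mail server accepts mail on localhost. The function names, hostname, and addresses are made up for illustration.

```python
import smtplib
from email.message import EmailMessage

# Assumed location of the Solaris system log; adjust if syslog.conf differs.
MESSAGES_FILE = "/var/adm/messages"


def find_degraded_lines(log_text, keyword="degraded"):
    """Return every log line that mentions the keyword, ignoring case."""
    needle = keyword.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def build_alert(hostname, lines, sender, recipient):
    """Build a plain-text email listing the matching log lines."""
    msg = EmailMessage()
    msg["Subject"] = "RAID warning on %s: %d matching log line(s)" % (
        hostname, len(lines))
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg


def check_and_mail(hostname, sender, recipient, path=MESSAGES_FILE):
    """Scan the log once and mail any hits through the local SMTP server."""
    with open(path, errors="replace") as handle:
        hits = find_degraded_lines(handle.read())
    if hits:
        # Assumes a mail server is listening on localhost port 25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_alert(hostname, hits, sender, recipient))
    return len(hits)
```

An hourly cron job would call check_and_mail() with the host name and the two addresses. Note that this rescans the whole file each run, so the same line will be mailed again every hour until the log rotates; remembering the last offset would avoid that.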
-Scott
[1101 byte] By [Locutus233] at [2007-11-26 11:03:06]

# 1
I ran into the same thing, but I never managed to get raidctl to check the status of that particular controller.
Currently I have no monitoring on our RAID systems, but they are configured so that they'll start beeping if they lose a disk. That was the best solution I could find.
.7/M.
# 2
If your RAID array is working and you have a recent version of raidctl installed, it should show the following when you type raidctl:
bash-2.05# raidctl
RAID    RAID    RAID    Disk
Volume  Status  Disk    Status
c1t0d0  OK      c1t0d0  OK
                c1t1d0  OK
bash-2.05#
(Note: this is from a V20z)
This would indicate that RAID is working on the system. If it says anything else, I've always assumed that the RAID array is not working. I've seen a few instances where setting up the RAID array in the BIOS and then executing raidctl from the command line didn't show a RAID array at all. It turned out that either the version of raidctl was old and needed patching on those systems, or the system needed a restart for some reason or other.
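The "anything other than OK means trouble" rule above can be automated along these lines. This is a rough sketch, assuming a Python 3 interpreter is installed and that raidctl prints the column layout shown above, with each status as a single word. The function names are made up, and status words such as DEGRADED or FAILED in the comments are only examples of what a non-OK state might look like.

```python
import re
import subprocess

# Solaris-style device name such as c1t0d0.
DEVICE = re.compile(r"^c\d+t\d+d\d+$")


def raid_problems(raidctl_output):
    """Return a list of problem descriptions; an empty list means all OK."""
    problems = []
    seen = 0
    for line in raidctl_output.splitlines():
        fields = line.split()
        if (len(fields) == 4 and DEVICE.match(fields[0])
                and DEVICE.match(fields[2])):
            # Full row: volume, volume status, disk, disk status.
            pairs = [("volume", fields[0], fields[1]),
                     ("disk", fields[2], fields[3])]
        elif len(fields) == 2 and DEVICE.match(fields[0]):
            # Continuation row: another disk in the same volume.
            pairs = [("disk", fields[0], fields[1])]
        else:
            continue  # header rows, separators, anything unrecognized
        for kind, name, status in pairs:
            seen += 1
            if status != "OK":
                problems.append("%s %s is %s" % (kind, name, status))
    if seen == 0:
        # Nothing parseable: treat it as a failure, as suggested above.
        problems.append("no RAID volumes reported by raidctl")
    return problems


def read_raidctl():
    """Run raidctl with no arguments and return everything it printed."""
    result = subprocess.run(["raidctl"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout
```

Calling raid_problems(read_raidctl()) from cron gives a list that is empty on a healthy system. A non-empty list can then be mailed or handed to whatever SNMP trap sender the monitoring system already uses.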
# 3
Also, what do you have to do to get the system to beep when there is a failed disk? I can't find anything about fault detection settings in either the system BIOS or the RAID controller BIOS.
One thing that is really puzzling to me is that all my IBM and Dell servers beep and light up the fault light on the failed disk when there is a bad disk, whereas these Sun Fires don't make a sound and never trigger a fault light. The guy who services these machines for Sun in our area also told me he has almost never seen the fault light come on for a failed disk on any machine. Has anyone else had a similar experience with these machines?
-Scott