Hardware error on disk
Hello,
Three weeks ago I noticed the following message on a server:
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2/sd@2,0 (sd2):
Oct 3 05:22:16 m530e     Error for Command: write(10)    Error Level: Retryable
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.notice]     Requested Block: 7961728    Error Block: 7961728
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.notice]     Vendor: SEAGATE    Serial Number: 053632G895
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.notice]     Sense Key: Hardware Error
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.notice]     ASC: 0x44 (internal target failure), ASCQ: 0x0, FRU: 0xb
The error has not repeated. The SCSI address corresponds to a submirror, but metastat shows everything is OK.
What happened? Is there a way to test whether the disk is OK? What about the block number in the message? Maybe it's just a bad block that has been marked as such (?)
Any thoughts are greatly appreciated.
[974 byte] By [BillyP] at [2007-11-26 10:53:25]

# 1
That's a soft error; they're not too serious.
Do an iostat -En | grep -i soft.
That will show you the hard and soft error counts for each device. I wouldn't worry about it unless you're getting hard errors
or you're getting a lot of soft errors. Lots of soft errors might indicate SCSI bus cabling issues, etc.
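
If you want to check many disks at once, the same idea can be scripted. This is only a rough sketch: it assumes the per-device summary line that iostat -En prints ("Soft Errors: n Hard Errors: n Transport Errors: n"), the sample text is made up for illustration and is not from the server in this thread, and the function names and the soft_limit threshold of 10 are arbitrary choices, not anything Solaris defines.

```python
import re

# Illustrative sample only: typed in the shape of Solaris `iostat -En` output,
# not captured from the server discussed in this thread.
SAMPLE = """\
c1t2d0           Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: EXAMPLE  Revision: 0000 Serial No: EXAMPLE
c1t3d0           Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: SEAGATE  Product: EXAMPLE  Revision: 0000 Serial No: EXAMPLE
"""

# Matches only the per-device summary line; other lines are ignored.
SUMMARY = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def error_counts(text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    counts = {}
    for line in text.splitlines():
        m = SUMMARY.match(line)
        if m:
            counts[m.group("dev")] = {
                key: int(m.group(key)) for key in ("soft", "hard", "transport")
            }
    return counts


def needs_attention(counts, soft_limit=10):
    """Devices with any hard error, or more than soft_limit soft errors.

    soft_limit is a hypothetical threshold; pick whatever suits your site.
    """
    return sorted(
        dev for dev, c in counts.items()
        if c["hard"] > 0 or c["soft"] > soft_limit
    )


if __name__ == "__main__":
    for dev in needs_attention(error_counts(SAMPLE)):
        print(dev)
```

On a real system you would feed it the saved output of iostat -En instead of SAMPLE.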
# 3
One single, solitary hard error?
It's not serious. Don't fret over it until you see hundreds of errors on the same block.
There are no perfect disk drives. There never have been, and there won't be any for the foreseeable future.
If you consider this a critical concern, then unmount the filesystem
and use the format utility to mark the block "bad".
You can do that with SCSI and FCAL disks, but not with IDE drives.
The man page for format can give you guidance on how to do that.
# 4
The term "soft" doesn't mean software.
Both hard and soft errors are reported by the disk.
A soft error is a retryable error. It could mean a write failed and was written to a spare block instead.
A hard error is more serious.
The counters reset when the machine is rebooted.
But if the errors keep appearing, it's worth reporting them or replacing the disk.
I don't think I'd wait for hundreds. A recurring hard error is a problem.
But I don't normally muck around with marking blocks bad. Modern disks should do that automatically.
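
To tell whether an error is recurring on the same block, you can count the warnings per disk and error block in the system log (normally /var/adm/messages on Solaris). A minimal sketch, assuming the sd warning layout quoted in the first post; the sample below is trimmed from that post, and the function names and the threshold of 2 are just illustrative choices.

```python
import re
from collections import Counter

# Sample trimmed from the warning quoted at the top of this thread.
SAMPLE_LOG = """\
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2/sd@2,0 (sd2):
Oct 3 05:22:16 m530e     Error for Command: write(10)    Error Level: Retryable
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.notice]     Requested Block: 7961728    Error Block: 7961728
Oct 3 05:22:16 m530e scsi: [ID 107833 kern.notice]     Sense Key: Hardware Error
"""

# The WARNING line names the disk instance, e.g. "(sd2):".
DEVICE = re.compile(r"WARNING:\s*\S+\s+\((?P<dev>\w+)\):")
# A later line in the same message carries the failing block number.
BLOCK = re.compile(r"Error Block:\s*(?P<block>\d+)")


def count_block_errors(log_text):
    """Count warnings per (disk instance, error block) pair."""
    counts = Counter()
    current_dev = None
    for line in log_text.splitlines():
        m = DEVICE.search(line)
        if m:
            current_dev = m.group("dev")
            continue
        m = BLOCK.search(line)
        if m and current_dev is not None:
            counts[(current_dev, int(m.group("block")))] += 1
    return counts


def recurring(counts, threshold=2):
    """Pairs seen at least `threshold` times (threshold is an assumption)."""
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    for (dev, block), n in recurring(count_block_errors(SAMPLE_LOG)).items():
        print(dev, block, n)
```

With the single message from the first post, nothing is flagged, which matches the advice above: one isolated error is worth noting, a repeat on the same disk is worth acting on.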