When do bad blocks make it onto the grown defect list?

We have one Ultra320 Maxtor disk in a 4 disk stripe set that decided to start throwing hard read errors associated with 2 consecutive blocks. The disk is apparently not swapping these out for spare blocks or putting them on the "grown" defect list that format -> defects shows. Should it? I'm not clear about when Solaris does this, or if the disk should do it all by itself.

This is on a Sun dual ultra320 scsi controller. There are only disks on this channel.

I ran fsck on the mounted stripe set; it didn't pick up this bad block and didn't cause it to appear on the grown list.

smartctl -t long run on the underlying disk with the bad block picked up the problem block and reported (with a subsequent smartctl -a -d scsi) that it had failed in segment

seg#=2, LBA_first_err=10cfbbd0, sk=0x3, asc=0x11, asq=0,0

but, you guessed it, it still didn't go onto the grown list.
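For anyone decoding that self-test line, here is a minimal sketch that converts the reported LBA to decimal and names the sense data. The 512-byte block size is an assumption (the thread does not state it), and the script only recognises the one sense combination quoted above.

```shell
#!/bin/sh
# Values copied from the self-test line above.
lba_hex=10cfbbd0
sk=0x3
asc=0x11
ascq=0x0

# Convert the hex LBA to decimal, the form most tools expect for a block number.
lba_dec=$(printf '%d' "0x$lba_hex")

# Assumption: 512-byte logical blocks (not stated in the thread).
sector_size=512
byte_offset=$((lba_dec * sector_size))

# Decode only the sense combination seen here; anything else is left undecoded.
case "$sk/$asc/$ascq" in
  0x3/0x11/0x0) meaning="MEDIUM ERROR: unrecovered read error" ;;
  *)            meaning="not decoded by this sketch" ;;
esac

echo "LBA 0x$lba_hex = $lba_dec (byte offset $byte_offset on the disk)"
echo "sense: $meaning"
```

Note that this LBA is relative to the single underlying disk, not to the stripe set or to the file.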

Currently I'm running format -> analyze with the block transfer set to 1 (not the default 126, more about that below) with READ. Hopefully that will finally map this block out. If not, I guess I'll have to add it manually but that seems so, um, primitive.

Anyway, I figured out which file was affected by running:

sum 'filename'

on every filename emitted by find /vol02. Then I tried copying the affected file to another location. Sure, it's got two bad blocks, but it's 2^32 + 8096 bytes, and I'd like to keep the rest of it. Supposedly this should have done it:

dd if=infile of=outfile conv=noerror,sync

Unfortunately, no. It started throwing the hard read errors many blocks upstream of the bad block but indicating the same bad block. These show up in /var/adm/messages with the initial block read and the failed block. I think the SCSI driver is doing some sort of cluster block read for performance reasons. After it logged a bunch of those it stopped. It never did get past the bad block. I tried adding bs=512 (which should have been the default) but no joy; it still failed to transfer the data.

So that's why I'm running the analyze with a transfer size of 1 block. Hopefully this gets all the way down to the SCSI driver, and eventually sets the block in the grown list, but we'll see.

Thanks

[2329 byte] By [mathog] at [2007-11-26 9:01:48]
# 1

Just for completeness, here is what smartctl shows about the drive now. The LOGICAL UNIT FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d,ascq=2] is a bit misleading since I've been pounding on those same two bad blocks while trying various ways to read the file that contains them. Wish I knew how to clear that message; even if these are the only two bad blocks this disk ever develops I suspect it will be stuck with that message forever.

smartctl doesn't know how to read the start/stop count on this disk properly; it definitely has not been cycled that many times!

bash-2.03# smartctl -a -d scsi /dev/rdsk/c4t0d0s2

smartctl version 5.33 [sparc-sun-solaris2.8] Copyright (C) 2002-4 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

Device: MAXTORATLAS10K5_147SCA Version: JNZH

Serial number: D40T19PK

Device type: disk

Transport protocol: Parallel SCSI (SPI-4)

Local Time is: Tue Jul 25 12:12:03 2006 PDT

Device supports SMART and is Enabled

Temperature Warning Enabled

SMART Health Status: LOGICAL UNIT FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d,ascq=2]

Current Drive Temperature:31 C

Manufactured in week 05 of year

Current start stop count:1074003968 times

Recommended maximum start stop count: 1124401151 times

Error counter log:

          Errors Corrected by           Total   Correction     Gigabytes    Total
              EEC          rereads/    errors   algorithm      processed    uncorrected
          fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors

read:2158171221 1 01755.392 220

write: 00 0 0 0248.4380

Non-medium error count:38

Last n error events log page

Error event 32768:

Error event 5669:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

Error event 0:

<<short Last n error events log page>>

SMART Self-test log

Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       2      5667     0x10cfbbd0 [0x3 0x11 0x0]
# 2  Background short  Completed                   -      5665              - [- - -]
# 3  Background long   Completed                   -       164              - [- - -]
# 4  Background short  Completed                   -       164              - [- - -]

Long (extended) Self Test duration: 2880 seconds [48.0 minutes]

mathog at 2007-7-6 23:08:41 > top of Java-index,Solaris Operating System,Solaris Essentials - General Technical Questions...
# 2

Modern disks tend to invisibly remap bad blocks.

I don't think the defect lists are used any more.

If the disk has a block error on write, it just invisibly redoes the write to another block.

If you see a write error passed up to the OS level, then the hardware has run out of remap blocks and your disk is very sick.

Read errors are harder. If the disk gets one, it can't just read the data from somewhere else.

So reads don't remap blocks.

If you just want to remap the block, you could reformat the disk.

Or delete the file and write out more data.

But if you want the file back, then that's what backups are for.

If you need your system to be robust against read errors, you need to mirror.

As to how you can recover the non-corrupt parts of the file, I can only suggest trying dd to read as much as you can before the errors.

Then do another dd with a skip factor to skip to the part after the error and keep reading from there.
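The two-pass dd idea can be sketched as below. This is only an illustration on a stand-in file built in a temporary directory: the block numbers (bad_start, bad_count, total) are made up, and on a real failing disk the reads in steps 1 and 3 may still fail if the driver reads in clusters that overlap the bad blocks, as the original post describes.

```shell
#!/bin/sh
# Demo on a stand-in file; all block numbers here are hypothetical.
work=$(mktemp -d)
in="$work/infile"
out="$work/outfile"
bs=512
total=16       # size of the stand-in file, in blocks
bad_start=5    # first block to skip (counted from 0)
bad_count=2    # number of consecutive blocks to skip

# Build a stand-in input file filled with the letter A.
dd if=/dev/zero bs=$bs count=$total 2>/dev/null | tr '\000' 'A' > "$in"

# 1. Copy everything before the bad range.
dd if="$in" of="$out" bs=$bs count=$bad_start 2>/dev/null

# 2. Fill the bad range with zeros so later data stays at the same offsets.
dd if=/dev/zero of="$out" bs=$bs seek=$bad_start count=$bad_count conv=notrunc 2>/dev/null

# 3. Copy everything after the bad range, skipping it on input and output.
after=$((bad_start + bad_count))
dd if="$in" of="$out" bs=$bs skip=$after seek=$after conv=notrunc 2>/dev/null

# Collect a few facts about the result, then clean up.
out_size=$(wc -c < "$out" | tr -d ' ')
kept=$(tr -cd 'A' < "$out" | wc -c | tr -d ' ')
gap_nonzero=$(dd if="$out" bs=$bs skip=$bad_start count=$bad_count 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
rm -rf "$work"

echo "output bytes: $out_size, bytes preserved: $kept, non-zero bytes in gap: $gap_nonzero"
```

Writing zeros into the gap, rather than just concatenating the two good pieces, keeps every surviving block at its original offset, which matters for files with a fixed block layout such as database files.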

robertcohen at 2007-7-6 23:08:41
# 3

> Then do another dd with a skip factor to skip to the

> part after the error and keep reading from there.

Actually I tried something like that and it didn't work as advertised (man page). Even with

conv=noerror,sync

dd wouldn't read through the bad part of this file. Instead it logged a bunch of unrecoverable read errors and quit. Only after

format -> analyze -> read

forced the bad blocks to be mapped out would dd work, at which point the two empty bad blocks were not where dd reported them at block 3024 in the file, but rather at block 3200. I only found them because the Oracle "dbv" tool could tell where in the file it was corrupt to within 8092 bytes, and then I was able to poke around with dd and my mdump program (like od) to see the two zeroed blocks. The file lives on a 4 disk stripe set and I assume this huge offset has something to do with the way the stripe set works and the number of blocks per read for the SCSI driver.

mathog at 2007-7-6 23:08:41