How to detect if HDD is corrupted

Hello everybody, I am a beginner in the administration of Sun Solaris and Sun hardware. I have a question on how to detect whether an HDD on a Sun box is corrupted. We are using a Sun Fire V440 with Solaris 8. Thanks, Peter
[224 byte] By [Peter_Svachoa] at [2007-11-26 13:35:18]
# 1
We really need more information on the issue. Are you looking at a hardware issue or a software issue? The two basic ways of checking the HD would be "fsck" and/or a "format -> analyze -> read" test.
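As a rough sketch of the fsck side of that check: the function below refuses anything that is not a raw slice under /dev/rdsk and then runs a read-only UFS check. The name check_slice is made up for illustration, and the sketch assumes a UFS filesystem that is unmounted or at least idle. The -n flag answers "no" to every repair prompt, so nothing on the disk should be modified.

```shell
# check_slice is a hypothetical helper name, not a Solaris command.
# Usage: check_slice /dev/rdsk/c3t11d0s0
check_slice() {
    case "$1" in
        /dev/rdsk/*) ;;   # raw (character) device: the usual target for fsck
        *)
            echo "check_slice: expected a raw slice under /dev/rdsk, got '$1'" >&2
            return 1
            ;;
    esac
    # -F ufs : filesystem type; -n : answer "no" to all prompts (read-only check)
    fsck -F ufs -n "$1"
}

# The surface test is interactive, so it is only outlined here:
#   format            (select the disk from the menu)
#   format> analyze
#   analyze> read     (non-destructive read test of the whole disk)
```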
Lee_McCreerya at 2007-7-7 22:18:16 > top of Java-index,Sun Hardware,Servers - General Discussion...
# 2

Thanks for your response.

I think I am looking at a software issue. As you wrote, I performed the "format -> analyze -> read" test. Analyze didn't report any bad blocks.

Maybe you are wondering why I think the HDD is corrupted.

I hope the HDD is not corrupted, but the Oracle database on that HDD is corrupted.

Nothing out of the ordinary was done on that database over the last few days. One thing I can tell you is that a backup to tape runs on this Sun box every day. One day this backup did not complete successfully. I found out that the Sun box had been restarted, but not by me or anybody else.

I think there is a hardware problem with the backup tape, because I don't think a Sun box can reboot by itself for no reason. And this may have corrupted the database.

That is why I want to know whether the HDD is corrupted or not.

Peter

Peter_Svachoa at 2007-7-7 22:18:16
# 3

Hello Peter,

"I don't think a Sun box can reboot by itself for no reason. And this may have corrupted the database."

...

"That is why I want to know whether the HDD is corrupted or not."

Yes, the database has to be shut down before the system goes down; otherwise its contents are damaged (become inconsistent/corrupt).

Depending on the database configuration (archivelog / no archivelog) and the kind of transactions (mainly reads, or a high number of changes/writes), the amount of lost data/transactions varies.

Use one of the Oracle forums (http://forums.oracle.com). This is definitely an Oracle problem and less a hardware problem. Regarding the (unwanted) shutdown of the system, review the Solaris logs.
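For the log review, a minimal sketch, assuming the default Solaris log location /var/adm/messages: the kernel logs a "SunOS Release" banner at every boot and a "panic" line when it crashes, so filtering for those two strings should show when the box went down and whether it panicked first. The helper name boot_events is made up for illustration.

```shell
# boot_events is a hypothetical helper; the default path is the standard
# Solaris system log.
# Usage: boot_events [logfile]
boot_events() {
    log="${1:-/var/adm/messages}"
    # "SunOS Release" is logged by the kernel at each boot; "panic" marks a crash.
    awk '/panic/ || /SunOS Release/' "$log"
}

# Other places worth a look (run by hand on the affected host):
#   last reboot                  # reboot records
#   who -b                       # time of the last boot
#   ls /var/crash/`hostname`     # crash dumps saved by savecore, if any
```

Older entries may have been rotated into /var/adm/messages.0 and so on, which can be passed as the argument.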

I would recommend that you hide your work e-mail address in the profile and use one of the several free-mailers instead. There are very few occasions when you will be contacted directly. In over 1500 postings I did that only 3-4 times. These forums are user-to-user; there is no e-mail support from Sun on these forums.

Michael

Thank you for taking the time to hide your e-mail address ...

MAALATFTa at 2007-7-7 22:18:16
# 4

Thanks for your response Michael,

I agree with your suggestion about shutting down the database. But this isn't such a big problem, because we have export dump files and can create a new database from them. Also, for your information, this is only a development environment.

So I no longer think the problem is with the database. The problem is with the HDD.

I tried to run fsck on that HDD.

The output is:

fsck /dev/rdsk/c3t11d0s0

** /dev/rdsk/c3t11d0s0

CANNOT READ: BLK 285548096

CONTINUE? y

THE FOLLOWING SECTORS COULD NOT BE READ: 285548096 285548097 285548098 285548099

Next I tried to run fsck on another HDD.

The output is:

fsck /dev/rdsk/c3t10d0s0

** /dev/rdsk/c3t10d0s0

BAD SUPER BLOCK: MAGIC NUMBER WRONG

USE AN ALTERNATE SUPER-BLOCK TO SUPPLY NEEDED INFORMATION;

eg. fsck [-F ufs] -o b=# [special ...]

where # is the alternate super block. SEE fsck_ufs(1M).

I searched the Internet but found nothing that could help me.

Thanks Peter

Peter_Svachoa at 2007-7-7 22:18:16
# 5

As you <Lee_McCreery> wrote, I performed the "format -> analyze -> read" test. Analyze didn't report any bad blocks.

This means this wasn't one of the malfunctioning drives.

Boot from CD into single-user mode.

Mount the / slice of the boot disk.

Review the file <mountpoint>/etc/vfstab. Keep the contents as a reference!

Unmount this slice (the command is umount).

Are these local disks (directly attached) or SAN disks?

Run fsck on all (used) slices, one after the other.

fsck /dev/rdsk/c3t11d0s0

** /dev/rdsk/c3t11d0s0

CANNOT READ: BLK 285548096

CONTINUE? y

THE FOLLOWING SECTORS COULD NOT BE READ: 285548096 285548097 285548098 285548099

Use "format -> analyze -> read" to check this disk, but I would recommend replacing the disk. You can review the number of grown defects ("format -> defect"); if there are any, this is an indicator of a failing disk.

fsck /dev/rdsk/c3t10d0s0

** /dev/rdsk/c3t10d0s0

BAD SUPER BLOCK: MAGIC NUMBER WRONG

The contents of this disk are very likely lost; use format to label the disk and restore the data from the tape backup.
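Before relabeling, it may be worth one attempt at the route fsck itself suggests. This is only a sketch, assuming the slice really held a UFS filesystem created with default newfs parameters: newfs -N prints the layout, including the backup super-block locations, without writing to the disk, and fsck -o b=# then reads one of those backups. The helper name backup_superblocks is made up for illustration.

```shell
# backup_superblocks is a hypothetical helper: it reads "newfs -N" output on
# stdin and prints one backup super-block number per line.
backup_superblocks() {
    sed -n '/super-block backups/,$p' | sed '1d' | tr ',' ' ' |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) print $i }'
}

# By hand, on the affected slice (note the -N: without it newfs would
# create a new, empty filesystem):
#   newfs -N /dev/rdsk/c3t10d0s0 | backup_superblocks
#   fsck -F ufs -o b=32 /dev/rdsk/c3t10d0s0    # 32 is usually the first backup
```

If fsck fails with every backup as well, restoring from tape as described above remains the fallback.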

Michael

MAALATFTa at 2007-7-7 22:18:16
# 6

Thanks Michael,

The disks are in the Sun StorEdge disk array.

I ran "format -> analyze -> read" to check this disk. After this I looked at the defect lists.

With "defect -> primary - extract manufacturer's defect list" the output is:

Defect List has a total of 932 defects.

When I use "both - extract both primary and grown defects lists" the output is:

Defect List has a total of 0 defects.

What do you mean by the part marked in bold, "Use "format -> analyze -> read" to check this disk, but I would recommend to replace the disk"? Do you mean taking the disk out of the disk array and putting in a new disk?

Is it important to boot from CD into single-user mode?

Isn't it possible to run fsck on an unmounted disk?

Thanks Peter

Peter_Svachoa at 2007-7-7 22:18:16
# 7

Check the filesystems with fsck; /etc/vfstab lists the filesystems.
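One way to get from vfstab to a list of slices to check, as a sketch: the second vfstab column is the "device to fsck" (the raw device) and the fourth is the filesystem type, so filtering on type ufs gives the candidates. The helper name ufs_fsck_devices is made up for illustration.

```shell
# ufs_fsck_devices is a hypothetical helper. vfstab columns are:
#   device-to-mount  device-to-fsck  mount-point  FS-type  fsck-pass  mount-at-boot  options
# Usage: ufs_fsck_devices [vfstab-file]
ufs_fsck_devices() {
    awk '$1 !~ /^#/ && $4 == "ufs" { print $2 }' "${1:-/etc/vfstab}"
}

# Read-only pass over every UFS slice listed there. Unmount them first;
# slices that cannot be unmounted, such as /, need the boot-from-CD route.
#   for dev in `ufs_fsck_devices`; do fsck -F ufs -n "$dev"; done
```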

What do you mean by the part marked in bold, "Use "format -> analyze -> read" to check this disk"?

Surface analysis. fsck only checks for inconsistencies in allocated space.

When I use "both - extract both primary and grown defects lists" the output is:

Defect List has a total of 0 defects.

Primary defects are found at the factory (these bad spots have already been remapped). Grown defects develop during operation.

Only review the grown defects.

You mean put the disk out from Disk array and put new disk?

Remove the disk and install a new one.

SCSI disks can automatically remap bad spots to spare sectors. These bad sectors are added to the grown defect list and are not reused. If the number of grown defects increases, this indicates that the disk is failing.
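A complementary check, offered only as a sketch: iostat -En prints per-device soft, hard, and transport error counters kept by the Solaris disk driver. These are not the same thing as the drive's grown defect list, but a disk with unreadable sectors will often show non-zero hard errors there. The helper name disks_with_errors is made up, and the device name and counts in the comment are an invented example of the line format.

```shell
# disks_with_errors is a hypothetical helper: it reads "iostat -En" output on
# stdin and prints every device whose soft, hard, or transport error count
# is non-zero. Expected summary line per device (values here are invented):
#   c3t11d0  Soft Errors: 0 Hard Errors: 4 Transport Errors: 0
disks_with_errors() {
    awk '/Soft Errors:/ && /Hard Errors:/ {
        if ($4 + $7 + $10 > 0)
            print $1 ": soft=" $4 " hard=" $7 " transport=" $10
    }'
}

# By hand:
#   iostat -En | disks_with_errors
```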

Yes, there is no need to boot from CD to check unmounted filesystems.

These forums are user-to-user. If you want direct support, open a service case.

Michael

MAALATFTa at 2007-7-7 22:18:16