Veritas Volume Manager errors

I am running a Sun UltraSPARC Enterprise 450 server with the Solaris 8 operating system. This past weekend, I received the following errors relating to Volume Manager on our disks.

I need help in understanding what happened and how to fix it. The only thing I found was that two or three filesystems were over 90% full; I was able to have files removed and thus reduce their usage by 20%. But I would still like to know what happened and how to prevent these errors from happening again. I am new to storage management.

"Failures have been detected by the VERITAS Volume Manager:

failed disks:

disk04

failed plexes:

vol02-02

vol04-02

vol06-P04

These volumes are still usable, but the the redundancy of

those volumes is reduced. Any RAID-5 volumes with storage on the failed disk may become unusable in the face of further failures.

VERITAS Volume Manager is preparing to relocate for diskgroup rootdg. Saving the current configuration in:

/etc/vx/saveconfig.d/rootdg.060312_032923.mpvsh

Relocation was not successful for subdisks on disk disk04 in volume vol04 in disk group rootdg. No replacement was made and the disk is still unusable."

Please advise. Thanks.

By DCAdmin at 2007-11-25 23:03:41
# 1
I suggest you pursue the message about the failed disk ... try: vxdisk list
SimonM at 2007-7-5 17:55:30
# 2

Simon,

When I enter the vxdisk list command, I see the following:

Device     Disk     Status
c0t0d0s2            error
c0t1d0s2            error
c0t2d0s2            error
c0t3d0s2            error
c2t1d0s2            error
- -        disk04   failed was:c3t1d0s2

I did not configure this server, but I assume the c0t0d0s devices were not set up, because the other devices have disk01 through disk04 configured and all are online except disk04, which is the failed disk.

I am not familiar with the Veritas Volume Manager commands and how they work, so is there anything else I can do to resolve this issue? Thanks for your assistance.

DCAdmin at 2007-7-5 17:55:30
# 3

You would seem to have, at least, an issue with one of your disks. The ones that state 'just' error as their status are normally disks that have yet to be added into volume manager, so don't panic too much!
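To make that point concrete, here is a minimal sketch that picks out the devices whose status is plain "error". The function name devices_not_in_vm is made up for this example, and it assumes the status is the last whitespace-separated field of each line, as in the output posted above. It only filters text on standard input and does not touch any disk.

```shell
# Hypothetical helper, not a VxVM command: reads `vxdisk list` output on stdin
# and prints the devices whose status column is plain "error".
devices_not_in_vm() {
    awk '$1 != "DEVICE" && $NF == "error" { print $1 }'
}

# Assumed usage on a live system: vxdisk list | devices_not_in_vm
```

A device listed here is usually just one that was never initialised for Volume Manager, but treat that as a hint to check, not as proof.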

What will have happened, as the messages suggest, is that any volume defined as being on that disk potentially has a fault. This is not really the place for a Veritas tutorial but, in brief, a plex is an instance of your data (so two plexes for the same volume = mirroring), made up of subdisks (which are allocations of disk space). VM normally hides all those nasty bits away from you when you create things.
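As a rough illustration of the volume/plex relationship, the sketch below counts the plexes under each volume in vxprint -ht style output. The function name plexes_per_volume is invented for this example. It assumes the third field of every "pl" record is the owning volume, and it does not tell data plexes from log plexes, so "mirrored" here only means "more than one plex".

```shell
# Hypothetical helper: reads `vxprint -ht` style records on stdin and
# reports how many plexes each volume has.
plexes_per_volume() {
    awk '
        $1 == "pl" { count[$3]++ }      # field 3 of a pl record is its volume
        END {
            for (v in count)
                print v, count[v], (count[v] > 1 ? "mirrored" : "single copy")
        }
    ' | sort
}

# Assumed usage on a live system: vxprint -g rootdg -ht | plexes_per_volume
```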

If everything on that disk is defined as part of a RAID-5 set, then you should still be OK, as you can replace the failed disk and Veritas should rebuild things - the documentation on that is quite good.

To get a full idea of the state of play, use something like: vxprint -g rootdg -thf

SimonM at 2007-7-5 17:55:30
# 4

Out of the 11 or 12 disk drives on the system in question, disk #8 (c3t1d0) was labeled "unformatted" when I ran the format, analyze, and read commands. Does this mean the disk can be re-formatted, or must it be replaced? I will dig into the documentation in the meantime.

This is the result I get when I run the vxprint -g rootdg -thf command:

DG NAME         NCONFIG      NLOG      MINORS   GROUP-ID
DM NAME         DEVICE       TYPE      PRIVLEN  PUBLEN   STATE
RV NAME         RLINK_CNT    KSTATE    STATE    PRIMARY  DATAVOLS  SRL
RL NAME         RVG          KSTATE    STATE    REM_HOST REM_DG    REM_RLNK
V  NAME         RVG          KSTATE    STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE    STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK      DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME   NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

dg rootdg       default      default   0        1043163330.1025.cstep2

dm disk01       c2t2d0s2     sliced    4711     35358848 -
dm disk02       c2t3d0s2     sliced    4711     35358848 -
dm disk03       c3t0d0s2     sliced    4711     35358848 -
dm disk04       -            -         -        -        NODEVICE
dm disk05       c3t2d0s2     sliced    4711     35358848 -
dm disk06       c3t3d0s2     sliced    4711     35358848 -

v  vol01        -            ENABLED   ACTIVE   24576000 SELECT    -        fsgen
pl vol01-01     vol01        ENABLED   ACTIVE   24577792 CONCAT    -        RW
sd disk01-01    vol01-01     disk01    0        24577792 0         c2t2d0   ENA
pl vol01-02     vol01        ENABLED   ACTIVE   24577792 CONCAT    -        RW
sd disk02-01    vol01-02     disk02    0        24577792 0         c2t3d0   ENA

v  vol02        -            ENABLED   ACTIVE   15360000 SELECT    -        fsgen
pl vol02-01     vol02        ENABLED   ACTIVE   15361120 CONCAT    -        RW
sd disk03-01    vol02-01     disk03    0        15361120 0         c3t0d0   ENA
pl vol02-02     vol02        DISABLED  NODEVICE 15361120 CONCAT    -        RW
sd disk04-01    vol02-02     disk04    0        15361120 0         -        NDEV

v  vol03        -            ENABLED   ACTIVE   15360000 SELECT    -        fsgen
pl vol03-01     vol03        ENABLED   ACTIVE   15361120 CONCAT    -        RW
sd disk05-01    vol03-01     disk05    0        15361120 0         c3t2d0   ENA
pl vol03-02     vol03        ENABLED   ACTIVE   15361120 CONCAT    -        RW
sd disk06-01    vol03-02     disk06    0        15361120 0         c3t3d0   ENA

v  vol04        -            ENABLED   ACTIVE   15360000 SELECT    -        fsgen
pl vol04-01     vol04        ENABLED   ACTIVE   15361120 CONCAT    -        RW
sd disk03-02    vol04-01     disk03    15361120 15361120 0         c3t0d0   ENA
pl vol04-02     vol04        DISABLED  NODEVICE 15361120 CONCAT    -        RW
sd disk04-02    vol04-02     disk04    15361120 15361120 0         -        RLOC

v  vol05        -            ENABLED   ACTIVE   15360000 SELECT    -        fsgen
pl vol05-01     vol05        ENABLED   ACTIVE   15361120 CONCAT    -        RW
sd disk05-02    vol05-01     disk05    15361120 15361120 0         c3t2d0   ENA
pl vol05-02     vol05        ENABLED   ACTIVE   15361120 CONCAT    -        RW
sd disk06-02    vol05-02     disk06    15361120 15361120 0         c3t3d0   ENA

v  vol06        -            ENABLED   ACTIVE   15360000 SELECT    -        fsgen
pl vol06-03     vol06        ENABLED   ACTIVE   15360000 CONCAT    -        RW
sv vol06-S01    vol06-03     vol06-L01 1        10781056 0         2/2      ENA
sv vol06-S02    vol06-03     vol06-L02 1        4578944  10781056  1/2      ENA

v  vol06-L01    -            ENABLED   ACTIVE   10781056 SELECT    -        fsgen
pl vol06-P01    vol06-L01    ENABLED   ACTIVE   10781056 CONCAT    -        RW
sd disk01-03    vol06-P01    disk01    24577792 10781056 0         c2t2d0   ENA
pl vol06-P02    vol06-L01    ENABLED   ACTIVE   10781056 CONCAT    -        RW
sd disk02-03    vol06-P02    disk02    24577792 10781056 0         c2t3d0   ENA

v  vol06-L02    -            ENABLED   ACTIVE   4578944  SELECT    -        fsgen
pl vol06-P03    vol06-L02    ENABLED   ACTIVE   4578944  CONCAT    -        RW
sd disk03-04    vol06-P03    disk03    30722240 4578944  0         c3t0d0   ENA
pl vol06-P04    vol06-L02    DISABLED  RECOVER  4578944  CONCAT    -        RW
sd disk05-03    vol06-P04    disk05    30722240 4578944  0         c3t2d0   ENA

DCAdmin at 2007-7-5 17:55:30
# 5

dm disk04 - - - - NODEVICE   <-- concentrate on that!

That is your broken disk, and with the output from the vxdisk command you will be able to tell which physical disk it is/was.
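One way to do that lookup, as a small sketch: the function below pulls the "was:" device for a given disk name out of vxdisk list output. The name last_known_device is hypothetical, and it assumes the disk media name sits in the third column, which holds for the failed-disk line posted earlier in this thread.

```shell
# Hypothetical helper: $1 is the disk media name (e.g. disk04),
# stdin is `vxdisk list` output. Prints the device it used to live on.
last_known_device() {
    awk -v dm="$1" '
        $3 == dm {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^was:/) print substr($i, 5)   # drop the "was:" prefix
        }
    '
}

# Assumed usage on a live system: vxdisk list | last_known_device disk04
```

With the output from this thread it should report c3t1d0s2, i.e. controller 3, target 1, which is the drive to pull.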

With the exception of vol06 (which is using sub-volumes ... don't ask me, I have never used them!) you look to be OK - all volumes that have a DISABLED plex look to be mirrored (assuming I am reading the output correctly):

v vol02 - ENABLED ACTIVE 15360000 SELECT - fsgen

pl vol02-01 vol02 ENABLED ACTIVE 15361120 CONCAT - RW   <-- this is ONE copy of your data, let's call it the primary copy

sd disk03-01 vol02-01 disk03 0 15361120 0 c3t0d0 ENA   <-- and your primary copy sits on this sub-disk

pl vol02-02 vol02 DISABLED NODEVICE 15361120 CONCAT - RW   <-- this is your other copy, let's call it the mirror, it is the one that is broken

sd disk04-01 vol02-02 disk04 0 15361120 0 - NDEV   <-- and this is the mirror on the sub-disk of the disk that is broken - note disk name
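
The same reading can be done mechanically. The sketch below lists every volume that has a plex in a state other than ENABLED/ACTIVE and says whether a healthy plex remains. The function name degraded_volumes is made up, the field positions are assumed from the vxprint -ht layout shown in this thread, and it treats any ENABLED/ACTIVE plex as a usable copy, which would be wrong for log plexes.

```shell
# Hypothetical helper: reads `vxprint -ht` style records on stdin.
degraded_volumes() {
    awk '
        $1 == "pl" {
            if ($4 == "ENABLED" && $5 == "ACTIVE")
                good[$3]++                      # healthy plex for this volume
            else
                bad[$3] = bad[$3] " " $2        # remember the troubled plex names
        }
        END {
            for (v in bad)
                print v ":" bad[v], "->", (good[v] > 0 ? "still has a good copy" : "NO good copy left")
        }
    ' | sort
}

# Assumed usage on a live system: vxprint -g rootdg -ht | degraded_volumes
```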

SimonM at 2007-7-5 17:55:30
# 6

Thank you so much Simon, you have been a great help. We will replace the broken disk and I have the documentation to assist. Of course, if I run into any goofy stuff, I will definitely come back here and get help. I am learning about Veritas as I go. I hope this will help others in similar situations. Thanks again.

DCAdmin at 2007-7-5 17:55:30
# 7
A pleasure - good luck and let us know how it goes. If it all goes swimmingly I want cookies; if it all drops in the pot, I have never heard of you! ;-)
SimonM at 2007-7-5 17:55:30
# 8
Simon... we have a deal, LOL. We are trying to get Sun to come and replace the disk drive, and if they do come, I will make sure to observe them every step of the way.
DCAdmin at 2007-7-5 17:55:30
# 9
Ok good - it is always nice to get 'the boys' in to do the hard bits! Do not be afraid to ask them what they are doing, and why. It also may be beneficial to get an idea of what they will, logically, be doing - to better phrase the questions!
SimonM at 2007-7-5 17:55:30
# 10

Simon or anyone else,

We could not get Sun to come in and help us out... for contract support reasons, you guys know how that goes :)

I was able to obtain a spare drive, so I will have to do this on my own. I have been looking at some documentation. But I need some assurance that what I do will not cause more problems.

The only thing I did was take the disk (disk04) offline...using vxdiskadm and chose "Remove a disk for replacement" and also copied the /etc/vfstab file.

I guess what I am asking is there anything else I need to do to successfully replace the bad disk and make sure the server comes back up with its mount points. Please advise.
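
For the mount-point part of that question, one cheap safeguard is to compare the saved vfstab against what is actually mounted after the reboot. The sketch below is only that, a sketch: missing_mounts is a made-up name, it assumes the standard seven-field vfstab layout (mount point in field 3, "mount at boot" in field 6) and the mnttab layout (mount point in field 2), and it only reads the two files it is given.

```shell
# Hypothetical helper: $1 = copy of /etc/vfstab, $2 = file in /etc/mnttab format.
# Prints every mount-at-boot mount point from vfstab that is not currently mounted.
missing_mounts() {
    awk '
        FILENAME == ARGV[1] { mounted[$2] = 1; next }   # first file: mnttab
        /^#/ || NF < 7 { next }                         # skip comments and short lines
        $6 == "yes" && $3 != "-" && !($3 in mounted) { print $3 }
    ' "$2" "$1"
}

# Assumed usage after the reboot: missing_mounts /etc/vfstab /etc/mnttab
```

No output means every filesystem marked to mount at boot is mounted; anything printed is worth checking against the vxprint output before going further.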

I plan to replace the bad disk this Sunday morning. Suggestions are welcome. Thanks.

DCAdmin at 2007-7-5 17:55:30
# 11
I was able to successfully switch hard drives on my own with very few problems. Thanks to you all for your help.
DCAdmin at 2007-7-5 17:55:30
# 12
I was about to say ... it should be fairly simple, as the Veritas manual guides you by the hand to a large degree (though it can be a leeeeeeetle unnerving to do some of the things!). So, well done and congratulations!
SimonM at 2007-7-5 17:55:30