Stale State Database Replicas

Hi,

I have a mirror, d30, set up between c1t2d0s0 and c1t3d0s0. One submirror has come up as unavailable and the metadb on that disk was in an "unknown" state, with metastat giving me a "Stale state database replicas" warning. I removed the stale metadbs using metadb -d and recreated them, but the disk c1t2d0s0 (submirror 0) is still unavailable. The production file system is currently mounted from the disk device, not from the mirror.

Any suggestions to fix the "unavailable" disk? Could I just remove the mirror and re-create it without losing data on the mounted disk /dev/dsk/c1t3d0s0?

See output below:

--

# metastat

****
WARNING: Stale state database replicas. Metastat output may be inaccurate.
****

d30: Mirror
    Submirror 0: d10
      State: Unavailable
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 143237376 blocks (68 GB)

d10: Submirror of d30
    State: Unavailable
    Size: 143237376 blocks (68 GB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t2d0s0    0            No     -      Yes

d20: Submirror of d30
    State: Okay
    Size: 143237376 blocks (68 GB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t3d0s0    0            No     Okay   Yes

hsp001: is empty

Device Relocation Information:
Device    Reloc  Device ID
c1t3d0    Yes    id1,sd@SSEAGATE_ST373307LSUN72G_3HZ9AA1N000075123ZGF

# metadb -i

        flags      first blk    block count
     a    u        16           8192           /dev/dsk/c1t2d0s7
     a    u        8208         8192           /dev/dsk/c1t2d0s7
     a    u        16           8192           /dev/dsk/c1t3d0s7
     a    u        8208         8192           /dev/dsk/c1t3d0s7
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

[2462 byte] By [pdaemon] at [2007-11-26 10:50:31]
# 1

You should probably metadetach the failed submirror.

Then remount the FS through the metadevice d30.

Then add the disk back into the mirror.

But it's odd that it's come up in an unavailable state.

Normally when there are problems the disk will show up as "needs maintenance".

Is the disk showing up in format?

Did you have any trouble re-adding the metadb?

Something that is worrying is the fact that none of your DB replicas are marked as a master.

I would try a reboot.

You might need to drop all your svm metadevices and recreate your mirroring from scratch.

robertcohen at 2007-7-7 3:03:08 > top of Java-index,Solaris Operating System,Solaris Essentials - General Technical Questions...
# 2

Hi Robert,

Yes, the disk is showing up as seen below.

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@0,0
       1. c1t2d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@2,0
       2. c1t3d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@3,0
Specify disk (enter its number):

Unfortunately I can't reboot right now as this is a production machine. I didn't have any trouble reading metadb, but the metadb slice c1t2d0s7 showed:

     a    u        16           8192           unknown

instead of:

     a    u        16           8192           /dev/dsk/c1t2d0s7

which is when I removed those two metadbs using metadb -d and re-created them.

The mirror is currently not mounted at all, just the second disk in the mirror set (c1t3d0). Is it fine then to recreate the mirror without affecting the running server?

Thanks for your help!

pdaemon at 2007-7-7 3:03:08
# 3
I get the following error when I try and detach the submirror:

# metadetach d30 d10
metadetach: petce01: stale databases

# metadetach -f d30 d10
metadetach: petce01: stale databases
pdaemon at 2007-7-7 3:03:08
# 4
Not looking good. Unless someone more expert than I can suggest something, I can only recommend dropping all mirroring and recreating it from scratch.

And you can't drop the mirroring from / without a couple of reboots.
robertcohen at 2007-7-7 3:03:08
# 5
It's /usr/openv that's mirrored, not /, so I shouldn't need any reboots for this mirror.

Thanks for your help so far...
pdaemon at 2007-7-7 3:03:08
# 6

No, I meant *all* mirrors on the box. Not just that one.

It's your state databases that are the problem, and they are SVM-wide.
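
For context on why the replicas matter to every metadevice: as far as I know from the Solaris Volume Manager documentation, SVM keeps running while at least half of the state database replicas are usable, panics with fewer than half, and refuses to boot to multiuser without a majority (half plus one). The sketch below only encodes that arithmetic. The function name replica_status and its three labels are made up for illustration, and it assumes a POSIX shell such as ksh or bash (the legacy Solaris /bin/sh has no $(( )) arithmetic).

```shell
# Illustrative only: classify a replica count against the SVM quorum rules.
# Usage: replica_status TOTAL AVAILABLE
replica_status() {
    total=$1
    avail=$2
    # A majority is half of the total plus one (integer division).
    majority=$(( total / 2 + 1 ))
    if [ "$avail" -ge "$majority" ]; then
        echo "ok"               # majority usable: normal operation and normal boot
    elif [ $(( avail * 2 )) -ge "$total" ]; then
        echo "stale-on-reboot"  # exactly half: keeps running, cannot boot to multiuser
    else
        echo "panic"            # fewer than half usable
    fi
}

replica_status 4 4   # prints: ok
replica_status 4 2   # prints: stale-on-reboot
replica_status 4 1   # prints: panic
```

With two replicas on each of two disks, as in the metadb output earlier in this thread, losing both replicas on one disk would leave 2 of 4, which is the middle case.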

The procedure for dropping a root mirror is given here:

http://sunsolve.sun.com/search/document.do?assetkey=1-9-76217-1

That doc is actually about resetting a root password, but the procedure is similar.

You just don't need to do the bit about editing /etc/passwd.

I'd wait a day or two to see if anyone else has a better idea. Then schedule downtime for a convenient time.

robertcohen at 2007-7-7 3:03:08
# 7
I've only got the one software mirror, which is d30. The other mirror, which holds the root partition, is a hardware mirror:

# raidctl
RAID    RAID    RAID    Disk
Volume  Status  Disk    Status
c1t0d0  OK      c1t0d0  OK
                c1t1d0  OK
pdaemon at 2007-7-7 3:03:08
# 8
OK, that simplifies things.

If you are already mounting the disk through the raw device, then delete all the metadevices and metadbs and start from scratch...
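
A rough sketch of what "start from scratch" could look like for this box, not a tested procedure. It assumes the device names from the earlier output (replicas on c1t2d0s7 and c1t3d0s7, live data on c1t3d0s0, mount point /usr/openv), a UFS file system, and that removing every replica with metadb -d -f is enough to discard the old d10/d20/d30 definitions; metaclear would be the usual way to delete metadevices, but it may be refused with the same "stale databases" error. The run and rebuild_mirror helpers are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_mirror() {
    # 1. Remove all state database replicas (assumed to drop the whole SVM config).
    run metadb -d -f c1t2d0s7 c1t3d0s7
    # 2. Recreate two replicas on each disk.
    run metadb -a -f -c 2 c1t2d0s7 c1t3d0s7
    # 3. One-way mirror on the disk holding the live data.
    #    -f because c1t3d0s0 is currently mounted.
    run metainit -f d20 1 1 c1t3d0s0
    run metainit d30 -m d20
    # 4. Switch the mount from the raw slice to the metadevice.
    #    /etc/vfstab needs the same change, made by hand.
    run umount /usr/openv
    run mount /dev/md/dsk/d30 /usr/openv
    # 5. Create the second submirror and attach it; the attach starts a full
    #    resync from d20 that overwrites c1t2d0s0.
    run metainit d10 1 1 c1t2d0s0
    run metattach d30 d10
}

rebuild_mirror
```

The unmount in step 4 means a short outage for whatever uses /usr/openv, and the order matters: d20 has to be built on the disk with the good data so the resync in step 5 copies in the right direction.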
robertcohen at 2007-7-7 3:03:08