How to check that bootblocks are intact?
Since I'm rebooting servers like mad to prepare for the DST changes, I have been screwed a couple of times by hosts that would not reboot because the bootblocks were corrupted ("the file just loaded does not appear to be executable"). I did not install these hosts, someone else did, who knows how.
Is there a way to check that the boot blocks are intact before a reboot?
A number of these hosts are running Veritas Volume Manager. Is it safe to just rerun installboot on every host as a preventative measure, even on hosts with VXVM? Maybe on just one side of the mirror to be safe....
By wsandersa at 2007-11-26 19:53:16

# 1
> Since I'm rebooting servers like mad to prepare for
> the DST changes, I have been screwed a couple of
> times by hosts that would not reboot because the
> bootblocks were corrupted ("the file just loaded does
> not appear to be executable"). I did not install
> these hosts, someone else did, who knows how.
>
> Is there way to check that the boot blocks are intact
> before a reboot?
Why not just run installboot? It's probably faster to copy the blocks than to try to run a comparison to see if they're good. Of course you're welcome to do that as well.
'installboot' is just a shell script. You can look at it to see what it does. It's quite simple.
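To give an idea of how little is going on, here is a minimal sketch of the kind of dd call such a script wraps. It is not the shipped script: write_bootblk is a made-up name, and I have used seek= and conv=notrunc so it also behaves on a plain image file. Compare it with /usr/sbin/installboot on your own release before trusting it.

```shell
# Sketch only: write a boot block image into sectors 1-15 of a target
# (a raw root slice, or an image file for testing).
# write_bootblk is a hypothetical helper name, not a Solaris command.
write_bootblk() {
    bootblk=$1   # e.g. /usr/platform/$(uname -i)/lib/fs/ufs/bootblk
    target=$2    # e.g. /dev/rdsk/c0t0d0s0 (the ROOT slice, never s2)

    [ -r "$bootblk" ] || { echo "cannot read $bootblk" >&2; return 1; }

    # Skip sector 0 (seek=1), write at most 15 sectors, pad the last
    # one with NULs (conv=sync), and leave the rest of the target alone.
    dd if="$bootblk" of="$target" bs=512 seek=1 count=15 \
        conv=sync,notrunc 2>/dev/null
}
```

Trying it against a scratch file first (rather than a live slice) is a cheap way to convince yourself that sector 0 and everything past sector 15 stay untouched.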
> A number of these hosts are running Veritas Volume
> Manager. Is it safe to just rerun installboot on
> every host as a preventative measure, even on hosts
> with VXVM? Maybe on just one side of the mirror to be
> safe....
I'm not sure why running on one side is "being safe". If you want to be able to boot from either disk, I'd want the bootblocks on the root slice on both disks.
You might also read this thread from comp.unix.solaris:
http://groups.google.com/group/comp.unix.solaris/browse_thread/thread/7e719641632702f6
--
Darren
# 2
Thanks Darren,
Yeah, I figured on a "normal" Solaris system it would just be easier to rerun installboot. And I had never thought to examine installboot as a script - whaddaya know, there's no magic there. Thanks for helping me see the boot process more clearly.
So, if UFS is smart enough to not use blocks 0 through 15, what about swap? For some weird reason, some of these legacy hosts are using the "first" (cyls 0-XX) partition as swap and the "second" (XX+1 to YY) partition as root? That has never seemed to cause any problems. I guess the OS is smart enough to take care of it.
Still, a couple of times this year Solaris has refused to boot with the dreaded "file does not appear to be bootable" error, under SVM, and both of the mirrors were missing boot blocks. These are hosts with root in a UFS FS on cyl 0-XX. But installboot cleared this back up. In both cases I might have been at fault - I got into the bad habit of doing a "dd if=<disk1> of=<disk2>" and Ctrl-C-ing out. Dumb, but you would think it would just copy the bootblock along with the label. Eventually it bit me on the a** for other reasons when I accidentally overwrote a label on a disk that was not quite the same geometry as the other. Not all SUN72G drives are the same - but I digress.
The big problem, which is not related to this forum, is we have a substantial number of legacy hosts running Veritas Volume Manager, and this week's disaster was a very old, complicated legacy host running VXVM whose boot blocks were somehow trashed on both mirrors. I have forgotten the black magic VXVM uses to boot an encapsulated root disk. VXVM puts its private region on the first cylinder, in slice 3, and running installboot on that partition resulted in a disk that wouldn't boot, this time with a "Trap 3e" error, and probably a trashed private area. The VXVM troubleshooting guide explains a complicated procedure that basically involves restoring from tape, so if I'm going to have to restore from tape I'm going to install Solaris with SVM and forget about VXVM. I will never, ever, encapsulate a system disk with VXVM again.
- SVM is "free"
- SVM has never blown up on me
- SVM seems to be able to peacefully coexist with VXVM
- Recent versions of Solaris overcome the 8 slice per disk limitation
- You can always back out by just booting an SVM root disk as a plain old disk, unless you've installed your OS on a RAID0 or 5 volume.
- Veritas support has fallen on "hard times", to put it politely.
# 3
> So, if UFS is smart enough to not use blocks 0
> through 15, what about swap?
Yes, it doesn't really need to, but it will do the same thing. Take a look with 'swap -l'.
# swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t0d0s1   32,1      16 4194272 4194272
'swaplo' is 16, so it starts at block 16 on the device.
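If you want to eyeball that across a lot of hosts, a small filter helps. This is just a sketch: swaplo_report is a made-up name, and it assumes the usual five-column 'swap -l' layout (swapfile, dev, swaplo, blocks, free).

```shell
# Sketch: print "swapfile swaplo" for each swap device listed by
# 'swap -l' (read from stdin). swaplo_report is a hypothetical name.
# Assumes five columns: swapfile dev swaplo blocks free
swaplo_report() {
    # Skip the header line, then print the device and its starting block.
    awk 'NR > 1 && NF >= 5 { print $1, $3 }'
}
```

Usage would be along the lines of 'swap -l | swaplo_report'.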
> For some weird reason,
> some of these legacy hosts are using the "first"
> (cyls 0-XX) partition as swap and the "second" (XX+1
> to YY) partition as root? That has never seemed to
> cause any problems. I guess the OS is smart enough to
> take care of it.
That doesn't matter anyway. The boot blocks are not on the blocks 1-15 of the disk, they're on blocks 1-15 of the root slice.
The VTOC is on block 0 of the disk.
> The big problem, which is not related to this forum,
> is we have a substantial number of legacy hosts
> running Veritas Volume manager, and this week's
> disaster was a very old, complicated legacy host
> running VXVM whose boot blocks were somehow trashed
> on both mirrors. I have forgotten the black magic
> VXVM uses to boot an encapsulated root disk.
There's no magic. It's the same as without VxVM to a point. There's a root partition on the disk, and that partition has the boot blocks in 1-15. You can run installboot on the root partition of both mirrored VxVM disks, just like you could do it on mirrored SVM disks.
> VXVM
> puts its private region on the first cylinder, in
> slice 3, and running installboot on that partition
> resulted in a disk that wouldn't boot, this time with
> a "Trap 3e" error, and probably a trashed private
> area.
As long as you pass the root partition to installboot, that shouldn't happen. The private region will not be part of the root partition. Now if you passed in /dev/rdsk/<blah>s2 and the private region was at the beginning of the disk, then that would be bad.
--
Darren
# 4
Thanks again! So - one could run installboot on /dev/vx/rdsk/rootvol?
But on all hosts I have, the offset (PLOFFS) of rootvol is 0, so an 'installboot bootblk /dev/rdsk/cXtXdXs4' (the VXVM "public region") should work also? But it didn't in my case. Or maybe I pooched something (as in, I did indeed run installboot bootblk c0t0d0s2 or something).
I'll have to test this, except these legacy servers are so old that we don't have licensing information for VXVM recorded anywhere.
I know this is probably documented somewhere (it does not seem to be on the Veritas web site), but then assuming the boot blocks are installed on the first block of rootvol, this is physically on cylinder 1 of the disk, in the public region, slice 4. In nvramrc, vx-rootdisk is set to disk@0,0:a, not disk@0,0:e, so I still dunno how it finds the bootblocks, unless there's some magic in the private region, which does start on block 0 of the disk.
I really appreciate the detailed replies, Darren. A pointer to a PSD or other Sunsolve doc would be sufficient. Surprisingly, I could not find much info about this topic via Google or on the Veritas web site.
# 5
Oh - I just noticed - we have some disks that are for lack of a better term "virgin post-encapsulation" - with "ghosts" of the original pre-encapsulation partitioning:
Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm    1453 -  1815      512.06MB    (363/0/0)     1048707
  1       swap    wu       1 -  1452        2.00GB    (1452/0/0)    4194828
  2     backup    wu       0 - 24619       33.92GB    (24620/0/0)  71127180
  3          -    wu       0 -     0        1.41MB    (1/0/0)          2889
  4          -    wu       1 - 24619       33.91GB    (24619/0/0)  71124291
  5 unassigned    wm    4720 -  5808        1.50GB    (1089/0/0)    3146121
So disk@0,0:a does point to the boot block location!
So, the mystery is solved, and the procedure described at http://seer.support.veritas.com/docs/246472.htm makes sense.
SOMEHOW, on the vtoc for the failed system, perhaps as a result of a hardware swap, the "ghost" partitions (including slice 0) got zeroed out, except for partitions 2,3, and 4 - death! I will
It would be interesting to see if I could have booted off disk@0,0:e in the nvramrc, although there would have been no reference in the label to the root file system.
Case closed. Thanks Darren for all your help.
# 6
> Thanks again! So - one could run installboot on
> /dev/vx/rdsk/rootvol?
Absolutely, or on /dev/rdsk/cxtxdxsx where that slice is the root slice. The rootvol and the root slice must necessarily coincide. The difference is that writing to rootvol will write to both disks.
> But on all hosts I have, the offset (PLOFFS) of
> rootvol is 0, so a 'installboot bootblk
> /dev/rdsk/cXtXdXs4 - the VXVM "public region" -
> should work also?
If the offset is 0, then yes. You can see the same effect by giving that to fsck. Can you run 'fsck -n /dev/rdsk/cXtXdXs4'? If the filesystem is at block 16, then that works.
Certainly referring to the public region slice directly isn't recommended here. I'd prefer that we used either the volume or the root slice as entry points. Both of those are defined for this purpose.
Let's test it. I'm not going to write, but I will read...
First, the first few bytes of the bootblk file on my machine:
$ dd if=/usr/platform/SUNW,Ultra-30/lib/fs/ufs/bootblk ibs=1b count=1 | od -c | head -3
1+0 records in
1+0 records out
0000000 375 003   J 331  \0  \0   0 270 314 022  \t   /   p   a   c   k
0000020   a   g   e   s 002 004   4 024  \0 034 022 024   C   a   n   '
0000040   t       f   i   n   d       /   p   a   c   k   a   g   e   s
Now the first few bytes of the underlying root filesystem slice (don't forget the one block offset handled by 'iseek')
$ dd if=/dev/rdsk/c0t0d0s0 ibs=1b iseek=1 count=1 | od -c | head -3
1+0 records in
1+0 records out
0000000 375 003   J 331  \0  \0   0 270 314 022  \t   /   p   a   c   k
0000020   a   g   e   s 002 004   4 024  \0 034 022 024   C   a   n   '
0000040   t       f   i   n   d       /   p   a   c   k   a   g   e   s
And finally the same thing directed at the rootvol volume.
$ dd if=/dev/vx/rdsk/rootvol ibs=1b iseek=1 count=1 | od -c | head -3
1+0 records in
1+0 records out
0000000 375 003   J 331  \0  \0   0 270 314 022  \t   /   p   a   c   k
0000020   a   g   e   s 002 004   4 024  \0 034 022 024   C   a   n   '
0000040   t       f   i   n   d       /   p   a   c   k   a   g   e   s
All the same data (the last two being the *same* data, with the first being a copy of it).
> But it didn't in my case. or maybe
> I pooched something (like I did indeed installboot
> bootblk c0t0d0s2 or something.)
That's the only way you're going to do bad things to the private region.
> I'll have to test this, except these legacy servers
> are so old that we don't have licensing information
> for VXVM recorded anywhere.
>
> I know this is probably documented somewhere (it does
> not seem to be at Veritas web site), but then
> assuming the boot blocks are installed on the first
> block of rootvol, this is physically on cylinder 1 of
> the disk, in the public region, slice 4. In nvramrc,
> vx-rootdisk is set to
> disk@0,0:a, not disk@0,0:e, so I still dunno how it
> finds the bootblocks, unless there's some magic in
> the private region, which does start on block 0 of
> the disk.
The public region is just an overlay. So your root filesystem will be in *three* separate slices. Slice 0 is common for root, slice 2 as the entire disk overlay, and then whatever slice the VxVM public region is in (4 if it's an initialized disk, whatever it could get on an encapsulated disk).
The OBP is booting slice 0 (that's the :a part above). Look at the partition on the disk with format or prtvtoc. You have a root slice on slice 0. Now the public region will overlap it, but that is irrelevant. The fact is that the rootvol will begin at the same point that the root slice begins on the disk. In fact you can use 'vxmksdpart' to create a slice on the disk that maps exactly to an existing volume. When you mirror your root volume, VxVM uses that program to create a root slice on the mirror.
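To pick the root slice out of prtvtoc output without reading it by eye, a one-line filter will do. A sketch only: root_slice_from_vtoc is a made-up name, and it assumes the usual prtvtoc column order (Partition, Tag, Flags, First Sector, ...) with tag 2 meaning root.

```shell
# Sketch: read prtvtoc output on stdin and print "slice first_sector"
# for the first slice tagged as root (numeric tag 2).
# root_slice_from_vtoc is a hypothetical helper name.
root_slice_from_vtoc() {
    # prtvtoc comment/header lines start with '*'; skip them.
    awk '$1 !~ /^\*/ && $2 == 2 { print $1, $4; exit }'
}
```

For example, 'prtvtoc /dev/rdsk/c0t0d0s2 | root_slice_from_vtoc' should name the slice to hand to installboot, which is the one the OBP alias has to point at as well.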
The OBP has to find the boot block. There's no OS, there's no VxVM. The only thing it knows how to do is read the VTOC from the disk, find the cylinder that's the beginning of the slice (:a or slice 0 in your case), and go to a 1 block offset to load and run those blocks. So VxVM isn't involved in that process at all.
The boot blocks have just enough smarts to find a UFS filesystem on that same slice and load the second level boot loader out of it. Again, no VxVM, no public/private region. Exactly the same as SVM to this point.
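The arithmetic the OBP is effectively doing can be written down in a couple of lines. A sketch, with bootblk_sector as a made-up name, using the numbers from the partition table you posted (slice 0 starts at cylinder 1453, and one cylinder is 2889 blocks):

```shell
# Sketch: given a slice's starting cylinder and the disk's blocks per
# cylinder, print the absolute block on the disk where the boot block
# begins (one block past the start of the slice).
# bootblk_sector is a hypothetical helper name.
bootblk_sector() {
    start_cyl=$1
    sect_per_cyl=$2
    echo $(( start_cyl * sect_per_cyl + 1 ))
}

# bootblk_sector 1453 2889   prints 4197718
```

Nothing in that calculation depends on where the public or private region sits, which is the whole point.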
> I really appreciate the detailed replies, Darren. A
> pointer to a PSD or other Sunsolve doc would be
> sufficient. Surprisingly, I could not find much info
> about this topic via Google or on the Veritas web
> site.
I don't know why Sun would be particularly interested in documenting it. But there are some older blueprints on the Sun online blueprints site that go into some of the details. Poke around there and I'm sure they'll turn up pretty quickly.
--
Darren
