Enterprise 250 Boot Problem

I have an Enterprise 250 server that refuses to boot. It stopped responding, and when I got to the console it looked like it had attempted to reboot and was unable to. Since then, I've tried getting it to boot both from the disks and from a CD and run into the same problem. Auto-boot is turned off, so it comes up to the "ok" prompt. If I put in CD 1 of 2 of Solaris 8 (that's the OS currently on the server) and then try "boot cdrom -s", I get the "Boot Device: /pci...." line and a spinning cursor. It then hangs there. The same thing happens if I try just "boot -s".

Once, during my boot attempts, it actually tried to boot from disk and I got the "Solaris 5.8 ..etc " banner, and then it gave me an error of "pci@1F, 4000/scsi@3 SCSI Bus reset" and said that /dev/dsk/c0t0d0s0 was unreadable.

probe-scsi shows all six disks on the list, all POST checks are fine, and test-all does not print any errors.

Does anyone have any idea what might be wrong here? I'm kind of stumped since I can't get in and look at anything at all. Since I can't even seem to boot from a CD, even a re-install and restore from tape doesn't seem feasible. Do I just take it out back and shoot it or what? :)

[1231 byte] By [AaronSmitha] at [2007-11-26 13:33:07]
# 1

Hello Aaron,

capture the output (for your reference) of

devalias

printenv

then enter

set-defaults

setenv auto-boot? false

boot cdrom -sv

The v option is for verbose display.

If the boot from CD succeeds, review your boot disk with format/analyze (any non-destructive test).

Michael

MAALATFTa at 2007-7-7 22:12:20 > top of Java-index,Sun Hardware,Servers - General Discussion...
# 2

Aaron,

It may also be time to give that venerable old soul some considerate maintenance.

I'm thinking it may be worth some time to get your hands inside it and remove/re-install most major components. Electrical contact points can oxidize with time. Oxidation impairs conductivity. The simple act of reinstalling the CPU modules, every DIMM, each disk, and all major cable connectors will make better metal-to-metal contact.

You mentioned the E250 seemed to slow its bootup at the point where the "baton" is twizzling around. If I recall correctly, the computer would be testing RAM and probing various data busses at that point. Thus, no surprise that you also saw that SCSI bus reset error. If there were timeouts for an inquiry along a data path, particularly if trying to get to a boot drive, it could express itself as ... "I can't get there from here, so I'm going to complain."

( .... just my two cents, anyhow ...)

--

Edit:

Give Michael the Duke dollars.

He very much deserves to reach the Silver level of stardom.

rukbata at 2007-7-7 22:12:20
# 3

Ok. Came in today to work some more on this server. Powered it up and at the ok prompt, I tried a simple boot -vs. Lo and behold, it starts to actually BOOT. I watch it, pen in hand, for any errors. It seems to go ok, sees the disks, corrects a couple of incorrect block counts on a couple of file systems, and then throws a total fit when it comes to /dev/md/rdsk/d1. Says "Bad Inode Number for '..'", "Unexpected Inconsistency, Run fsck Manually". I'm thinking "Ok, I can do that", but then it throws that error up a few more times (in rapid succession, too fast for me to jot down anything more) and then reboots. It comes up and attempts to boot with the "boot" command, and hangs at the twirling baton (which stops twirling... :) ). So I give it a little while, just in case, then power it down, back on, and attempt a boot cdrom -vs. It prints a "Size: ...." message and then hangs again.

I'm going to crack the case open and reseat some things to see if that helps with getting it booted from a CD at all so I can fsck that file system. I have to admit that I'm more of a Linux admin than a Solaris one, so I'm a little bit out of my element. I'm assuming that the md/d1 disk is some sort of Solaris RAID volume or virtual disk. Will booting from the CD in single-user mode make those disks available for fsck'ing? I certainly hope so...

AaronSmithKa at 2007-7-7 22:12:20
# 4

Bah. Opened the case, disconnected and reconnected every cable I could find in there that was connected to the SCSI backplane. Put everything back together, powered it up, did a "boot cdrom -vs" and nothing. Frozen spinny cursor again.

Now, I had called Sun tech support about this server but since this machine was due to be decommissioned, the maintenance/support contract was allowed to expire. Is it possible to purchase a new contract in order to have maintenance done on this machine? The person I spoke to on the phone said that time and materials to come fix it would be a minimum of $2,500.

In the meantime, I'm restoring the data from a recent backup into a directory on another server so I can at least have the option of restoring the various services (web, MySQL databases) that were still running on this machine to some other server.

AaronSmithKa at 2007-7-7 22:12:20
# 5

Hello Aaron,

"I'm assuming that the md/d1 disk is some sort of Solaris raid volume or virtual disk."

Yes, the disks are under Solstice DiskSuite control.

Boot from CD into single-user mode and mount the / slice of the boot disk. Then review the file /etc/vfstab.

Hopefully the previous owner left the original entries (pre-mirroring) as comments; otherwise you have to use format to figure out how the slices are assigned (slice 0 is usually /, slice 1 swap, slice 2 special = the entire disk/don't touch).

At the ok prompt (the mount, cat, and umount commands are run from the single-user shell once the CD has booted):

printenv boot-device

devalias

boot cdrom -s

mount -F ufs /dev/dsk/c0t0d0s0 /mnt

cat /mnt/etc/vfstab

umount /dev/dsk/c0t0d0s0
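Since the slices sit under DiskSuite metadevices, a more cautious variant of the steps above might be to mount the slice read-only and let fsck report without repairing anything on the first pass. This is only a sketch under assumptions: c0t0d0s0 is simply the slice named in the error message earlier in the thread, and whether it is one side of a mirror is not known from anything posted here. If it is, repairing that slice directly would leave the two halves of the mirror out of step until DiskSuite resyncs them or the mirror is re-created.

```shell
# Run from the single-user shell after "boot cdrom -s".
# Read-only mount, so nothing is written behind DiskSuite's back.
mount -F ufs -o ro /dev/dsk/c0t0d0s0 /mnt
cat /mnt/etc/vfstab
umount /mnt

# Check the raw (character) device while the slice is unmounted.
# -n answers "no" to every repair prompt, so this pass only reports.
fsck -F ufs -n /dev/rdsk/c0t0d0s0
```

If the report looks sane, the same fsck command without -n would attempt the repairs interactively.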

Happy New Year !

Michael

P.S.: If you forgot the password for the user AaronSmith, use the "Forgot your password?" button on the Login page. The password will be reset and a new one sent to you. But maybe the other account is tied to another e-mail address which is unreachable until next year ...

Update

"... back together, powered it up, did a "boot cdrom -vs" and nothing. Frozen spinny cursor again."

Did you revert to the default settings? This might fix your problem.

set-defaults

"Is it possible to purchase a new contract in order to have maintenance done on this machine? The person I spoke to on the phone said that time and materials to come fix it would be a minimum of $2,500."

I got my E250 for about $60 (no disks, no cpus, single power-supply). Try to fix the system yourself or buy a second-hand system.

Message was edited by:

MAALATFT

MAALATFTa at 2007-7-7 22:12:20
# 6

I have no idea what's up with my forum account. It wouldn't accept my password, and when I tried to reset it, it said I didn't have a security question. So I entered the other one, but when I tried to log into THAT one just now it told me to enter a new screen name, so I had to enter a THIRD one. Weird.

I'll try resetting the defaults. I've been assuming that all of those are correct, but lord only knows when this machine was last rebooted. It certainly hasn't rebooted since I started here a year ago.

I did get a copy of the vfstab off the backup tape and took a look at it. No comments at all but I didn't expect there to be any.
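For anyone doing the same with a vfstab pulled from backup, a small filter can list which mount points sit on DiskSuite metadevices. This is a minimal sketch, not anything from the actual server: the function name list_md_mounts and the sample file contents are made up for illustration, and it assumes the usual whitespace-separated vfstab layout (device to mount, device to fsck, mount point, FS type, ...).

```shell
# list_md_mounts FILE: print "metadevice mountpoint fstype" for every
# vfstab entry whose block device lives under /dev/md/.
# Comment lines start with "#", so they never match the pattern.
list_md_mounts() {
    awk '$1 ~ /^\/dev\/md\// { print $1, $3, $4 }' "$1"
}

# Hypothetical sample in vfstab layout, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#device         device          mount   FS      fsck    mount   mount
#to mount       to fsck         point   type    pass    at boot options
/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      -
/dev/md/dsk/d1  /dev/md/rdsk/d1 /var    ufs     1       no      -
/dev/dsk/c0t1d0s7 /dev/rdsk/c0t1d0s7 /export ufs 2      yes     -
swap            -               /tmp    tmpfs   -       yes     -
EOF
list_md_mounts "$sample"
rm -f "$sample"
```

Note that vfstab only shows which file systems are on metadevices. It does not say which physical slices make up d0 or d1; that mapping lives in the DiskSuite configuration, not in this file.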

AaronSmithK2a at 2007-7-7 22:12:20
# 7

Ok. So yet another screen name. I want to get back into the original post so I can award the Duke stars!! Every time I log in with my Sun Online Account and then go to the forums, I get a page that says "We are consolidating our database yadda yadda yadda" and forces me to select a screen name. But if I put in any of my OLD screen names, it says that one's taken. I'm not sure what the heck I'm doing wrong here.

As for the server, we've declared it dead. I tried resetting the defaults at the ok prompt with no love. Looking through printenv showed pretty much all the settings were at the defaults anyway, with the only exception being the auto-boot? setting. Everything that was running on that machine is now running elsewhere. Thanks for your help, and if I can figure out what the heck is going on with my online account, I'll send the Duke stars your way.

AaronSmithK3a at 2007-7-7 22:12:20