Sun Fire V215 does not boot up after power outage

I have a Sun Fire V215 machine. The problem is that it does not automatically boot into Solaris after a power outage.

I have referred to the ALOM configuration manual at:

http://www.sun.com/products-n-solutions/hardware/docs/Servers/Workgroup_Servers/v215/index.html

SC variable: sc_powerstatememory is set to TRUE.

EEPROM parameter auto-boot? is set to true.

The server stops at the ALOM prompt "Please login:" after a power outage instead of booting all the way up and restoring services. This is a serious issue for a server. I would expect booting to the OS to be the default behavior, rather than stopping at ALOM. When I ran into the same issue with Netra-210 servers, I just had to set sc_powerstatememory to TRUE.

I have tried simulating the scenario by unplugging the power cable:

1. Connect the power supply cable.

2. ALOM begins to run.

3. Power on the system in either of two ways (two test cases):

a) SC command "poweron"

b) pressing the power-switch on the front panel

4. Verify that sc_powerstatememory is set to true using the scadm command.

5. Pull the power cable out. Put it back in.

I expect to see the system boot all the way up to Solaris multiuser mode and restore all services. This server stops at the ALOM prompt!

Any help is greatly appreciated.

thanks,

prakash

[1388 byte] By [SPF60a] at [2007-11-27 4:50:22]
# 1

sys_bootrestart: this variable should be set to "reset" to boot the Solaris OS. There is no mention of this in the "diagnostic mode flowchart" in the OpenBoot PROM enhancements manual.

Still no luck.

I should not have to do this configuration. The default behavior for servers should be "boot all the way up to Solaris".


SPF60a at 2007-7-12 10:03:40 > top of Java-index,Sun Hardware,Servers - General Discussion...
# 2

It is frustrating to run so many power-outage test cases, trying various combinations of SC/EEPROM variable values to find where the problem is, not to mention sometimes having to run 'fsck'. Of course the Sun boot process takes a million years even with "diag-level" set to "min".

Per all the manuals and other tips on the web, all I have to do is to change the value of the variable sc_powerstatememory from (default) FALSE to TRUE to boot to Solaris OS after a power outage. This actually works as intended on V210. Just not on V215!

Will post updates. There must be something else wrong with the system that makes it stop at the ALOM prompt.

- SPF60

SPF60a at 2007-7-12 10:03:40
# 3

Perhaps this isn't an ALOM issue at all.

It's possible you have one or more customized variables in the OBP of that system.

Your Lights Out Management circuitry is quasi-independent of the rest of the system.

Telnet to the system and run eeprom.

What are the current settings for `auto-boot?` and for `error-reset-recovery` ?

Auto-boot should be at "true" and error-reset should be at the default of "sync".

http://www.google.com/search?hl=en&q=error-reset-recovery&btnG=Google+Search

rukbata at 2007-7-12 10:03:40
# 4

> Perhaps this isn't an ALOM issue at all.

> It's possible you have a customized variable(s) in

> OBP of that system.

Thanks for the reply.

> Telnet to the system and run eeprom.

> What are the current settings for `auto-boot?`

> and for `error-reset-recovery` ?

Verified the following variables are what they should be.

Auto-boot is true.

Error-reset-recovery is sync.

Details of EEPROM:

# cd /usr/platform/`uname -i`/sbin
# ./eeprom
asr-policy=normal
test-args: data not available.
diag-passes=1
local-mac-address?=true
fcode-debug?=false
scsi-initiator-id=7
oem-logo: data not available.
oem-logo?=false
oem-banner: data not available.
oem-banner?=false
ansi-terminal?=true
screen-#columns=80
screen-#rows=34
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
output-device=screen
input-device=keyboard
auto-boot-on-error?=true
error-reset-recovery=sync
load-base=16384
auto-boot?=true
network-boot-arguments: data not available.
boot-command=boot
diag-file: data not available.
diag-device=disk net
boot-file: data not available.
boot-device=disk net
use-nvramrc?=false
nvramrc: data not available.
security-mode=none
security-password: data not available.
security-#badlogins=0
verbosity=normal
diag-trigger=error-reset power-on-reset
service-mode?=false
diag-script=normal
diag-level=min
diag-switch?=false

Details of ALOM Config:

# ./scadm show
if_network="true"
if_modem="false"
if_connection="ssh"
if_emailalerts="false"
sys_autorestart="xir"
sys_bootrestart="reset"
sys_bootfailrecovery="none"
sys_maxbootfail="3"
sys_xirtimeout="900"
sys_boottimeout="900"
sys_wdttimeout="60"
netsc_tpelinktest="true"
netsc_dhcp="false"
netsc_ipaddr="0.0.0.0"
netsc_ipnetmask="255.255.255.0"
netsc_ipgateway="0.0.0.0"
mgt_mailhost=""
mgt_mailalert=""
sc_customerinfo=""
sc_escapechars="#."
sc_powerondelay="false"
sc_powerstatememory="true"
sc_clipasswdecho="true"
sc_cliprompt="sc"
sc_clitimeout="0"
sc_clieventlevel="2"
sc_backupuserdata="true"
sys_eventlevel="0"
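
Both dumps above are plain name=value listings, so the settings discussed in this thread can be checked mechanically rather than by eye. What follows is only a minimal sketch: the helper names get_var and check_var are hypothetical, the sample text is a few lines copied from the output above, and on Solaris itself nawk or /usr/xpg4/bin/awk may be needed in place of awk, since the helper assumes an awk that supports -v and gsub.

```shell
#!/bin/sh
# get_var NAME: read a name=value dump on stdin and print the value of NAME.
# Double quotes are stripped because scadm show quotes its values.
# Assumes a POSIX awk (one that supports -v and gsub).
get_var() {
    awk -F= -v name="$1" '$1 == name { v = $2; gsub(/"/, "", v); print v; exit }'
}

# check_var DUMP NAME EXPECTED: report whether NAME has the EXPECTED value.
check_var() {
    actual=$(printf '%s\n' "$1" | get_var "$2")
    if [ "$actual" = "$3" ]; then
        echo "OK $2=$actual"
    else
        echo "MISMATCH $2=$actual (expected $3)"
    fi
}

# Sample lines copied from the dumps above; on a live system the text could
# instead be captured with dump=$(./eeprom) or dump=$(./scadm show).
eeprom_dump='error-reset-recovery=sync
auto-boot?=true
diag-switch?=false'
scadm_dump='sys_bootrestart="reset"
sc_powerstatememory="true"'

check_var "$eeprom_dump" 'auto-boot?' true
check_var "$eeprom_dump" error-reset-recovery sync
check_var "$scadm_dump" sc_powerstatememory true
```

Saving a dump before and after each power-outage test makes it easier to see which variable actually changed between runs.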

> Auto-boot should be at "true" and error-reset should

> be at the default of "sync".

> http://www.google.com/search?hl=en&q=error-reset-recovery&btnG=Google+Search

I am now trying different values for a few other variables:

- ALOM: sys_bootrestart

- ALOM: sys_autorestart

- EEPROM: diag-script

Will post updates.

- SPF60

SPF60a at 2007-7-12 10:03:40
# 5

That all appears to be configured properly.

As I type this (May '07) every V215 ever shipped is at least under warranty.

You can open a support case with Sun, and all it'll cost you is time waiting for the callback.

If there is contract coverage as well, then the callback time is quicker.

Sun support engineers have access to documentation that's not available to those of us in this user-to-user discussion forum.

rukbata at 2007-7-12 10:03:40
# 6

Just logged a Sun service request.

thanks,

SPF60

Sorry everyone, there seems to be no end to my rant. The serial number is absolutely needed for logging a Sun service request online. Hostid is not acceptable.

The first problem is that the serial number is pasted on the bottom of the server. I didn't come across a software tool that can read the serial number. There must be one out there, or at least some way of getting it from OpenBoot. I had to take the server out of the rack to look at the bottom.

That is where I ran into the second inconvenience. There is nothing but metal at the bottom, and this serial number label happens to have the tiniest letters on the planet. Is it 0, O or D? Is it 5 or 8 or B? It took a few attempts to get the serial number right.

SPF60a at 2007-7-12 10:03:40
# 7
Consider "SNEEP"ing the system, which puts the Serial Number into the EEProm: http://www.sun.com/download/products.xml?id=4304155a
rukbata at 2007-7-12 10:03:40
# 8

Found at least one workaround. The diagnostic mode works.

The 'normal' mode is still not functioning as expected. (Normal mode is where you simply set sc_powerstatememory to true and ship the system. The system should then boot to the OS after a power outage.)

I tried the diagnostic mode. (eeprom diag-switch\?=true)

With that set, the system automatically powers itself up after a power outage (pulling the cable out and putting it back in), runs through diagnostics and boots to the Solaris OS. Boot time is almost doubled, though. It might be a good workaround for now.

About "ASR" Automatic System Recovery:

The EEPROM variable diag-switch? MUST be changed from the default value of FALSE to TRUE to enable what Sun calls Automatic System Recovery. Here is what the documentation says.

"Because ASR relies on firmware diagnostics to detect faulty devices, diag-switch? must be set to true for ASR to run".

# eeprom diag-switch\?=true

diag-switch?: If true, "run in diagnostic mode. After a boot request, boot diag-file from diag-device". If false, "run in non-diagnostic mode. After a boot request, boot boot-file from boot-device".
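
The workaround amounts to setting one EEPROM variable and confirming that the change took effect. A rough sketch of that sequence follows. The helper names enable_diag_switch and is_true are hypothetical, the eeprom invocation mirrors the one used in this post, and the read-back assumes eeprom prints the current setting as name=value when no value is given, in the same form as the eeprom listing earlier in the thread.

```shell
#!/bin/sh
# is_true LINE: succeed if a name=value line carries the value "true".
is_true() {
    case "$1" in
        *=true) return 0 ;;
        *)      return 1 ;;
    esac
}

# enable_diag_switch: set diag-switch? to true, then read it back.
# Must run as root on the Solaris host; this changes a persistent
# firmware setting.
enable_diag_switch() {
    eeprom 'diag-switch?=true' || return 1
    current=$(eeprom 'diag-switch?')
    if is_true "$current"; then
        echo "diag-switch? is now true"
    else
        echo "unexpected setting: $current" >&2
        return 1
    fi
}

# Not called automatically; run enable_diag_switch by hand when ready.
```

Quoting the variable name serves the same purpose as the backslash in eeprom diag-switch\?=true: it keeps the shell from treating the question mark as a wildcard.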

- SPF60

SPF60a at 2007-7-12 10:03:40