Sun Fire V215 does not boot up after power outage
I have a Sun Fire V215 machine. The problem is that it does not automatically boot to Solaris after a power outage.
I have referred to the ALOM configuration manual at:
http://www.sun.com/products-n-solutions/hardware/docs/Servers/Workgroup_Servers/v215/index.html
SC variable: sc_powerstatememory is set to TRUE.
EEPROM parameter auto-boot? is set to true.
The server stops at the ALOM prompt "Please login:" after a power outage instead of booting all the way up and restoring the services. This is a serious issue for servers. I would expect booting all the way up to be the default behavior, rather than stopping at ALOM. When I ran into the same issue with Netra-210 servers, I just had to set sc_powerstatememory to TRUE.
I have tried simulating the scenario by unplugging the power cable:
1. Connect the power supply cable.
2. ALOM begins to run.
3. Power on the system in either of two ways (two test cases):
a) SC command "poweron"
b) pressing the power switch on the front panel
4. Verify that sc_powerstatememory is set to true using the scadm command.
5. Pull the power cable out. Put it back in.
I expect to see the system boot all the way up to Solaris multiuser mode and restore all services. This server stops at the ALOM prompt!
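The check in step 4 can be scripted so that both settings are confirmed right before the cable is pulled. This is only a sketch: it assumes scadm and eeprom are on PATH (they live under /usr/platform/`uname -i`/sbin), that it runs as root, and that each command prints a single name=value line for the variable it is asked about. The function name preflight is made up for illustration.

```shell
# preflight: print the two settings that matter for power-outage recovery
# and return non-zero if either one is not "true".
# Assumption: scadm and eeprom are reachable via PATH and run as root.
preflight() {
    sc=$(scadm show sc_powerstatememory)   # e.g. sc_powerstatememory="true"
    ab=$(eeprom 'auto-boot?')              # e.g. auto-boot?=true
    echo "$sc"
    echo "$ab"
    case "$sc" in *'"true"') ;; *) return 1 ;; esac
    case "$ab" in *=true)    ;; *) return 1 ;; esac
}
```

Running preflight before step 5 rules out a setting that was silently changed between test cases.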
Any help is greatly appreciated.
thanks,
prakash
[1388 byte] By [SPF60a] at [2007-11-27 4:50:22]

# 1
sys_bootrestart: this variable should be set to "reset" to boot the Solaris OS. There is no mention of this in the "diagnostic mode flowchart" in the OpenBoot PROM enhancements manual.
Still no luck.
I should not have to do this configuration. The default behavior for servers should be "boot all the way up to Solaris".
SPF60a at 2007-7-12 10:03:40 >

# 2
It is frustrating to run a lot of power-outage test cases, trying various combinations of values for the SC/EEPROM variables to find where the problem is, not to mention sometimes having to run 'fsck'. Of course, the Sun boot process takes a million years even with "diag-level" set to "min".
Per all the manuals and other tips on the web, all I have to do is to change the value of the variable sc_powerstatememory from (default) FALSE to TRUE to boot to Solaris OS after a power outage. This actually works as intended on V210. Just not on V215!
Will post updates. There must be something else wrong with the system that it stops at ALOM prompt.
- SPF60
SPF60a at 2007-7-12 10:03:40 >

# 3
Perhaps this isn't an ALOM issue at all.
It's possible you have one or more customized variables in the OBP of that system.
Your Lights Out Management circuitry is quasi-independent of the rest of the system.
Telnet to the system and run eeprom.
What are the current settings for `auto-boot?` and for `error-reset-recovery` ?
Auto-boot should be at "true" and error-reset-recovery should be at the default of "sync".
http://www.google.com/search?hl=en&q=error-reset-recovery&btnG=Google+Search
# 4
> Perhaps this isn't an ALOM issue at all.
> It's possible you have a customized variable(s) in
> OBP of that system.
Thanks for the reply.
> Telnet to the system and run eeprom.
> What are the current settings for `auto-boot?`
> and for `error-reset-recovery` ?
Verified the following variables are what they should be.
Auto-boot is true.
Error-reset-recovery is sync.
Details of EEPROM:
# cd /usr/platform/`uname -i`/sbin
# ./eeprom
asr-policy=normal
test-args: data not available.
diag-passes=1
local-mac-address?=true
fcode-debug?=false
scsi-initiator-id=7
oem-logo: data not available.
oem-logo?=false
oem-banner: data not available.
oem-banner?=false
ansi-terminal?=true
screen-#columns=80
screen-#rows=34
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
output-device=screen
input-device=keyboard
auto-boot-on-error?=true
error-reset-recovery=sync
load-base=16384
auto-boot?=true
network-boot-arguments: data not available.
boot-command=boot
diag-file: data not available.
diag-device=disk net
boot-file: data not available.
boot-device=disk net
use-nvramrc?=false
nvramrc: data not available.
security-mode=none
security-password: data not available.
security-#badlogins=0
verbosity=normal
diag-trigger=error-reset power-on-reset
service-mode?=false
diag-script=normal
diag-level=min
diag-switch?=false
Details of ALOM Config:
# ./scadm show
if_network="true"
if_modem="false"
if_connection="ssh"
if_emailalerts="false"
sys_autorestart="xir"
sys_bootrestart="reset"
sys_bootfailrecovery="none"
sys_maxbootfail="3"
sys_xirtimeout="900"
sys_boottimeout="900"
sys_wdttimeout="60"
netsc_tpelinktest="true"
netsc_dhcp="false"
netsc_ipaddr="0.0.0.0"
netsc_ipnetmask="255.255.255.0"
netsc_ipgateway="0.0.0.0"
mgt_mailhost=""
mgt_mailalert=""
sc_customerinfo=""
sc_escapechars="#."
sc_powerondelay="false"
sc_powerstatememory="true"
sc_clipasswdecho="true"
sc_cliprompt="sc"
sc_clitimeout="0"
sc_clieventlevel="2"
sc_backupuserdata="true"
sys_eventlevel="0"
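Since the same sc_powerstatememory setting does the job on a V210, one way to hunt for the difference is to save these dumps on both machines and compare them. A rough sketch, assuming each dump was redirected to a plain text file; diff_settings and the temporary file names are made up for illustration.

```shell
# diff_settings WORKING SUSPECT: list settings that differ between two
# saved "eeprom" or "scadm show" dumps. Lines starting with "<" come from
# WORKING, lines starting with ">" come from SUSPECT.
diff_settings() {
    a=/tmp/settings_a.$$
    b=/tmp/settings_b.$$
    sort "$1" > "$a"      # sort so that line order does not matter
    sort "$2" > "$b"
    diff "$a" "$b"
    status=$?             # 0 = identical, 1 = differences found
    rm -f "$a" "$b"
    return "$status"
}
```

For example, after running ./scadm show > v215-scadm.txt here and the equivalent on the V210, diff_settings v210-scadm.txt v215-scadm.txt prints only the variables whose values differ. Some differences between models are expected and harmless.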
> Auto-boot should be at "true" and error-reset should
> be at the default of "sync".
> http://www.google.com/search?hl=en&q=error-reset-recovery&btnG=Google+Search
I am now trying different values for a few other variables:
- ALOM: sys_bootrestart
- ALOM: sys_autorestart
- EEPROM: diag-script
Will post updates.
- SPF60
SPF60a at 2007-7-12 10:03:40 >

# 5
That all appears to be configured properly.
As I type this (May '07) every V215 ever shipped is at least under warranty.
You can open a support case with Sun, and all it'll cost you is time waiting for the callback.
If there is contract coverage as well, then the callback time is quicker.
Sun support engineers have access to documentation that's not available
to those of us in this user-to-user discussion forum.
# 6
Just logged a Sun service request.
thanks,
SPF60
Sorry everyone, there seems to be no end to my rant. The serial number is absolutely needed for logging a Sun service request online. Hostid is not acceptable.
The first problem is that the serial number is pasted on the bottom of the server. I didn't come across a software tool that can read the serial number. There must be one out there, or at least some way of getting it from OpenBoot. I had to get the server out of the rack to look at the bottom.
That is where I ran into the second inconvenience. There is nothing but metal at the bottom, and this serial number label happens to have the tiniest letters on the planet. Is it 0, O or D? Is it 5 or 8 or B? It took a few attempts to get the serial number right.
SPF60a at 2007-7-12 10:03:40 >

# 7
Consider "SNEEP"ing the system (Serial Number into the EEProm): http://www.sun.com/download/products.xml?id=4304155a
# 8
Found at least one workaround. The diagnostic mode works.
The 'normal' mode is still not functioning as expected. (Normal mode is where you simply set sc_powerstatememory to true and ship the system. The system should then boot to the OS after a power outage.)
I tried the diagnostic mode (eeprom diag-switch\?=true).
Then the system automatically powers itself up after a power outage (pulling the cable out and putting it back in), runs through diagnostics and boots to Solaris OS. Boot time is almost doubled, though. It might be a good workaround for now.
About "ASR" Automatic System Recovery:
The EEPROM variable diag-switch? MUST be changed from its default value of FALSE to TRUE to enable what Sun calls Automatic System Recovery. Here is what the documentation says:
"Because ASR relies on firmware diagnostics to detect faulty devices, diag-switch? must be set to true for ASR to run".
# eeprom diag-switch\?=true
diag-switch?: If true, "run in diagnostic mode. After a boot request, boot diag-file from diag-device". If false, "run in non-diagnostic mode. After a boot request, boot boot-file from boot-device".
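For anyone trying the same workaround, here is a minimal sketch of setting the variable and reading it back. enable_diag_mode is a made-up helper name; it assumes eeprom is on PATH and is run as root. The single quotes do the same job as the backslash above: they keep the shell from treating ? as a wildcard.

```shell
# enable_diag_mode: switch OpenBoot to diagnostic mode, then print the
# stored value so the change can be confirmed before the next power test.
enable_diag_mode() {
    eeprom 'diag-switch?=true' || return 1
    eeprom 'diag-switch?'      # prints the current name=value pair
}
```

To go back to the default, the same pattern works with 'diag-switch?=false'.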
- SPF60
SPF60a at 2007-7-12 10:03:40 >

