ILOM on X4x00 - temperature monitoring snmp, automatic shutdown
Hello,
We run some X4100/X4200 servers. I want to monitor temperatures via ILOM's SNMP interface. Unfortunately, snmpwalk returns all kinds of information (sensor type, hysteresis, unit, ...) except the current sensor readings. Has anybody succeeded in reporting temperature sensor readings via SNMP with ILOM? We run ILOM firmware 1.0, build 6464.
Additionally, I need to verify that an automatic shutdown happens once a system gets too hot. To test this, I would like to lower the temperature sensor thresholds. How can I lower the temperature thresholds in ILOM?
Any help is greatly appreciated. With kind regards,
Heinrich
[686 byte] By [hebheb] at [2007-11-26 9:31:08]

# 1
I'm trying to get thermal shutdowns to happen, too, for a cluster of X4100s managed with N1SM. I don't use SNMP, though. I've found that the ILOMs have good support for IPMI, for example:
echo changeme > changeme
ipmitool -U root -f changeme -H 192.168.1.20 sensor list
I can set the sensor thresholds with ... sensor thresh fp.t_amb unr 33
Thresholds set this way don't stick across power-cycling the ILOM, so I'll probably set up a cron job to set them.
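A minimal sketch of what such a cron job could run, assuming the same sensor and limit as above. The password file path, the host list and the RUN dry-run switch are made up for illustration and are not part of ILOM or ipmitool; by default the script only prints the commands it would run.

```shell
#!/bin/bash
# Re-apply an ILOM sensor threshold over IPMI; meant to be called from cron.
# Assumptions (illustrative only): PASSFILE path, ILOM_HOSTS list, RUN switch.

PASSFILE="${PASSFILE:-/root/.ilom_pass}"   # hypothetical file holding the ILOM password
ILOM_HOSTS="${ILOM_HOSTS:-192.168.1.20}"   # space-separated ILOM addresses
SENSOR="${SENSOR:-fp.t_amb}"               # sensor name as shown by "sensor list"
UNR_LIMIT="${UNR_LIMIT:-33}"               # upper non-recoverable threshold
RUN="${RUN-echo}"                          # "echo" = dry run; set RUN to empty to execute

# Set the upper non-recoverable threshold on one ILOM ($1 = address).
# -U is ipmitool's remote user name option, -f names the password file.
set_threshold() {
    $RUN ipmitool -U root -f "$PASSFILE" -H "$1" \
        sensor thresh "$SENSOR" unr "$UNR_LIMIT"
}

# Walk over all configured ILOMs and report the ones that failed.
apply_thresholds() {
    for host in $ILOM_HOSTS; do
        set_threshold "$host" || echo "could not set threshold on $host" >&2
    done
}

apply_thresholds
```

Saved as, say, /usr/local/bin/ilom-thresholds.sh (a hypothetical path), an hourly crontab entry could look like `0 * * * * RUN= /usr/local/bin/ilom-thresholds.sh`, where the empty RUN turns the dry run off.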
The machine does a hard shutdown a few seconds after a sensor exceeds its non-recoverable threshold. It never seems to shut down or signal the OS with ACPI or anything else when the critical or non-critical (warning) thresholds are exceeded. It does log the event in the sensor log, and I think it sends an IPMI alert, since N1SM configures the master machine as an alert target (in the ILOM CLI, show /SP/alerts, or something like that).
According to ipmitool(1m), the soft shutdown command works by having ACPI signal the OS that there is a "fatal overtemperature". I had assumed that this was what N1SM used to soft-power-off machines, but it seems that ipmitool ... power soft doesn't actually do anything to Solaris 10u1 amd64. So that's what I need to fix, I guess. I could try booting GNU/Linux from a CD to maybe see the ACPI events and make sure they're really being sent...
Anyway, I'd love to hear how to get Solaris to soft shutdown on critical temps. If I don't figure it out soon, I'll just lower the "non-recoverable" thresholds, since I'm just running grid engine on nodes provisioned from a flash archive, so I don't need to worry too much about clean shutdowns.
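One possible stopgap, sketched here and not tested on an X4100: poll the sensors from the OS and shut down cleanly when a temperature reaches its upper critical threshold. This sidesteps ACPI rather than fixing it. It assumes the usual pipe-separated layout of ipmitool sensor list (name, reading, unit, status, lnr, lcr, lnc, unc, ucr, unr) and that ipmitool can reach the local service processor without -H; otherwise the -U/-f/-H options from above would be needed. The --shutdown switch and the AWK variable are invented for this sketch.

```shell
#!/bin/bash
# Print temperature sensors whose reading is at or above the upper critical
# (ucr) threshold. Reads "ipmitool sensor list" output on stdin.
# On Solaris 10 the default awk lacks gsub; use AWK=nawk there.
AWK="${AWK:-awk}"

over_critical() {
    "$AWK" -F'|' '
        {
            # strip the column padding around every field
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        }
        # field 2 = reading, field 3 = unit, field 9 = upper critical threshold
        $3 == "degrees C" && $2 ~ /^-?[0-9.]+$/ && $9 ~ /^-?[0-9.]+$/ && $2 + 0 >= $9 + 0 {
            print $1, $2, $9
        }
    '
}

# Only act when explicitly asked to, so the function can be tried out safely.
if [ "${1:-}" = "--shutdown" ]; then
    hot=$(ipmitool sensor list | over_critical)
    if [ -n "$hot" ]; then
        echo "over-temperature: $hot" | logger -p daemon.crit
        shutdown -y -g0 -i5    # Solaris: clean shutdown to the power-off state
    fi
fi
```

Run from cron every few minutes with --shutdown, this would give the OS a chance to go down cleanly before the non-recoverable threshold triggers the hard power-off.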