ILOM on X4x00 - temperature monitoring snmp, automatic shutdown

Hello,

we run some X4100/X4200 servers, and I want to monitor temperatures via ILOM's SNMP interface. Unfortunately, snmpwalk returns all kinds of information, such as sensor type, hysteresis, unit and so on, but not the current sensor readings. Has anybody succeeded in reporting temperature sensor readings via SNMP with ILOM? We run ILOM firmware 1.0, build 6464.

Additionally, I need to verify that an automatic shutdown happens once a system gets too hot. To test this, I would like to lower the temperature sensor thresholds. How can I lower the temperature thresholds in ILOM?

Any help is greatly appreciated. With kind regards,

Heinrich

[686 byte] By [hebheb] at [2007-11-26 9:31:08]
# 1

I'm trying to get thermal shutdowns to happen, too, for a cluster of X4100s managed with N1SM. I don't use SNMP, though. I've found that the ILOMs have good support for IPMI. For example:

echo changeme > changeme

ipmitool -U root -f changeme -H 192.168.1.20 sensor list

I can set the sensor thresholds with ipmitool ... sensor thresh fp.t_amb unr 33, where unr is the upper non-recoverable threshold.

Thresholds set this way don't stick across power-cycling the ILOM, so I'll probably set up a cron job to set them.
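
A rough sketch of what such a cron job could run is below. It only wraps the same sensor thresh command shown above; the host list comes from the command line, and the sensor name, the value 33 and the password file name are simply the ones from this post, so treat them as placeholders. The IPMITOOL and PASSFILE variables are my own additions so the script can be dry-run with IPMITOOL=echo.

```shell
#!/bin/sh
# Re-apply ILOM sensor thresholds that are lost when the ILOM is power-cycled.
# Assumption: each command-line argument is the address of one ILOM.

IPMITOOL="${IPMITOOL:-ipmitool}"   # set IPMITOOL=echo to print instead of run
PASSFILE="${PASSFILE:-changeme}"   # file holding the ILOM root password

# set_unr HOST SENSOR VALUE
# Set the upper non-recoverable (unr) threshold of SENSOR on HOST.
set_unr() {
    "$IPMITOOL" -U root -f "$PASSFILE" -H "$1" sensor thresh "$2" unr "$3"
}

# Loop over the ILOM addresses given as arguments; report failures on stderr
# but keep going so one unreachable ILOM does not block the rest.
for host in "$@"; do
    set_unr "$host" fp.t_amb 33 || echo "could not set threshold on $host" >&2
done
```

Saved under some name of your choosing and called from crontab with the ILOM addresses as arguments, this would put the threshold back after every run, however often you schedule it.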

The machine does a hard shutdown a few seconds after a sensor exceeds its non-recoverable threshold. It never seems to shut down or signal the OS via ACPI or anything else when the critical or non-critical (warning) thresholds are exceeded. It does log the event in the sensor log, and I think it sends an IPMI alert, since N1SM configures the master machine as an alert target (in the ILOM CLI, show /SP/alerts or something similar).

ipmitool(1m) says that the soft shutdown command works by having ACPI signal the OS that there is a "fatal overtemperature". I had assumed that this was what N1SM used to soft-power-off machines, but it seems that ipmitool ... power soft doesn't actually do anything to Solaris 10u1 amd64. So that's what I need to fix, I guess. I could try booting GNU/Linux from a CD to see whether the ACPI events show up and make sure they're really being sent...

Anyway, I'd love to hear how to get Solaris to soft shutdown on critical temps. If I don't figure it out soon, I'll just lower the "non-recoverable" thresholds, since I'm just running grid engine on nodes provisioned from a flash archive, so I don't need to worry too much about clean shutdowns.
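
One workaround I can imagine, since the ILOM does not seem to signal the OS, is to poll the sensors and let a script decide when to shut down cleanly. This is only a sketch, not something I have running: it assumes that ipmitool sensor list prints pipe-separated columns in the order name, reading, unit, status, lnr, lcr, lnc, unc, ucr, unr, and the function name over_critical is made up for illustration.

```shell
#!/bin/sh
# over_critical: read `ipmitool sensor list` output on stdin and print the
# name of every temperature sensor whose reading is at or above its upper
# critical (ucr) threshold. Sensors with "na" values are skipped.
# Assumed column order: name|reading|unit|status|lnr|lcr|lnc|unc|ucr|unr
over_critical() {
    awk -F'|' '
        # Strip the padding that surrounds each column.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            name = trim($1); reading = trim($2)
            unit = trim($3); ucr = trim($9)
            # Only compare when both values are plain numbers.
            if (unit == "degrees C" &&
                reading ~ /^-?[0-9.]+$/ && ucr ~ /^-?[0-9.]+$/ &&
                reading + 0 >= ucr + 0)
                print name
        }'
}
```

The idea would be to pipe ipmitool ... sensor list into over_critical from cron and, if it prints anything, start a clean shutdown of that node (on Solaris, something like shutdown -y -g 60 -i 5). On Solaris the default /usr/bin/awk is the old awk without user-defined functions, so nawk or /usr/xpg4/bin/awk would be needed there.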

pcordes at 2007-7-7 0:16:04