Prevent e4500 from rebooting after a memory fault?

Hello,

We're running two e4500s with 12 CPUs and 12 GB of RAM. Is there a way to prevent the Sun from rebooting after a memory error?

The system file (/etc/system) is configured to allow detachment of modules:

set pln:pln_enable_detach_suspend=1

set soc:soc_enable_detach_suspend=1

set kernel_cage_enable=1

memory-interleave is set to minimum.

The preferred action would be for the system to disable the CPU or memory in question and just core-dump the affected process(es).

I didn't find a thread about this in the forum. Sorry if it's a bogus question.

Kind regards

Denis

[601 byte] By [goldrauscha] at [2007-11-26 12:33:45]
# 1

What OS?

And by "prevent booting", I think you mean you don't want it to go down, not that you don't want it to come back up, right?

The problem is that with older versions, the OS has no way to know if any particular process on the machine is critical for operations or not. All it knows is that some process had to be forcefully killed. The safe way out is to reboot and assume that on the way back up, all critical processes are restarted.

With Solaris 10, SMF allows you to describe what should happen to a service's process that is killed by a hardware fault. So the machine as a whole should not have to be brought down.

--

Darren

Darren_Dunhama at 2007-7-7 15:48:27 > top of Java-index,General,Sys Admin Best Practices...
# 2
Unfortunately it's still Solaris 9... and yes, I don't want the machine to go down. If it has been down, coming up is a good idea.

Is it possible to tell the OS to do some "clean up", e.g. shut down database processes, on Solaris 9 before going down?
goldrauscha at 2007-7-7 15:48:27
# 3
> Is it possible to tell the OS to do some "clean up",
> e.g. shutdown database processes on Solaris 9 before
> going down?

That's what /etc/rc?.d/K* scripts are for. Somehow that just looks wrong.

alan
alan.paea at 2007-7-7 15:48:27
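
As a rough illustration of the /etc/rc?.d/K* approach mentioned in reply #3, a kill script could look something like the sketch below. The script name K10mydb and the stop command /opt/mydb/bin/dbshut are hypothetical placeholders, not taken from this thread or from any real product. Keep in mind that K scripts run only during an orderly run-level change (e.g. init 0 or shutdown); they are not run when the kernel panics on an uncorrectable memory error.

```shell
#!/bin/sh
# Sketch of a kill script, e.g. /etc/rc0.d/K10mydb (hypothetical name).
# DB_STOP_CMD is an assumed path to the database's own shutdown command.
DB_STOP_CMD=${DB_STOP_CMD:-/opt/mydb/bin/dbshut}

# Handle the action passed in by init: only "stop" is meaningful here.
rc_dispatch() {
    case "$1" in
    stop)
        # Shut the database down cleanly if its stop command is installed.
        if [ -x "$DB_STOP_CMD" ]; then
            "$DB_STOP_CMD"
        fi
        ;;
    *)
        echo "Usage: $0 stop"
        return 1
        ;;
    esac
}

# init calls K scripts with "stop"; do nothing when called without arguments.
if [ $# -gt 0 ]; then
    rc_dispatch "$@"
fi
```

Run by hand, this would be invoked as `sh /etc/rc0.d/K10mydb stop`, which is also how init would call it when leaving the run level.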