Solaris Cluster Problem
Guys,
My Sun Cluster running an SAP R/3 system is giving me some trouble. Our SAP system and Oracle database running on our Sun machines went down, and the cluster seems to have stopped responding. I am unable to log on to SunPlex Manager. Although I can still telnet to both machines' Lights Out Manager addresses, when I run "console -f" it just hangs: the cursor keeps blinking but I never get a command prompt. I am therefore unable to run any diagnostics, let alone shut down or restart the cluster or the machines.
Would anyone have any suggestions as to what I can do here, or is there any other way to run diagnostics on the system?
Regards,
Feroz
[705 byte] By [fbuksha] at [2007-11-26 21:48:30]

# 1
Feroz,
Can you give more details? Are you able to ping the machines? You mention that the systems went down; do you mean they have panicked?
You can send a break from the console to bring the systems to the ok prompt.
Madhan Kumar
# 2
Feroz,
Probably a bit late now, but I would have looked at the console history, which you can get from within ALOM, to see if there are any telltale messages. Maybe you've run out of swap and/or some processes are locked in a tight loop consuming CPU.
If they really are wedged, then forcing a crash dump may be your only option for getting some insight into the cause of the problem.
Tim
# 3
Hi guys,
I have been able to determine that some of the resources in my Sun Cluster are stuck. Some are in the "pending offline" state, while others are trying to stop or start. I got this information by telnetting into the cluster itself and running "scstat -g". Attempts to restart the cluster have been unsuccessful; it gives the following error message. The cluster has been like this for quite a few days now.
scshutdown: Could not shutdown all resource groups in the cluster: prd-sap-trans: resource group is undergoing a reconfiguration, please try again later.
Regards,
Feroz
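
For anyone wanting to spot the stuck groups quickly, here is a minimal Python sketch that scans saved status text (for example, the output of "scstat -g" captured to a file) for lines mentioning a transitional state. The function name, the keyword list, and the sample text are all assumptions made up for illustration; the sample is not real scstat output, and the real column layout may differ.

```python
# Hypothetical helper: flag status lines that mention a transitional state.
# The keywords are assumptions based on the states described in this thread
# ("pending offline", resources trying to stop or start).
def find_transitional(status_text, keywords=("pending", "stopping", "starting")):
    hits = []
    for line in status_text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line.strip())
    return hits


# Made-up sample text, NOT captured from a real cluster.
SAMPLE = """\
Group: prd-sap-trans  nodeA  Pending offline
Group: example-rg     nodeA  Online
"""

if __name__ == "__main__":
    for hit in find_transitional(SAMPLE):
        print(hit)
```

This only does a case-insensitive substring match, so it should be treated as a rough filter rather than a parser of the tool's output format.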