SunMC agent crashes but has a core limit of zero and won't actually dump core
Greetings,
Before I put a call in with Sun, I thought I would find out if anyone else has a similar problem. I have SunMC 3.5.1 running on a number of platforms and generally it is quite stable. I do have a number of SunFire 1280s (Netra T-12 as far as SunMC is concerned), each with 3 system boards and 96 GB of RAM. These systems are worked pretty hard, and about every day or so the agent on each system crashes. The problem is that the agent has a core dump limit of zero, so it won't produce a core dump. Other processes dump with no problems. I gather there may be some limit set in one of the SunMC config files, but I don't know where to look. Without a core dump, it is hard to work out what causes the crash. I am only running the basic monitoring module, so nothing else is loaded. If the basic module won't stay up, then why spend money to put in advanced monitoring?
Any info would be appreciated. I have worked with the fabulous Sun team regarding SunMC and I find waiting months for an answer somewhat frustrating.
Regards
Stephen
# 1
Sounds mysterious. Any information in the agent log files? A common statement prior to each agent dying might provide some insight into the root cause, with or without a core dump.
If you have a non-production staging area, you could explore bringing things up to the latest 3.5.1b. See patch 118388-07 on sunsolve for some of the bug fixes this addresses. I had a quick look and didn't see anything about agents dying, but there was a lot there, so I may have missed it.
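If you end up comparing logs by hand across several crashes, a small script can do the intersection for you. This is only a rough sketch in Python: `common_tail_lines` is a made-up helper, and it assumes you have already pulled the last few messages before each crash out of the agent log and stripped the timestamps yourself, since I am not assuming anything about the actual SunMC log format.

```python
from collections import Counter


def common_tail_lines(crash_tails, min_share=1.0):
    """Return messages that appear before at least min_share of the crashes.

    crash_tails: one list of message strings per crash (hypothetical input;
    timestamps and other per-line noise are assumed to be removed already).
    min_share: fraction of crashes a message must precede to be reported.
    """
    counts = Counter()
    for tail in crash_tails:
        # Count each message once per crash, however often it repeats.
        counts.update(set(tail))
    needed = min_share * len(crash_tails)
    return sorted(msg for msg, n in counts.items() if n >= needed)
```

Calling it with `min_share=1.0` keeps only messages seen before every crash; lowering it (say to 0.6) also catches messages that show up before most of them.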
Ahah. 119640-02 applies to 3.5u1 and has 'agent process dies' in the description. Maybe worth a try.
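On the zero core limit itself: the core file size limit is a per-process resource limit that a child inherits from whatever starts it. One possible explanation (an assumption on my part, I have not checked the SunMC start scripts) is that the agent is launched from an environment where the limit is set to 0, while your other processes are not. On Solaris, running `plimit` against the agent's PID shows what it actually has. Here is a minimal Python sketch of the inheritance; `child_core_limit` is a made-up helper and none of this is SunMC code.

```python
import resource
import subprocess
import sys


def child_core_limit(soft_limit):
    """Start a child with the given soft core limit; return what the child sees."""
    def set_limit():
        # Runs in the child just before exec: lower only the soft limit,
        # leaving the hard limit as inherited from this process.
        _, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (soft_limit, hard))

    out = subprocess.check_output(
        [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_CORE)[0])"],
        preexec_fn=set_limit,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # The child reports the limit set by its launcher, not a system default.
    print(child_core_limit(0))
```

If that is what is happening, the fix would be in whatever launches the agent rather than in the agent itself, which would also fit with your other processes dumping fine.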
iMac
ian.macdonald@halcyoninc.com
# 2
Hi Ian,
Thanks for the info. I will have a look at the patches. I noticed 3.5.1b has been released but, as per usual, Sun does not really put much effort into saying what is good about it and why we should make the effort to upgrade. I could not wait to move from version 3 to 3.5, as version 3 was so hard to manage patchwise.
One other thing that annoys me no end is the SunRay integration (/opt/SUNWut/sbin/utsunmc) alarms that appear. Once or twice, when I tried to acknowledge an alarm, it just hung. The event log said that I was trying to acknowledge an event that does not exist.
But then again, it has to be good because it does not cost anything other than time and the millions of dollars we spend on Sun equipment so we can use SunMC. That much money is better spent on hardware than on Tivoli, right? I think I can get a base model 25k for less than implementing Tivoli.
Regards
Stephen
# 3
Hey Stephen,
I agree. 3.5.1b was mostly (I think) to handle the integration of the Solaris Container Manager, which enables Solaris 10 zone management by way of the SunMC agent. Riding the wave of Solaris 10 has given SunMC development a boost.
The SunMC team have really been making tracks in terms of releasing patches since 3.5; I know that the SunRay and StorEdge integrations are two areas that could use some updates and/or tighter integration. I hope they roll some of these peripheral pieces right into the base SunMC modules, and/or bring them up to speed.
Well, if somehow you end up with both an E25k and Tivoli, be sure to drop the <1% $$ on the Halcyon integration :)
cheers,
iMac.