Question regarding "Agent Statistics Number of processes > 15"
Hi People.
I have a general question regarding an alarm that has had its threshold breached on a number of our servers. Below is some investigation.
The alarm is generated by Local Applications > Agent Statistics > Sun Management Center Total Child Process Statistics.
The default threshold is set to 15 which is out of the box. However some of our hosts have 30 of these processes.
Below is a sample from the console. Can someone please explain why SUNMC would need to spawn all these child processes? Not to mention why these processes do nothing but SLEEP, as my truss output indicates.
Is this some memory issue that has occurred with SUNMC, or something to do with SUNMC not being able to clean up its sub processes?
Any help or clarification on this matter would be greatly appreciated.
Thanks in advance.
===================================
[MYHOST]/# ps -ef | grep symon
root 25709 1 1 Mar 14 ? 1649:39 esd - init agent -dir /var/opt/SUNWsymon -q
root 20332 20329 0 15:03:45 pts/2 0:00 grep symon
[MYHOST]/# ptree 25709
25709 esd - init agent -dir /var/opt/SUNWsymon -q
25905 sh
25904 sh
27063 esd - shell perftool-shell.tcl
27093 sh
27091 sh
29213 sh
29211 sh
18218 sh
18205 sh
18216 sh
18227 sh
[MYHOST]/# truss -p 18227
read(0, 0x000394D8, 128) (sleeping...)
===================================
By [katsal] at [2007-11-26 8:07:24]

# 1
Hi Katsal,
> The default threshold is set to 15 which is out of
> the box. However some of our hosts have 30 of these
> processes.
This default limit is rather low, and in my opinion should be set to around 50 out-of-the-box. As you load more modules this number will go up... which is normal... and the default limit of 15 doesn't take that into account.
> Below is a sample from the console. Can someone
> please explain why SUNMC would need to spawn all
> these child processes? Not to mention why these
> processes do nothing but SLEEP, as my truss output
> indicates.
>
> Is this some memory issue that has occurred with
> SUNMC, or something to do with SUNMC not being able
> to clean up its sub processes?
This is normal: they're called "captive shells". Many SunMC modules run shell commands to acquire data. For performance reasons each module will keep a couple shells open to push commands through. Think of these shells as having their standard input attached to the Agent listening for commands, and their standard output redirected to the Agent as well. It's more efficient to leave them open and sleeping than it is to create and destroy thousands of "sh" processes over time.
Plus you can see they spend most of their lives sleeping, so the overall performance impact is a tiny bit of RAM and almost zero CPU.
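To make the captive-shell idea concrete, here is a minimal Python sketch. It is not SunMC's actual implementation (the Agent's internals are not shown in this thread); the `CaptiveShell` class and the end-of-output marker are hypothetical names chosen purely for illustration. It keeps one `sh` process open, writes commands to its standard input, and reads the results back from its standard output. Between commands the shell just blocks in `read(0, ...)`, which is what the truss output above shows.

```python
import subprocess


class CaptiveShell:
    """Keep one `sh` process alive and push commands through its stdin."""

    def __init__(self):
        # One long-lived shell; it sleeps in read(0, ...) between commands.
        self.proc = subprocess.Popen(
            ["sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def run(self, command, marker="__END_OF_OUTPUT__"):
        # Send the command, then echo a marker so we know where its output ends.
        # Assumes the command's output ends with a newline.
        self.proc.stdin.write(f"{command}\necho {marker}\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "" or line.rstrip("\n") == marker:
                break  # shell exited, or this command's output is complete
            lines.append(line.rstrip("\n"))
        return lines

    def close(self):
        # Closing stdin gives the shell EOF, so it exits on its own.
        self.proc.stdin.close()
        self.proc.wait()


shell = CaptiveShell()
first = shell.run("echo hello")
second = shell.run("echo one; echo two")
pid_a = shell.run("echo $$")  # PID of the shell itself
pid_b = shell.run("echo $$")  # same PID: no new process was created
shell.close()
```

Both `echo $$` calls report the same PID, which is the point: many commands go through one sleeping shell instead of a new `sh` being forked for each.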
If you're triggering the >15 alarm on Agents with lots of modules loaded then it's OK... because you're actually doing a lot of work. Tune the threshold up a bit and ignore it.
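Before picking a new threshold it can help to see how many children the Agent really has on each host. A rough Python sketch follows, assuming you feed it "PID PPID" pairs such as the output of `ps -e -o pid= -o ppid=`. The `count_children` helper is a hypothetical name, the sample PPID values are made up for illustration, and it only counts direct children; how SunMC itself computes the statistic is not shown in this thread.

```python
def count_children(parent_pid, ps_lines):
    """Count processes whose parent PID equals parent_pid.

    ps_lines: text with one "PID PPID" pair per line, for example the
    output of `ps -e -o pid= -o ppid=`.
    """
    count = 0
    for line in ps_lines.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        pid, ppid = fields
        if ppid == str(parent_pid):
            count += 1
    return count


# Made-up sample: the PPID column is an assumption for illustration only.
sample = """\
25709     1
25905 25709
25904 25709
27063 25709
20332 20329
"""
children = count_children(25709, sample)
print(children)
```

Run against real `ps` output on a busy Agent, the result gives a baseline for choosing a threshold with some headroom above normal operation.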
Regards,
Mike.Kirk@HalcyonInc.com
http://www.HalcyonInc.com