Question regarding "Agent Statistics Number of processes > 15"

Hi People.

I have a general question regarding an alarm that has had its threshold breached on a number of our servers. Below is some investigation.

The alarm is generated by Local Applications > Agent Statistics > Sun Management Center Total Child Process Statistics.

The default threshold is set to 15 which is out of the box. However some of our hosts have 30 of these processes.

Below is a sample from the console. Can someone please explain why SUNMC would need to spawn all these child processes? Not to mention why these processes do nothing but SLEEP, as my truss output indicates.

Is this some memory issue that has occurred with SUNMC, or something to do with SUNMC not being able to clean up its subprocesses?

Any help or clarification on this matter would be greatly appreciated.

Thanks in advance.

===================================

[MYHOST]/# ps -ef | grep symon
root 25709     1  1   Mar 14 ?     1649:39 esd - init agent -dir /var/opt/SUNWsymon -q
root 20332 20329  0 15:03:45 pts/2    0:00 grep symon

[MYHOST]/# ptree 25709
25709 esd - init agent -dir /var/opt/SUNWsymon -q
25905 sh
25904 sh
27063 esd - shell perftool-shell.tcl
27093 sh
27091 sh
29213 sh
29211 sh
18218 sh
18205 sh
18216 sh
18227 sh

[MYHOST]/# truss -p 18227
read(0, 0x000394D8, 128)        (sleeping...)

===================================

[1483 byte] By [katsal] at [2007-11-26 8:07:24]
# 1

Hi Katsal,

> The default threshold is set to 15 which is out of
> the box. However some of our hosts have 30 of these
> processes.

This default limit is rather low, and in my opinion should be set to around 50 out-of-the-box. As you load more modules this number will go up... which is normal... and the default limit of 15 doesn't take that into account.

> Below is a sample from the console. Can someone
> please explain why SUNMC would need to spawn all
> these child processes. Not to mention why these
> processes do nothing but SLEEP, as my truss output
> indicates.
>
> Is this some memory issue that has occurred with
> SUNMC, or something to do with SUNMC not being able
> to clean its sub processes.

This is normal: they're called "captive shells". Many SunMC modules run shell commands to acquire data. For performance reasons each module will keep a couple shells open to push commands through. Think of these shells as having their standard input attached to the Agent listening for commands, and their standard output redirected to the Agent as well. It's more efficient to leave them open and sleeping than it is to create and destroy thousands of "sh" processes over time.
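To make the idea concrete, here is a minimal sketch of the captive-shell pattern in Python. It is not SunMC's actual code: the `CaptiveShell` class and the `__CAPTIVE_DONE__` end-of-output marker are names made up for illustration. It only shows one `sh` being started once and then reused for several commands through its pipes.

```python
import subprocess


class CaptiveShell:
    """One long-lived sh process that is reused for many commands."""

    SENTINEL = "__CAPTIVE_DONE__"  # hypothetical end-of-output marker

    def __init__(self):
        # Start sh once; its stdin and stdout are pipes held by this process.
        self.proc = subprocess.Popen(
            ["sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def run(self, command):
        # Push the command through the open shell, then echo a marker so we
        # know where this command's output ends.
        self.proc.stdin.write(f"{command}\necho {self.SENTINEL}\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("shell exited unexpectedly")
            line = line.rstrip("\n")
            if line.endswith(self.SENTINEL):
                # Keep any output that did not end with a newline.
                rest = line[: -len(self.SENTINEL)]
                if rest:
                    lines.append(rest)
                break
            lines.append(line)
        return lines

    def close(self):
        # Closing stdin gives the shell EOF on its read of fd 0, so it exits.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    shell = CaptiveShell()
    print(shell.run("echo $$"))  # PID of the shell
    print(shell.run("echo $$"))  # same PID again: no new sh was created
    shell.close()
```

Between calls to `run()` the shell has nothing to do, so it sits blocked reading its standard input, which is the same `read(0, ...) (sleeping...)` state that truss reports above.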

Plus you can see they spend most of their lives sleeping, so the overall performance impact is a tiny bit of RAM and almost zero CPU.

If you're triggering the >15 alarm on Agents with lots of modules loaded then it's OK... because you're actually doing a lot of work. Tune the threshold up a bit and ignore it.
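If you want to see how close a host is to a given limit before changing the threshold, a rough check along these lines may help. This is a sketch under assumptions: it counts only direct children of the agent from `ps -e -o pid,ppid` output, which may not be exactly how the Agent Statistics module computes its own figure, and the function names are hypothetical. The PID 25709 is the agent PID from the ps output in the question.

```python
import subprocess


def count_children(ps_output, parent_pid):
    """Count processes whose PPID equals parent_pid in `ps -e -o pid,ppid` output."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and any malformed lines.
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        if int(fields[1]) == parent_pid:
            count += 1
    return count


def exceeds_threshold(child_count, threshold=15):
    # Same rule as the "> 15" alarm: strictly greater than the limit.
    return child_count > threshold


if __name__ == "__main__":
    AGENT_PID = 25709  # agent PID taken from the question; adjust per host
    out = subprocess.run(
        ["ps", "-e", "-o", "pid,ppid"],
        capture_output=True, text=True, check=True,
    ).stdout
    n = count_children(out, AGENT_PID)
    print("children:", n)
    print("over default limit of 15:", exceeds_threshold(n))
    print("over a limit of 50:", exceeds_threshold(n, threshold=50))
```

Running it on a few busy Agents gives a feel for what a sensible raised threshold would be for your module load.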

Regards,

Mike.Kirk@HalcyonInc.com

http://www.HalcyonInc.com

Aronek at 2007-7-6 20:46:53