Alarming File Scanning
We've had a minimal SMC configuration for a while, and I finally have time to get serious about it. But I'm having some trouble getting it to do what I need, so I could use some help. So far these are the issues that I haven't been able to resolve:
First, I can't find a clear event related to a host down event; did I miss it? I ended up configuring Telnet Synthetic Transactions for each of our monitored hosts. That doesn't scale well, you know? :-)
Next, is there any way to configure an alarm action to execute *every* time some File Scanning event happens? All it does for me is execute on the first event, and then note the others in the alarms tab. The best that I've come up with is to set up the caution, alert and critical alarm events to report the first 3 events. Of course then I can't define true caution or alert events...
Also, is there a way to pull any information from the (File Scanning) line that triggered the alarm? For example, if a disk error event happened, it would be nice to have the disk listed in the page that goes out. Raw info is fine, as I can parse it in my paging script.
Finally, related to above, are there any other alarm variables available? In docs I've seen %instance, %statusfmt and %statusstringfmt. I've poked around and found that %value is the value of the cell being watched. Any other variables? If there is a reference for this, please tell me! I'd be glad to RTFM.
Thanks
[1473 byte] By [jdslew] at [2007-11-26 5:56:13]

# 1
Hi jdslew,
> First, I can't find a clear event related to a host down event; did I miss it?
> I ended up configuring Telnet Synthetic Transactions for each of our
> monitored hosts. That doesn't scale well, you know? :-)
SunMC will trigger a "Down" (black) event if the entire host goes down. It will trigger a Critical (red) event if just an Agent goes down but the box stays up. The timer on both is 5 minutes. You'll see the alarm in the Console, or you can create an icon for the SunMC topology process (port 164) and look at its Alarms tab to see historical agent/host down alarms. You do not have to configure either of these yourself; they are automatic.
> Next, is there any way to configure an alarm action to execute *every*
> time some File Scanning event happens?
<snip>
> Also, is there a way to pull any information from the (File Scanning)
> line that triggered the alarm?
You can do both of these with the LogFileAlert module in the PlusPack addon:
http://www.halcyoninc.com/downloads/
> Finally, related to above, are there any other alarm variables available?
If you are paging or sending email with this information, then the largest number of variables can be found in EventAction (this is a link to the params available):
http://www.halcyoninc.com/products/FrameworkPlugIns/EventAction/help/HALEventAction-adding-h.html
EventAction gives you a central place to configure all your notifications for every event from every Agent.
Note: I am an employee of Halcyon, but these are the only SunMC modules that will allow you to place logfile strings in your events, or perform centralized notification with a wide variety of variables.
Regards,
Aronek
# 2
hi aronek
is there anyway to force SunMC to send an email when a system goes down? i am using SunMC 3.0 with just the base modules.
regards
shiva
# 3
> is there anyway to force SunMC to send an email when a system goes
> down? i am using SunMC 3.0 with just the base modules.
Yes, there are a couple of ways, even with the free framework. From either Reporter or the regular Java Console you can set an Alarm Action to run a script (e.g. the default email.sh script) to send you an email. Or if you own EventAction you can configure it from there as well.
For example, log in to the SunMC Console so you are looking at a Domain (e.g. "Default Domain"). You can right-click on the top of the Domain, or a Group (folder), or a host icon itself and choose "Alarm Action". From the new dialog box that opens you can run scripts when either:
a) a host goes down
b) a SunMC Agent stops responding
If you set the action higher up in the Domain (i.e. at the top of the Domain) then it will apply to any hosts lower in the hierarchy. Or you can simply set the email action for each host icon individually.
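If you would rather run your own script than the default email.sh, a minimal Python sketch of such a script might look like the one below. This thread does not say how SunMC passes information to an alarm action script, so the three command-line arguments (host, status, recipient), the sender address, and the local SMTP relay on "localhost" are all assumptions for illustration only.

```python
import smtplib
import sys
from email.message import EmailMessage


def build_alert(host, status, sender, recipient):
    """Build a plain-text notification message for one host event."""
    msg = EmailMessage()
    msg["Subject"] = f"SunMC alarm: {host} is {status}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Host {host} was reported as {status}.\n")
    return msg


def main(argv):
    # Hypothetical calling convention: script.py <host> <status> <recipient>
    host, status, recipient = argv[1:4]
    msg = build_alert(host, status, "sunmc@localhost", recipient)
    # Assumes a mail relay is listening on the local machine.
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


if __name__ == "__main__" and len(sys.argv) >= 4:
    main(sys.argv)
```

Keeping the message construction in its own function makes it easy to check the wording without actually sending mail.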
You'll also notice if you set an action for a single host you have the option to "opt-out" of Group (higher up in the hierarchy) actions. This would let you do things like set a global email action at the top of a Domain to employee "Bart" that gets triggered if any host goes down, but you can still set one special host to send email to "Lisa" instead and opt out of the Bart email action. An example of when you would do this is if you have a global email for all production hosts, then you opt out less important boxes (e.g. Development or QA) so they don't send email if somebody takes them down (e.g. they get rebooted), or just send that email to somebody different so it doesn't bother your NOC staff.
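The inheritance and opt-out behaviour described above can be pictured with a small toy model in Python. This is only a sketch of the idea, not how SunMC stores or evaluates alarm actions; the Node class, the recipients_for function and the host names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A Domain, Group or host in a simplified topology tree."""
    name: str
    parent: Optional["Node"] = None
    email_to: Optional[str] = None  # recipient of this node's own action
    opt_out: bool = False           # ignore actions set higher up


def recipients_for(host: Node) -> List[str]:
    """Return who gets mail when this host goes down."""
    found = []
    if host.email_to:
        found.append(host.email_to)
    if host.opt_out:
        return found  # inherited actions are skipped
    node = host.parent
    while node is not None:
        if node.email_to:
            found.append(node.email_to)
        node = node.parent
    return found


domain = Node("Default Domain", email_to="bart")
prod = Node("prod-db", parent=domain)
qa = Node("qa-box", parent=domain, email_to="lisa", opt_out=True)

print(recipients_for(prod))  # ['bart']
print(recipients_for(qa))    # ['lisa']
```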
Regards,
Aronek
Standard disclaimer: I am an employee of Halcyon (www.HalcyonInc.com)
# 4
thanks aronek.
i am now getting emails if a host goes down. but there are two problems i havent been able to resolve.
1) How soon and how frequently will SunMC send email after a system goes down? during my tests, i observed that i get emails for a system which is down at seemingly random intervals. can this be changed?
2) Acknowledge the alert in case a production system does go down. i dont get the Alarms tab for a system which is down.
Thanks for your help.
shiva
# 5
> 1) How soon and how frequently will SunMC send email after a system goes down?
The SunMC Server will check for hosts down every 5 minutes. If you are seeing varied response times it is most likely due to how close the host was to the next polling interval when it went down. Perhaps it takes 1 second to detect, or at worst if the box was just polled you'd have to wait another 4:59 before the email is sent. On average, you should expect 2.5 minute response times to hosts going down. It will only send the email once, unless the host is going up and down repeatedly. Repeated notification requires a product such as EventAction (http://www.halcyoninc.com/products/FrameworkPlugIns/EventAction/).
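The 2.5 minute average follows from the failure landing at a uniformly random point in the 5 minute polling cycle. A small Python sketch of that arithmetic is below; it assumes a fixed 300 second cycle and instant detection at each poll, which is a simplification, and the function names are made up for this example.

```python
import random

POLL_INTERVAL_S = 300  # 5-minute host-down check, as described above


def detection_delay(failure_offset_s, interval_s=POLL_INTERVAL_S):
    """Seconds from the failure until the next poll.

    failure_offset_s is how long after the previous poll the host died.
    """
    return interval_s - (failure_offset_s % interval_s)


def mean_delay(samples=100_000, interval_s=POLL_INTERVAL_S, seed=0):
    """Estimate the average delay for failures at random points in the cycle."""
    rng = random.Random(seed)
    total = sum(
        detection_delay(rng.uniform(0, interval_s), interval_s)
        for _ in range(samples)
    )
    return total / samples


print(detection_delay(299))  # host died 1 second before the next poll
print(detection_delay(1))    # host died just after a poll
print(round(mean_delay()))   # close to 150 seconds, i.e. 2.5 minutes
```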
> can this be changed?
There is no documented way to change this interval.
> 2) Acknowledge the alert in case a production system does go down. i dont get the
> Alarms tab for a system which is down.
It cannot appear in the Alarms tab for the system because that tab is only for the events which the Agent itself detects (and by definition if the box is down, so is the Agent :) ). The host/Agent down detection is done by the topology ("esd - init topology -dir /var/opt/SUNWsymon -q") process on the Server. Open the Java Console, do Edit -> Create Object, then create a standard Agent object with the hostname of your SunMC Server, but change the port from "161" to "164". After creating the object, double-click on its icon like a normal Agent and you'll see all the host/Agent down alarms in the Alarms tab for that process.
Regards,
Aronek
Standard disclaimer: I am an employee of Halcyon (www.HalcyonInc.com)
# 6
thanks again! that worked too.
but somehow as soon as i get one thing to work, another breaks :)
it's now the topology manager. looks like it is exceeding the critical virtual memory and is being shut down. i tried increasing the critical virtual memory size to even 300MB ( system has 4GB ) but it still shuts down after some time. am not really sure how far i can go. the logs are not of much help.
thanks for your patience and help.
shiva
# 7
my bad. installing patch 12 for SMC PU4 GA solved the problem.