Alarms not working, loading, displaying.
Hello,
Did anyone get to the bottom of alarms not appearing under SunMC? I have active alarms which do not appear under the Alarms tab. I've tried all the suggestions on this forum: reseeding, moving agent files out of the way, and checking the Info tab.
On one system's agent I get "Unable to get event management information from agent. Agent was busy or down. Will default to local event manager". When I check the Info tab for this agent, the Event and Trap destinations are blank, but I don't know how to enter them. The domain-config.x file has the correct details in it for the server.
So there are two issues:
1) the "Unable to get..." message on one agent, and
2) no active alarms displayed on others, even when the Info tab shows the correct information for the Event and Trap destinations.
Please help me. I'm in a real crunch.
[870 byte] By [sunitup] at [2007-11-26 5:59:29]

# 1
> Did any one get to the bottom of the Alarm not
> appearing under SunMC? I have active alarms
> which do not appear under the Alarms Tab. I've tried all
> suggestions on this Forum, reseeding, moving agent
> files out the way, checking the info tab.
I think we've talked before :)
You've double-checked all your Agent params, and they look OK. But you've historically had database problems, and your Alarms tab is nothing more than a query into Oracle (through the "local Event Manager", which you also have errors for).
At this point, since your alarms don't appear anyway, you have little to lose by reinitializing the database.
a) Make a backup of the database:
/opt/SUNWsymon/sbin/es-backup
If something goes wrong you can revert using es-restore. If you're really paranoid, make a backup tarball of /var/opt/SUNWsymon as well.
b) Make a backup of your topology. This will export your icon/domain setup to an XML file. After resetting the database you can import that XML file back in, so you don't have to recreate all your icons by hand:
Tools --> Export Topology
c) Finally, rerun setup on your server, and when asked if you want to preserve your old info, say no. In about 10-15 minutes you'll have a shiny new (empty) SunMC Oracle database:
/opt/SUNWsymon/sbin/es-setup
Once everything is started up again, try to trigger a fresh alarm on an Agent and see if it shows up in your Alarms tab. If not, then the database actually wasn't the problem, and you may want to use the es-restore script to put things back as they were (in case another procedure lets you unlock the events).
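For the "really paranoid" tarball mentioned in step (a), here is a minimal sketch. The backup_dir helper and the /var/tmp destination are my own choices for illustration, not part of SunMC, and it is probably wise to stop the SunMC services first so the files are not changing while tar reads them.

```shell
#!/bin/sh
# backup_dir SRC DEST_DIR
# Hypothetical helper (not a SunMC tool): writes a timestamped tarball of
# SRC into DEST_DIR and prints the tarball's path.
# Backticks are used so this also runs under older Bourne shells.
backup_dir() {
    src=$1
    dest_dir=$2
    name=`basename "$src"`
    parent=`dirname "$src"`
    stamp=`date +%Y%m%d%H%M%S`
    out="$dest_dir/$name-$stamp.tar"
    # -C makes the archive paths relative to the parent of SRC
    tar -cf "$out" -C "$parent" "$name" || return 1
    echo "$out"
}

# Example, as root on the SunMC server (destination is an assumption):
#   backup_dir /var/opt/SUNWsymon /var/tmp
```

The printed path can be kept alongside the es-backup output, so both backups can be found together if a restore is needed.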
Good luck!
Mike Kirk
# 2
Hi Mike,
I think we have talked :-) you're everywhere.
Thanks for the info on this. I'm not sure that the DB is completely broken, as some agents work OK: I can see alarms and initiate new alarms which appear on the Alarms tab, but on others I have problems seeing any alarms. Are there any comparisons I can do between a working agent and a non-working agent? I've already compared the domain-config.x file and it looks the same on both working and non-working agents. I'm wondering, if the DB had a problem, why would some agents work OK and some not? Anyway, I appreciate the help. Any view on this?
Thank you.
# 3
> Thanks for the info on this. I'm not sure that DB is
> completely broken. As some agent work ok, and I
> can see Alarms and inititate new alarms which
> appear on the Alarm Tab, but others I have
> problems seeing any Alarms.
Ah, OK, I thought none of your Alarms tabs worked.
> Are there any comparisions I can do between
> a working agent and a non-working agent, I've
> already compared the domain-config.x file and it
> looks the same on both work and non-working
> agents.
Well, are the working and non-working Agents on different networks (e.g. working ones are on 10.20.0.0/24 and non-working ones are on 192.168.0.0/24)? Can all your Agents resolve your Server hostname correctly to the same IP address (i.e. in domain-config.x the trap and event sections should both resolve to the same IP for your Server)?
On your Server, what IP is in /var/opt/SUNWsymon/cfg/multiip.dat?
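A quick way to run that resolution check on each Agent is sketched below. The helper names, the hostname sunmc-server and the address 10.20.0.5 are placeholders I made up; substitute the server name from domain-config.x and the IP from multiip.dat.

```shell
#!/bin/sh
# resolve_ip HOST
# Prints the first address the local name service returns for HOST.
resolve_ip() {
    getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# check_server_ip HOST EXPECTED_IP
# Hypothetical helper: reports whether HOST resolves to EXPECTED_IP.
check_server_ip() {
    got=`resolve_ip "$1"`
    if [ "$got" = "$2" ]; then
        echo "OK $1 -> $got"
    else
        echo "MISMATCH $1 -> ${got:-unresolved} (expected $2)"
        return 1
    fi
}

# Example on each Agent (hostname and IP are placeholders):
#   check_server_ip sunmc-server 10.20.0.5
```

If a working Agent prints OK and a non-working one prints MISMATCH, name resolution on the problem host is the first thing to fix.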
Have you tried simply deleting and recreating an icon for the problem Agents in your Console?
Alarms on your Agent are stored in /var/opt/SUNWsymon/log/agentStatus.log. If that log somehow got corrupted, then alarms would no longer reach your Server (so they never make it into the database, and the Alarms tab stays empty). You could stop a problem Agent, move that file to a backup location, then restart the Agent (it will create a fresh, good copy). Then give it a couple of minutes and trigger an alarm again.
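A rough sketch of that stop / move / restart sequence follows. The set_aside helper is something I made up for illustration; the "-a" (agent only) option to es-stop and es-start is from my memory of SunMC 3.x, so treat it as an assumption and check the usage output on your version first.

```shell
#!/bin/sh
# set_aside FILE
# Hypothetical helper: renames FILE to FILE.bak.<timestamp> and prints
# the new name, so the old log is kept rather than deleted.
set_aside() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "no such file: $f" >&2
        return 1
    fi
    new="$f.bak.`date +%Y%m%d%H%M%S`"
    mv "$f" "$new" && echo "$new"
}

# On the problem Agent, as root (the -a flags are an assumption, see above):
#   /opt/SUNWsymon/sbin/es-stop -a
#   set_aside /var/opt/SUNWsymon/log/agentStatus.log
#   /opt/SUNWsymon/sbin/es-start -a
```

Keeping the old file under a .bak name means it can be moved back if the fresh log makes no difference.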
Without actually having to dig into diagnostic output, I'm running out of things for you to check.
Regards,
Mike
# 4
Hi
Thanks for that info. I'm currently going through the process of erasing the database and re-installing SunMC. Does anyone know a way of restoring just the PRM information after a new DB has been created? I noticed that es-backup does an export of the DB. Does anyone know how to connect to the Oracle DB itself, i.e. what username/password to use for the SunMC DB? I have a feeling I will only need to restore the PRM information; the other things will not be too much of an issue. Then again, maybe that was the part of the DB that was corrupt, blocking alarms and causing other odd behaviour, like black splats on the topology view.
I have yet to see if erasing the DB and re-installing SunMC will work. I guess it should, as it's equivalent to installing from scratch. I'm not sure if the black splat will go away when doing a topology import.
As I was typing this I looked at the progress of the installation and found these es-inst/es-setup errors:
the file /opt/SUNWsymon/oracle/product/8.1.7/bin/svrmgrl does not exist, exiting........................
Need to forcefully kill the database processes, please wait...
Configuring the system for setup, please wait...
-sh: /smc/install/disk1/sbin: permission denied
execution of database make failed
see /tmp/SunMCDBLogFiles/make.log file for details
Could not finish requested task.
Running es-setup again:
Starting Sun Management Center database setup...
execution of svrmgrl using /tmp/SunMCDBLogFiles/db-start.sql fail
exiting........................
sqlerror code = 27121
sqlerrmsg full = ORA-27121: unable to determine size of shared memory segment
Database setup failed : db-start failed
Could not finish requested task.
Install logfile is : /var/opt/SUNWsymon/install/install_s248006.050307181720.28074
End of Installation
Exiting Sun Management Center installation.
I'm in for it now...
# 5
Hi,
I managed to get further with this. The previous issues were uid:gid and permission problems with the SunMC Oracle user.
Now while running es-setup, I'm getting:
Starting Sun Management Center database setup...
FAILED to change passwords.
Could not finish requested task.
cheers,
# 6
OK, I did an es-uninst -X, removed all the temp files from /tmp/sh*, and removed the shared memory and semaphore segments. I ran another es-inst and the DB was created OK. I imported the topology and the alarms still did not work. There are alarms on agents which cannot be seen in the Alarms tab, so I guess it's not the DB. I've exhausted my ideas with this. I've even tried to re-install an agent.
onwards....
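For anyone repeating the cleanup, here is a hedged sketch of the shared memory and semaphore step. It assumes the Solaris ipcs column layout (type, ID, key, mode, owner, group) and that the leftover segments are owned by the SunMC Oracle user (smcorau on my system); the ipc_ids helper is made up for illustration. On Solaris the stock /usr/bin/awk may reject -v, in which case nawk should work.

```shell
#!/bin/sh
# ipc_ids TYPE OWNER
# Reads `ipcs` output on stdin and prints the IDs of entries whose first
# column is TYPE (m = shared memory, s = semaphore) and whose fifth
# column (owner, in the Solaris layout) is OWNER.
ipc_ids() {
    awk -v t="$1" -v u="$2" '$1 == t && $5 == u { print $2 }'
}

# Review the list before removing anything (owner name is an assumption):
#   ipcs -m | ipc_ids m smcorau
#   ipcs -s | ipc_ids s smcorau
# Then, with the database stopped:
#   for id in `ipcs -m | ipc_ids m smcorau`; do ipcrm -m "$id"; done
#   for id in `ipcs -s | ipc_ids s smcorau`; do ipcrm -s "$id"; done
```

Printing the IDs first, rather than piping straight into ipcrm, gives a chance to confirm that nothing belonging to another application is on the list.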
# 7
> I'ved even tried to re-install an agent
Did you try the replacement/recreation of the agentStatus.log like I mentioned above? If not, and you didn't use the "-X" version of es-uninst on that Agent... then that file would have been left on disk, and may still be broken.
Regards,
Mike Kirk
# 8
Hi,
I tried removing the agentStatus.log file and restarting the agent, but there was no change.
I also used es-uninst -X when removing the agent and reinstalling it.
I'm getting this error in the agent.log file:
data update error: .iso.org.dod.internet.private.enterprises.sun.prod.sunsymon.agent.modules.perftool.timer.performance.position
[0000002c 0040 ]warning Mar 09 15:31:08 agent data update error: {} {}
Any ideas?
Thank you!
# 9
To: sunitup
I am having the same problem you described when trying to run es-inst. I get the Oracle error:
sqlerror code = 27121
sqlerrmsg full = ORA-27121: unable to determine size of shared memory segment
What were the errors you found with the gid of smcorau? I checked the file permissions in $ORACLE_HOME/oracle/bin and they all look OK.