SMC 3.6 agent data update error v490 solaris 8

Hi!

I'm having problems installing the agent without a reboot on some hosts running Solaris 8 (5.8 Generic_117350-02 sun4u sparc SUNW,Sun-Fire-480R). The problem shows up as black splats on the SMC server (Solaris 9), but I can browse the hardware information ...

In other threads it was suggested to check whether /opt/SUNWsymon/bin/config* runs fine ... it does on my hosts.

One failure I found was:

[0000002b 00ac ]warning Nov 28 12:56:15 agent data update error: .iso.org.dod.internet.private.enterprises.sun.prod.sunsymon.agent.modules.hardware.Config.Reader4uvh.picld.status

[0000002c 0058 ]warning Nov 28 12:56:15 agent data update error:{stty: : Invalid argument}{}

in the client's agent.log file.

Does anyone have an idea what causes this error, or why it goes away after a reboot? It would be very helpful, since rebooting these DB cluster nodes isn't a 15 min task ...

thanks,

bernhard

PS: the agent was installed using the agent-update.bin thingy

[1280 byte] By [furtmueller] at [2007-11-26 6:01:27]
# 1

> One failure I found was:

> warning Nov 28 12:56:15 agent data update error: .iso.org.dod.internet.private.enterprises.sun.prod.sunsymon.agent.modules.hardware.Config.Reader4uvh.picld.status

> warning Nov 28 12:56:15 agent data update error: {stty: Invalid argument}

This is just a guess, but SunMC relies on the platform's PICL (Platform Information and Control Library) infrastructure, served by the picld daemon, to acquire hardware data. A common source of black splats is SunMC trying to acquire data and getting back something it doesn't expect (e.g. if SunMC asked picld for the voltage of a power supply and got back "stty: Invalid argument" instead).

I'd check and see if you're missing any important picl patches (as an example patch 111792-12 <-- this may not be the fix, and obviously you need to make patching decisions yourself).

Did you have an older version of SunMC on the clients before that worked properly? Or is this the first install of SunMC on those systems?

Regards,

Mike.Kirk@HalcyonInc.com

Aronek at 2007-7-6 13:23:42 > top of Java-index,Administration Tools,Sun Management Center...
# 2

Thanks for your answer!

I've already installed some patches for picld and plugins, including the one you suggested ... but to be honest, I haven't rebooted the nodes as recommended ... it seems that a reboot would fix this issue, like it did on other nodes.

It is the first install of SMC agents on these nodes. I've also retried the install after applying the patches and wiping the agents with es-uninst -X.

It looks like I have to schedule reboots for 50+ hosts :|

br,

bernhard

furtmueller at 2007-7-6 13:23:42
# 3

I did some further investigation on this issue ...

I found out that commenting out picld status in /opt/SUNWsymon/modules/cfg/Config-Reader4uvh-d.x

picld = { [ use templates.Config-Reader4uvh-models-d.picld ]
oid = 1
# status = {
#type = active
#refreshService = _services.sh
#refreshCommand = restart_piclD.sh
#refreshMode = sync
# refresh only *once* at startup
#refreshInterval = 0
# refresh within the first second
# note the initHoldoff value in system below
# accounts for this, since picl must be restarted
# before data is read
#initInterval = 1
# }
}
# END CR6324698

fixes the problem with black splats, but I don't know the side effects ... does anyone have an idea what may go wrong with that change?

thanks,

bernhard

furtmueller at 2007-7-6 13:23:42
# 4

Hi Bernhard,

So, since SunMC complains bitterly when it gets back data it doesn't expect, I'd go eyeball this script: it has to be under /opt/SUNWsymon someplace...

> #refreshCommand = restart_piclD.sh

I bet if you run it by hand it spits out an error message, or at least has a non-zero exit code.

I remember seeing some problems with picld restarts a few months ago, initially related to Solaris 10 starting and stopping services differently (because it uses SMF, and the /etc/init.d entries the script expected were gone).

If this box had the picld /etc/init.d entries moved or changed it would explain the script error, and the black splat. Probably 5 minutes debugging the script (and maybe submitting a bug report + fix back to Sun if it's a legit error) would let you uncomment Config-Reader4uvh-d.x.
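A minimal sketch of that manual check, assuming a POSIX-style shell such as ksh. The helper name run_and_report is made up for illustration; pass it the full path of the restart script once you have located it.

```shell
# Hypothetical helper: run a command, then show both its exit status and
# everything it wrote to stdout or stderr.
run_and_report() {
    out=$("$@" 2>&1)      # capture stdout and stderr together
    rc=$?                 # exit status of the command itself
    printf 'exit status: %s\n' "$rc"
    printf 'output: %s\n' "$out"
    return "$rc"
}

# Example (placeholder path):
#   run_and_report /path/to/restart_piclD.sh
```

Any unexpected text in the output, or a non-zero exit status, would be a candidate for what SunMC is choking on.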

Regards,

Mike.Kirk@HalcyonInc.com

Aronek at 2007-7-6 13:23:42
# 5

Hi Mike,

I've tried this before and double-checked it now: no error, and picld is restarted as expected ...

root:/var/adm# ls -l /opt/SUNWsymon/modules/sbin/restart_piclD.sh
-r-xr-xr-x 1 root sys 2923 Sep 20 23:28 /opt/SUNWsymon/modules/sbin/restart_piclD.sh
root:/var/adm# ps -ef | grep picld
root 20809 1 0 13:06:39 ? 0:29 /usr/lib/picl/picld
root 12211 10480 0 14:36:17 pts/4 0:00 grep picld
root:/var/adm# /opt/SUNWsymon/modules/sbin/restart_piclD.sh
0
root:/var/adm# echo $?
0
root:/var/adm# ps -ef | grep picld
root 15644 10480 0 14:38:23 pts/4 0:00 grep picld
root 12645 1 0 14:36:24 ? 0:03 /usr/lib/picl/picld
root:/var/adm#

So it seems you lost your bet. I wish you had won :)

many thanks,

bernhard

furtmueller at 2007-7-6 13:23:42
# 6
Hi!

*bummer* smashing head on desk ...

The problem was "stty erase ^H" in ~/.kshrc ... commenting this out and restarting the agent solved the issue ... no black splats anymore.

bernhard
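For anyone who wants to keep the erase setting rather than delete it, here is a minimal sketch of a guarded version. It assumes the startup file is also read by shells that have no terminal on standard input (which is what the agent log suggests, but is not confirmed here); the function name set_erase_if_tty is made up for illustration.

```shell
# Hypothetical guard for ~/.kshrc: only touch terminal settings when
# standard input really is a terminal.
set_erase_if_tty() {
    if [ -t 0 ]; then
        stty erase '^H'
    fi
}

set_erase_if_tty
```

With this guard, a shell started without a tty skips the stty call entirely, so nothing is written to its output.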
furtmueller at 2007-7-6 13:23:42
# 7
Yikes. Nice catch. Not too uncommon either. So this was root's ~/.kshrc? Curious. What in the script gets caught up on this? Is this just poor script implementation, or unavoidable altogether?

iMac.
ian_macd at 2007-7-6 13:23:42
# 8

Yes, it was ~/.kshrc, and on a second node also /etc/kshrc.

I didn't find out why this happens, because invoking ksh doesn't print any error ... and I don't know how this restart script is invoked by esd ... so I think that's an issue for Sun to investigate.
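One way to test a startup file for this class of problem, sketched under the assumption (not confirmed in this thread) that the agent runs its helper scripts with no terminal on standard input. Starting ksh from a login session would not show the error, because stty succeeds when stdin is a terminal. The helper name rc_noise_without_tty is made up for illustration.

```shell
# Hypothetical helper: print whatever a startup file writes to stdout or
# stderr when it is sourced with no terminal on standard input.
rc_noise_without_tty() {
    rcfile=$1
    # Subshell, so the file's settings do not leak into the caller;
    # stdin comes from /dev/null instead of the terminal.
    ( . "$rcfile" ) </dev/null 2>&1
}

# Example (files named in this thread):
#   rc_noise_without_tty "$HOME/.kshrc"
#   rc_noise_without_tty /etc/kshrc
```

A clean startup file prints nothing here; an unguarded stty line would be expected to show up as an error message.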

I'm considering opening a call with our support partner, but haven't done it yet.

Maybe we'll see changes to this script with the next patch :)

bernhard

furtmueller at 2007-7-6 13:23:42