Local patch server still unreliable
Our local patch server runs on a Solaris 10 server that was fully patched only eleven
days ago. On one client, `smpatch analyze' worked correctly yesterday morning.
Today, it hung for more than five minutes, unable to download even small
files like motd.xml from the local patch server.
When I checked the local patch server, I found over 20 processes like this:
root 21848 1 0 Jul 14 ? 0:00 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
These remained when I stopped the local patch server. `kill' didn't get rid of them, but
`kill -9' finally did. Once they were gone, restarting the local patch server allowed
smpatch to work correctly on the client again.
Can this product be made to work properly? Should I be restarting it once a day
and cleaning up leftover processes in the meantime?
# 1
To help determine the problem, can you answer the following?
o Has this only happened the once?
o Were there any errors in the system logs or the UC Proxy logs under /var/patchsvr/logs?
o What version of the Update Connection Proxy are you running? Check the revision of patch 119788 (SPARC) or 119789 (x86).
o What version of Java are you running?
# 2
> o Has this only happened the once?
No, it's happened before. I'd hoped that it would be fixed by recent patches.
> o Were there any errors in the system logs or the UC
> Proxy logs under /var/patchsvr/logs?
Here's one from catalina.out that seems relevant:
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGBUS (0xa) at pc=0xff254508, pid=1355, tid=5695
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_12-b04 mixed mode)
# Problematic frame:
# C [libc.so.1+0x54508]
#
# An error report file with more information is saved as hs_err_pid1355.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Catalina.start: LifecycleException: null.open: java.net.BindException: Address already in use:3816
> o What version of the Update Connection Proxy are you
> running? Check patch 119788 for Sparc and 119789 for
> X86.
It's 119788-08.
> o What version of Java are you running?
Is this the one in use?
/usr/java -> jdk/jdk1.5.0_12
# 3
The following error indicates that, for some reason, multiple instances of the proxy are attempting to start:
java.net.BindException: Address already in use:3816
You can use the following commands to verify the version of Java in use:
# cacaoadm get-param java-home
# java -version
The first command will only work on later releases of Solaris 10 that have cacao installed. A new revision of patch 119788/119789 is due for release in the next few weeks; until then, it may be worth simply restarting the patch server process regularly.
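For the interim workaround of restarting the patch server regularly, the stop, clean-up and start steps can be collected into one function. This is a minimal sketch, not a supported procedure. It assumes it runs as root under ksh (Solaris 10's /bin/sh does not support command substitution with $(...)) and that patchsvr is on the PATH. The function name restart_patchsvr is hypothetical. The kill -9 step reflects the report at the top of this thread that plain `kill' did not remove the leftover java processes.

```shell
#!/bin/ksh
# Hypothetical helper: restart the UC Proxy and clear leftover cc-ccr JVMs.
# Assumptions: run as root, patchsvr on PATH, ps -ef prints the PID in field 2.
restart_patchsvr() {
    patchsvr stop

    # Give the JVMs a chance to exit on their own (10 s is an arbitrary choice).
    sleep 10

    # Collect PIDs of any cc-ccr java processes still running,
    # excluding the grep command itself.
    pids=$(ps -ef | grep 'cc-ccr' | grep -v grep | awk '{print $2}')

    # Plain kill did not remove them in this thread, so use SIGKILL.
    # $pids is left unquoted on purpose so each PID becomes its own argument.
    if [ -n "$pids" ]; then
        kill -9 $pids
    fi

    patchsvr start
}
```

To run it nightly, a script that defines this function and then calls restart_patchsvr could be saved at a path of your choosing (for example, the hypothetical /usr/local/sbin/restart-patchsvr) and invoked from root's crontab with an entry such as `0 3 * * * /usr/local/sbin/restart-patchsvr'.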
# 4
The server runs Solaris 10 3/05, which does not appear to have cacaoadm.
Here's what `java -version' says:
java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Server VM (build 1.5.0_12-b04, mixed mode)
# 5
Hi
This looks normal and correct.
Could we see all the installed patches for the product? Please run the commands below:
# showrev -p | cut -d" " -f2 | sort > /tmp/showrev-p
# egrep '11978[8|9]|12033[5|6]|12108[1|2]' /tmp/showrev-p
# egrep '12111[8|9]|12145[3|4]|12156[3|4]' /tmp/showrev-p
# egrep '12223[1|2]|12300[3|4|5|6]|124463|12461[4|5]' /tmp/showrev-p
Mod.
# 6
Here you go...
<mills@canopus:288>$ egrep '11978[8|9]|12033[5|6]|12108[1|2]' /tmp/showrev-p
119788-08
120335-04
121081-06
<mills@canopus:289>$ egrep '12111[8|9]|12145[3|4]|12156[3|4]' /tmp/showrev-p
121118-12
121453-02
121563-02
<mills@canopus:290>$ egrep '12223[1|2]|12300[3|4|5|6]|124463|12461[4|5]' /tmp/showrev-p
122231-01
# 7
From the output provided, everything looks okay.
In a previous post you stated that the following error was present in the "catalina.out" file:
null.open: java.net.BindException: Address already in use:3816
This indicates that another program was already using port 3816; in this case, it may well have been an existing patchsvr process.
Since you killed all the patchsvr processes, have multiple instances of the patchsvr process been spawned again? You can check with:
# ps -ef | grep cc-ccr
Are you still seeing the above error in the "catalina.out" file?
# 8
Yes, I see six of them running now:
root 16944 24609 0 08:36:32 ? 0:00 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
root 16109 24609 0 08:28:37 ? 0:00 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
mills 6484 6483 0 07:54:02 ? 0:00 grep cc-ccr
root 15482 24609 0 08:24:10 ? 0:00 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
root 24609 1 0 Jul 22 ? 22:11 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
root 14590 24609 0 08:18:36 ? 0:00 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
root 15904 24609 0 08:27:18 ? 0:00 /usr/java/bin/java -Djava.library.path=/usr/lib/cc-ccr/lib -Djava.endorsed.dirs
No, that error has not reappeared in catalina.out.
# 9
Could you try the following:
# patchsvr stop
# patchsvr disable
# ps -ef | grep cc-ccr | grep -v grep | awk '{print $2}' | xargs kill -9
# patchsvr enable
# patchsvr start
There should now only be one process running:
# ps -ef | grep cc-ccr | grep -v grep | awk '{print $2}' | wc -l
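The count above can also be turned into a check that flags the symptom reported in this thread (more than one cc-ccr JVM), for example to run between restarts. This is a sketch under the same assumptions as the commands above (ksh, `ps -ef' output in the format shown in this thread); the function names count_ccr_procs and check_single_ccr are hypothetical. count_ccr_procs reads ps-style text from standard input, so it can also be tried against saved output.

```shell
#!/bin/ksh
# Hypothetical helper: count UC Proxy JVMs in ps -ef style text on stdin.
# Lines mentioning cc-ccr are counted; grep's own command line is skipped.
# tr strips the leading blanks that some wc implementations print.
count_ccr_procs() {
    grep 'cc-ccr' | grep -v 'grep' | wc -l | tr -d ' '
}

# Hypothetical check: succeed only if exactly one cc-ccr process is running,
# otherwise print a warning to stderr and return non-zero.
check_single_ccr() {
    n=$(ps -ef | count_ccr_procs)
    if [ "$n" -ne 1 ]; then
        echo "patchsvr: $n cc-ccr processes running (expected 1)" >&2
        return 1
    fi
    return 0
}
```

Because check_single_ccr returns non-zero whenever the count is anything other than one, it could be called from cron to log when the extra processes reappear, which would help pin down when they start to accumulate.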
# 10
A few days ago I did exactly that, except for the disable/enable steps.
There was only one running afterwards. The others appeared later,
presumably in response to client requests.
# 11
Hi
Would you please check that rpc is running on the server? It ensures that connections are handed off, so that the server can continue listening on port 3816.
Mod.
# 12
Yes, rpc is certainly running. What's the program number and version?
I don't see anything obvious in the `rpcinfo' output.
# 13
To debug this issue further, we will need access to logs. The forum is not the best place to analyse logs.
As you have a Sun UC proxy, you will have a support contract. Please raise a support case via your local CCC and ask that it be transferred to the SWUP_SUPPORT group.
# 14
The case numbers are 65590837 and 11182320.