ypserv loops after patching

I applied the latest (Aug/25/06) patch set the other day. On my NIS master, ypserv, after running for more than 10 minutes, goes into a loop and takes over one of my processors.

4979 root100 5576K 2176K cpu395:00 24.99% ypserv

In syslog there is a warning on startup.

ypserv[4979]: [ID 783678 daemon.warning] /usr/lib/netsvc/yp/ypserv: no /var/yp/securenets file

But securenets has never been required.

I don't know much about dtrace, but this is what I have extracted from it:

errors:

1 ypserv stat 2 No such file or directory

1 ypserv fcntl 22 Invalid argument

1 ypserv getmsg 11 Resource temporarily unavailable

1 ypserv open64 2 No such file or directory

1 ypserv lwp_wait 45 Deadlock condition.

The deadlock interests me, but I don't know how to figure out what is causing it.

I tried a dtruss. It is mainly doing:

36254 pollsys(0x63A50, 0x5, 0x0)= 1 0

36261 ioctl(0x5, 0x530F, 0xFFBFFA3C)= 1 0

36263 fstat(0x5, 0xFFBFF960, 0xFEEC2000)= 0 0

36268 lwp_sigmask(0x3, 0x0, 0x0)= 0xFFFF 0

36269 lwp_sigmask(0x3, 0x2000, 0x0)= 0xFFFF 0

Not a huge help. The only thing it tried to open while I was watching it was:

open("/dev/ticotsord\0", 0x2, 0x0)= 9 0

If someone could point me to a few other things to try to figure out why ypserv has gone into training for "America's Got Talent", the infinite loop edition, I would love to hear it.

Part of the process of figuring this out has been reading documentation, and for the first time in a while I've been very disappointed by the lack of man pages on Solaris. Did I forget to install a package, or is there a lower standard for documentation coming out of Sun? For example, I went looking for the man page for lwp_sigmask. It doesn't exist. sigmask does, but that call only takes 1 argument, whereas, as we can see above, lwp_sigmask takes 3.

I could back out of the patches, but I haven't had to do that under Solaris in ten years of administration and I would rather not start now.

Thanks for the help.

[2112 byte] By [HangTen] at [2007-11-26 9:53:52]
# 1
Since you mention dtrace, I assume you are referring to a Solaris 10 machine. It's possible you are hitting the "tcp_fusion" bug there was a recent advisory about. Try adding set ip_do_tcp_fusion=0 to /etc/system and rebooting.
robertcohen at 2007-7-7 1:11:27 > top of Java-index,General,Sun Networking Services and Protocols...
# 2
Thanks for the suggestion. I have tried it and ypserv is still getting stuck somewhere.
HangTen at 2007-7-7 1:11:27
# 3
Sorry, my bad. That's set ip:do_tcp_fusion=0
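For anyone applying this, a minimal sketch of what the /etc/system entry might look like. The comment line is only illustrative (comments in /etc/system start with an asterisk), and the setting takes effect at the next reboot:

```
* Workaround from this thread: disable TCP fusion (read at boot)
set ip:do_tcp_fusion=0
```

The module prefix and colon (ip:) are what the earlier suggestion with an underscore was missing.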
robertcohen at 2007-7-7 1:11:27
# 4
I applied this workaround and after a week ypserv has not gone loopy. Thanks for the help! For those looking for this problem report, it's here: http://sunsolve.sun.com/search/document.do?assetkey=1-26-102576-1
HangTen at 2007-7-7 1:11:27