Uninspectable processes on Solaris 10

All -

We are running Solaris 10, patch level Generic_118833-20, in a fairly complex system which includes MySQL 5.0.22 and several other applications (Perl, Java) that access the MySQL server. Recently, on a high-powered Sun machine (4 to 8 cores, lots of memory), we've regularly hit situations where one of these processes becomes unkillable and can no longer be inspected. Invoking "ps" or "prstat" on the unkillable process, or even trying to list /proc/<pid> (where <pid> is the unkillable process), hangs and does not respond to Ctrl-C; all you can do is kill the shell. In many of the cases, mysqld has been the process that hung; in other cases, it's been a process which connects to the MySQL server (but is not necessarily connected to the server at the time).

My intuition is that this has to be a problem in Solaris 10 itself, since I've never seen userland code cause a process to become uninspectable; but obviously, I could be wrong about this. The folks who are supporting us at Sun are pushing for this to be a MySQL problem. I'm wondering whether anyone has ever seen anything like this, and whether it could possibly be a MySQL problem alone.

This problem appeared about two weeks after we originally configured the machine, and can reliably be reproduced after the machine runs for somewhere between a couple of hours and one day.
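
For anyone trying to work out which PIDs are affected without losing a shell each time, here is a minimal sketch of one way to do it: run the inspection command in a child process and give up after a deadline. It assumes a modern Python 3 is available on the machine, which is not part of the setup described above, and the function name probe and its parameters are made up for illustration. It only keeps the calling process responsive; it does nothing to un-stick the hung process, and a probe child that blocks inside the kernel may itself linger until a reboot.

```python
import subprocess
import sys
import time


def probe(cmd, timeout=5.0, poll_interval=0.1):
    """Run cmd in a child process; return "responded" or "timed out".

    The child is never waited on without a deadline, so a probe that gets
    stuck in the kernel cannot hang the caller along with it.
    """
    child = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:  # non-blocking check for exit
            return "responded"
        time.sleep(poll_interval)
    if child.poll() is not None:  # exited right at the deadline
        return "responded"
    # Best effort only: a child blocked inside the kernel may ignore the
    # signal, so we deliberately do not call child.wait() afterwards.
    child.kill()
    return "timed out"


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]
    # "ps -p <pid>" is one of the commands that hangs in the report above.
    print(pid, probe(["ps", "-p", pid]))
```

Invoked with a PID as its only argument, the script prints the PID followed by "responded" or "timed out". The polling loop is used instead of subprocess.run(..., timeout=...) because the latter kills the child on timeout and then waits for it, and that wait could block in exactly the situation described here.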

Thanks in advance -

Sam Bayer

The MITRE Corporation

sam@mitre.org

[1526 byte] By [stearn_n_dranga] at [2007-11-26 19:41:33]
# 1

An off-forum email has prompted me to update this query.

In the end, the problem turned out to be a Solaris problem. Solaris 10 had a bug where bad memory pages were not being registered in the OS. This bug was present even in the fully-patched version of Solaris 10. It was only because my colleagues have an expensive service contract with Sun that we were able to identify the problem. After we got the hotfix from Sun, it turned out that a number of our memory chips were bad, and needed to be replaced.

Sam

stearn_n_dranga at 2007-7-9 22:22:51