ims_master process died
iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)
libimta.so 5.2 Patch 2 (built 19:30:12, Jul 14 2004)
SunOS apollo1 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V440
As indicated by the subject line, one of the MTA ims_master processes suddenly died (and quite magically too, as it seems). Normally the Messaging Server runs two ims_master processes concurrently. Users of the mail store were unable to retrieve their messages, and their full mail clients were clocking. Upon noticing that the process had died, we promptly restarted mail (./stop-msg, ./start-msg). This is a production system.
The following is all I have in my default log file:
[10/Apr/2007:15:27:26 -0400] apollo1 stored[5136]: General Error: function=getserverhello|port=143|error=failed to get reply
[10/Apr/2007:15:27:26 -0400] apollo1 stored[5136]: General Warning: alarmid=serverresponse|instance=imap|time=10/Apr/2007:15:27:26 -0400|value=10|low=0|high=10|threshold(over)=10|count over threshold=1|warning sent=0
[10/Apr/2007:15:37:36 -0400] apollo1 stored[5136]: General Error: function=getserverhello|port=143|error=failed to get reply
[10/Apr/2007:15:47:46 -0400] apollo1 stored[5136]: General Error: function=getserverhello|port=143|error=failed to get reply
[10/Apr/2007:15:53:48 -0400] apollo1 stored[5136]: Store Notice: checkpoint peruser started
[10/Apr/2007:15:57:56 -0400] apollo1 stored[5136]: General Error: function=getserverhello|port=143|error=failed to get reply
No core dump files were generated.
Please advise.
Thank you.
[1647 byte] By [kdavida] at [2007-11-27 0:45:37]

# 1
Hi,
> iPlanet Messaging Server 5.2 Patch 2 (built Jul 14
> 2004)
> libimta.so 5.2 Patch 2 (built 19:30:12, Jul 14 2004)
> SunOS apollo1 5.9 Generic_118558-39 sun4u sparc
> SUNW,Sun-Fire-V440
>
> As indicated by the subject line, one of the MTA
> ims_master processes suddenly died (and quite magically
> too, as it seems). Normally the Messaging Server runs
> two ims_master processes concurrently.
There don't need to be any ims_master processes running; they only run when there is new email to be delivered, and for a given idle period afterwards. I *assume* that you mean IMAP processes (based on your log entries/comments below).
> Users of the mail store were unable to
> retrieve their messages, and their full mail clients
> were clocking. Upon noticing that the process had
> died, we promptly restarted mail (./stop-msg,
> ./start-msg). This is a production system.
That is the correct thing to do. If an IMAP/POP/MSHTTP process dies, you MUST restart the entire store to clear out any database locks left behind by the dead process.
> The following is all I have in my default log file:
>
> [10/Apr/2007:15:27:26 -0400] apollo1 stored[5136]:
> General Error:
> function=getserverhello|port=143|error=failed to get
> reply
> [10/Apr/2007:15:27:26 -0400] apollo1 stored[5136]:
> General Warning:
> alarmid=serverresponse|instance=imap|time=10/Apr/2007:
> 15:27:26 -0400|value=10|low=0|high=10|
> threshold(over)=10|count over threshold=1|warning
> sent=0
> [10/Apr/2007:15:37:36 -0400] apollo1 stored[5136]:
> General Error:
> function=getserverhello|port=143|error=failed to get
> reply
> [10/Apr/2007:15:47:46 -0400] apollo1 stored[5136]:
> General Error:
> function=getserverhello|port=143|error=failed to get
> reply
> [10/Apr/2007:15:53:48 -0400] apollo1 stored[5136]:
> Store Notice: checkpoint peruser started
> [10/Apr/2007:15:57:56 -0400] apollo1 stored[5136]:
> General Error:
> function=getserverhello|port=143|error=failed to get
> reply
>
> No core dump files were generated.
Do you have your system configured to produce a core dump (coreadm)?
Also, the core dump file could be located in any number of places on your filesystem. Have you tried running something like `find / -name core` (during a quiet time, of course)?
Without a stack trace of the core dump there isn't much we can do to help.
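If you'd rather limit the search to a few likely directories instead of walking all of `/`, here is a rough Python sketch of the same idea as `find <dir> -name core`. The function name `find_core_files` is made up for illustration, and it assumes the core file is literally named `core`; if your coreadm pattern produces other names, pass them in `names`.

```python
import os


def find_core_files(root, names=("core",)):
    """Return sorted paths of files under `root` whose name is in `names`.

    Unlike `find -name core`, this only reports files, not directories
    that happen to be called "core". Unreadable directories are skipped.
    """
    matches = []
    # onerror swallows permission errors so the walk keeps going.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for filename in filenames:
            if filename in names:
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

You would call it with whichever directory you suspect, e.g. the server root, and then check the timestamps of anything it returns against the time of the outage.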
Shane.
# 2
Thank you for your timely response Shane.
Just so you know, I did do a preliminary check for core dump files, and it yielded nothing. The system is configured to produce core files when needed.
At any rate, there are normally 8 imapd processes run by the MTA. Even though the log entry shows that one (1) of the imap processes died, how come a listing of the MTA-owned processes still showed that all eight (8) processes were up and running? My other question is: what can cause an imapd process to magically die, as it did?
Looking forward to hearing from you soon.
Many thanks in advance!
# 3
Hi,
> Just so you know, I did do a preliminary check for
> core dump files, and it yielded nothing. The
> system is configured to produce core files when
> needed.
Ok -- although I recommend you compare your existing settings with those in the following document:
http://docs.sun.com/app/docs/doc/819-5355/6n7eo3v6s?a=view
> At any rate, there are normally 8 imapd
> processes run by the MTA. Even though the log
> entry shows that one (1) of the imap processes died,
I must have missed this. Which log entry in particular says that a process died?
> how come a listing of the MTA-owned processes still
> showed that all eight (8) processes were up and
> running?
Maybe a process didn't die. The default log information shows that port 143 could not be accessed. This doesn't mean the process died, just that the process(es)/messaging server may have hung or become unresponsive for some reason.
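To make the "hung" versus "dead" distinction concrete, here is a small Python sketch that connects to the IMAP port and waits for the greeting line. It is only a stand-in for the kind of check the `getserverhello` log entries suggest; I don't know how stored actually implements it, and the function name `probe_imap`, the status strings and the 10 second default timeout are all my own assumptions.

```python
import socket


def probe_imap(host, port=143, timeout=10.0):
    """Return (status, greeting) for an IMAP port.

    status is one of:
      "ok"          - server sent an IMAP greeting ("* OK" or "* PREAUTH")
      "refused"     - nothing is listening on the port
      "unreachable" - connect failed for another reason (e.g. timed out)
      "no reply"    - connect worked but no greeting arrived (possible hang)
      "unexpected"  - first line was not a normal greeting
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return ("refused", None)
    except OSError:
        return ("unreachable", None)

    with sock:
        data = b""
        try:
            # Read until the end of the first line or the peer closes.
            while b"\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
        except OSError:
            # Includes the read timing out while waiting for the greeting.
            return ("no reply", None)

    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    if not line:
        return ("no reply", None)
    if line.startswith("* OK") or line.startswith("* PREAUTH"):
        return ("ok", line)
    return ("unexpected", line)
```

A "refused" result would fit nothing listening on the port at all, while "no reply" with the imapd processes still present in the process listing would fit a hang.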
The following document goes through information to collect in the case of process hangs:
http://docs.sun.com/app/docs/doc/819-5355/6n7eo3v6n?a=view
You should also check the imap logs around the time of the hang/unavailability to see whether there was activity.
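For pulling out the entries around a given time, something along these lines may help. It is a sketch that assumes the imap log lines have the same `[timestamp] host process[pid]: message` layout as the default log excerpt you posted; the names `parse_line` and `entries_near` are invented for the example, and the month abbreviation is parsed with the default (English) locale.

```python
import re
from datetime import datetime, timedelta

# Layout taken from the default log excerpt in this thread:
# [10/Apr/2007:15:27:26 -0400] apollo1 stored[5136]: General Error: ...
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return (when, host, process, pid, message), or None if no match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return when, m.group("host"), m.group("proc"), int(m.group("pid")), m.group("msg")


def entries_near(lines, center, minutes=15):
    """Return parsed entries within `minutes` of `center`.

    `center` must be a timezone-aware datetime. Lines that do not
    match the expected layout (e.g. wrapped continuations) are skipped.
    """
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and abs(entry[0] - center) <= window:
            hits.append(entry)
    return hits
```

Feed it the lines of the imap log and the time of the first `getserverhello` error, and look at whether the imapd processes were logging anything at all in that window.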
> My other question is: what
> can cause an imapd process to magically die, as it
> did?
Need to confirm if this is the case first.
Shane.