Dispatcher process failure

Hi,

Our dispatcher process failed last night, and it was never restarted by the watcher feature. Here are the entries I got from the default log:

[24/Oct/2006:00:56:42 -0700] lithium msprobe[26856]: General Warning: alarmid=serverresponse|instance=smtp|time=24/Oct/2006:00:56:42 -0700|value=100|low=0|high=100|threshold(over)=10|count over threshold=14|warning sent=8

[24/Oct/2006:00:58:19 -0700] lithium msprobe[26856]: General Warning: SMTP server took over 25 seconds to respond!

[24/Oct/2006:00:58:19 -0700] lithium msprobe[26856]: General Warning: SMTP slowness may be a symptom of DNS problems -- configuring to avoid DNS lookups on incoming connections may improve performance

[24/Oct/2006:01:07:32 -0700] lithium msprobe[27235]: General Error: function=getserverresponse|port=25|error=failed to get server banner

[24/Oct/2006:01:07:32 -0700] lithium msprobe[27235]: General Critical: SMTP server is not responding

[24/Oct/2006:01:07:32 -0700] lithium msprobe[27235]: General Critical: dispatcher restart requested

[24/Oct/2006:01:07:32 -0700] lithium msprobe[27235]: General Error: function=getserverresponse|port=587|error=failed to connect

[24/Oct/2006:01:07:32 -0700] lithium msprobe[27235]: General Critical: SMTP_SUBMIT server is not responding

[24/Oct/2006:01:17:32 -0700] lithium msprobe[27622]: General Error: function=getserverresponse|port=25|error=failed to get server banner

[24/Oct/2006:01:17:32 -0700] lithium msprobe[27622]: General Critical: SMTP server is not responding

[24/Oct/2006:01:17:32 -0700] lithium msprobe[27622]: General Critical: dispatcher restart requested

[24/Oct/2006:01:17:32 -0700] lithium msprobe[27622]: General Error: function=getserverresponse|port=587|error=failed to connect

[24/Oct/2006:01:17:32 -0700] lithium msprobe[27622]: General Critical: SMTP_SUBMIT server is not responding

That pattern kept repeating until I stopped and started the services using the rc scripts this morning.

Any idea why it couldn't get restarted by itself?

Thanks,

[2129 byte] By [rami] at [2007-11-26 11:01:03]
# 1

I also got the following from the watcher log:

[10/24/06 01:07:33] Received request to restart: dispatcher

[10/24/06 01:07:33] Connecting to watcher ...

[10/24/06 01:07:33] dispatcher server is not running

[10/24/06 01:07:33] Starting dispatcher server .... 27239

[10/24/06 01:17:33] Received request to restart: dispatcher

[10/24/06 01:17:33] Connecting to watcher ...

[10/24/06 01:17:33] ERROR: dispatcher failed twice in 600 seconds, will not perform restart

watcher process 14441 started at Tue Oct 24 08:40:00 2006

rami at 2007-7-7 3:14:45 > top of Java-index,E-Mail, Calendar, & Collaboration,Sun Java System Messaging Server...
# 2
Hi,

Have you enabled the watcher auto-restart?

./configutil -o local.autorestart

It should be set to "yes", "true", or "1".

Regards,

Shane.

shane_hjorth at 2007-7-7 3:14:45
# 3
Yes, that has been set since the very first day:

lithium# ./configutil -o local.autorestart

yes

rami at 2007-7-7 3:14:45
# 4
Hi,

I should have read a little more closely. It appears that the watcher did attempt to restart the dispatcher process, but the dispatcher failed twice, so the watcher gave up. Do you have any core files or log files that might indicate why the dispatcher failed?

Regards,

Shane.
shane_hjorth at 2007-7-7 3:14:45
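
The give-up behaviour seen in the watcher log ("dispatcher failed twice in 600 seconds, will not perform restart") can be modelled with a short Python sketch. This is only an illustration of a restart throttle, not the watcher's actual code: the class name RestartThrottle, the method request_restart, and the inclusive comparison at exactly 600 seconds are all assumptions, chosen so that the sketch reproduces the two timestamps from the log.

```python
from datetime import datetime, timedelta


class RestartThrottle:
    """Hypothetical model of a restart throttle; not the real watcher logic."""

    def __init__(self, window_seconds=600):
        # 600 seconds is the window quoted in the watcher log message.
        self.window = timedelta(seconds=window_seconds)
        self.last_failure = None

    def request_restart(self, now):
        """Return True if a restart would be attempted, False if refused."""
        previous = self.last_failure
        self.last_failure = now
        # Assumption: a second failure within the window (inclusive) is refused.
        if previous is not None and now - previous <= self.window:
            return False
        return True


# Timestamps copied from the watcher log in this thread.
FMT = "%m/%d/%y %H:%M:%S"
throttle = RestartThrottle()
first = throttle.request_restart(datetime.strptime("10/24/06 01:07:33", FMT))
second = throttle.request_restart(datetime.strptime("10/24/06 01:17:33", FMT))
print(first, second)  # True False
```

Under these assumptions the first request triggers a restart and the second, arriving 600 seconds later, is refused, which matches what the log shows. It also suggests why no further automatic restarts happened: if the probe keeps asking every 10 minutes, every later request falls inside the window again.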
# 5
Hi,

I could not find anything else. There was no core file this morning, and I went through all the logs related to Messaging Server, including syslog, but found no other explanation for the failure.

rami at 2007-7-7 3:14:45
# 6

Well, there have been issues where the watcher got confused, and there have also been issues where the configuration was so wrong that the probe wasn't able to get a connection even when the server was running fine.

It would help if you gave us your version. It's always good to give the version at the start. Please run

imsimta version

and post the results.

A default configuration of the MTA is certainly a good place to start, but in a busy or large environment, that default really isn't enough.

You should read and understand the tuning guide, available here:

http://ims.balius.com/resources/downloads/files/iMS-Tuning-Guide.21.pdf

jay_plesset at 2007-7-7 3:14:45