SMTP servers not responding

Hi

Currently I have these error messages in the default log:

[05/Jun/2007:11:55:25 +0800] mx3 msprobe[26215]: General Error: function=getserverresponse|port=25|error=failed to connect

[05/Jun/2007:11:55:25 +0800] mx3 msprobe[26215]: General Critical: SMTP server is not responding

FYI,

Sun Java(tm) System Messaging Server 6.2-8.04 (built Feb 28 2007)

libimta.so 6.2-8.04 (built 19:28:07, Feb 28 2007)

SunOS mx3.tm.net.my 5.10 SunOS_Development sun4v sparc SUNW,Sun-Fire-T200

Do you guys know how to resolve this problem?

Thanks


[629 byte] By [haw_9368a] at [2007-11-27 6:24:55]
# 1

Hi,

Is the SMTP server responding, i.e. if you run telnet <hostname> 25, do you get a banner reply?

How about if you connect to port 587 instead?

Do you have some kind of PORT_ACCESS mapping table entry which uses a DNS blacklist or similar?

Has your dispatcher process core dumped?

Have you seen this issue before?

What does the following command tell you?

./imsimta dispatcher_stats_tty

Regards,

Shane.

shane_hjortha at 2007-7-12 17:44:13 > top of Java-index,E-Mail, Calendar, & Collaboration,Sun Java System Messaging Server...
# 2

Hi

1) I did not get a banner reply for telnet 0 25, and when I do a "telnet localhost 25", the session hangs.

2) Yes, I got a banner reply for telnet 0 587.

3) In my mappings file, I can see this:

PORT_ACCESS

*|*|*|*|* $C$|INTERNAL_IP;$3|$Y$E

*|*|*|*|* $C$|VIRUS_IP;$3|$Y$E

*|*|*|*|* $C$|SPAM_IP;$3|$Y$E

TCP|*|25|*|* $C$[/jes/SUNWmsgsr/lib/conn_throttle.so,throttle,$1,25]$N421$ Connection$ not$ accepted$ at$ this$ time$E

* $YEXTERNAL

4) I did not see any core dump files for the dispatcher.

5) Yes, this problem has happened before, but it seemed to stop after we rebooted the server once, some time ago. Now it is happening again.

6) The output is long:

[root|mx3.tm.net.my:/jes/SUNWmsgsr/sbin] ./imsimta dispatcher_stats_tty

SMTPFri 16:14:49--

Cur Conns=1586, Max Concurrent Conns=1840, Total Conns=2500576

Cur Procs=30, Max Concurrent Procs=30, Total Procs=469

Min Time=00:00:00, Avg Time=00:01:16, Max Time=17:16:57

=============================================================================

2388811:12:13--

Cur Conns=50, Max Concurrent Conns=50, Total Conns=2527

(220.226.195.199)13:22:38-- (in=0, out=0)

(59.95.194.20) 13:22:3813:22:38 (in=0, out=53)

(202.186.33.163)13:22:3813:22:38 (in=109, out=420)

(60.49.90.37)13:22:3813:22:38 (in=0, out=53)

(172.18.0.252) 13:22:3813:22:38 (in=0, out=53)


haw_9368a at 2007-7-12 17:44:13
# 3

Hi,

> Conns=1586, Max Concurrent Conns=1840, Total

> Conns=2500576

> Cur Procs=30, Max Concurrent Procs=30, Total

> Procs=469

Based on the above, it could be one of several things:

1. Your system is unable to keep up with the load, e.g. more traffic than Messaging Server can handle. Do you have any statistics to see whether this system is receiving more traffic than others in your email farm (I assume you have more than one MX system)?

You currently have 1586 active connections, which is a lot. You can increase the number that can be handled by modifying the dispatcher.cnf file settings.

Is the load high on the system?

What about I/O, does iostat show really busy disks?

2. Your system is not processing/shutting down/handing off connections properly.

Two things you can try to remedy this situation:

a. Apply the Solaris patch 119998-01 which fixes the Solaris TCP bug #6408242 OR fully patch your Solaris system to the latest patch-bundle.

Now, we haven't been able to prove whether this patch helps or not, but customers have reported improved dispatcher performance after _fully_ patching their Solaris 10 system - which should include this patch.

b. Get a fix for the dispatcher bug #6533417 - proc_adjust_priority called without holding lock

This is available as a point-patch for messaging server 6.2 (125813-03) but you will need to log a support case to get a copy.

This bug is also fixed in messaging server 6.3 (120228-20). Upgrading to 6.3 is more work of course as you need to check pre-requisites etc.
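As a rough illustration of reading the dispatcher figures quoted at the top of this reply, here is a minimal Python sketch. It is not part of Messaging Server, and the function names are made up for this example. It assumes the summary lines keep the "Name=number" layout shown in the dispatcher_stats_tty output, parses the connection and process lines, and works out the average number of connections per dispatcher process.

```python
import re


def parse_stats_line(line):
    """Turn 'Cur Conns=1586, Max Concurrent Conns=1840' into a dict of ints.

    Assumes every field on the line has the form 'Name=digits'.
    """
    return {
        name.strip(): int(value)
        for name, value in re.findall(r"([A-Za-z ]+)=(\d+)", line)
    }


def summarize(conn_line, proc_line):
    """Hypothetical helper: derive two simple indicators from the summary lines."""
    conns = parse_stats_line(conn_line)
    procs = parse_stats_line(proc_line)
    return {
        # Average load carried by each dispatcher worker process right now.
        "conns_per_proc": conns["Cur Conns"] / procs["Cur Procs"],
        # True when the current process count equals the highest count seen so far.
        "procs_at_peak": procs["Cur Procs"] == procs["Max Concurrent Procs"],
    }
```

With the numbers posted above (1586 current connections, 30 processes) this gives roughly 53 connections per process. Whether that is at or near the configured limits can only be confirmed by checking the actual dispatcher.cnf values.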

Regards,

Shane.

shane_hjortha at 2007-7-12 17:44:13
# 4

Hi,

1) iostat output

[root|mx3.tm.net.my:/logs] iostat -x

                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd1      10.6   24.1  241.5  609.9  1.9  0.8   76.3   2  15
sd2       0.0    0.0    0.0    0.0  0.0  0.0    6.5   0   0
sd4       0.2   33.8    2.5  698.9  0.0  0.3   10.1   0  18
nfs1      0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0

2) May I know the suggested values to set in dispatcher.cnf? Yes, we have four MX servers and all are under one load balancer. This error is only seen on mx2 and mx3.

3) I found a link and followed the workaround, but it still did not resolve the issue.

http://sunsolve.sun.com/search/document.do?assetkey=1-26-57684-1

4) Do you know why the telnet session hung?

thanks


haw_9368a at 2007-7-12 17:44:13
# 5

Hi,

> 1)iostat output

> [root|mx3.tm.net.my:/logs] iostat -x

>extended device statistics

> device r/s w/s kr/s kw/s wait actv svc_t %w %b

The disks don't look particularly busy, although you didn't supply any load (CPU) information.

> 2)May I know the suggested value to set in

> dispatcher.cnf? Yes we have for mx and all are under

> one load balancer. This error only can be seen in mx2

> and mx3.

I can't suggest values without knowing what they currently are. Although I suspect that increasing the values

Well, I would ask yourself: what is different about mx2 and mx3 compared to your other servers?

-> Did you patch them all the same at the OS level, and at the same time?

-> Are they patched the same at the messaging server level?

-> Are they configured the same at the OS level (e.g. networking settings) and the messaging server level (MTA configuration)?

-> Are they the same physical hardware?

-> Are they receiving the same amount of load (e.g. you are using round-robin load-balancing vs. least connections)?

-> Do they have the same software running on them?

-> Are they networked the same?

-> Are they in the same physical location?

etc. etc.

Differences between systems can be very subtle. I had one customer who was positive two systems were identical even though they were processing different amounts of traffic. It turned out one system had UFS logging enabled and the other one didn't, which made a huge difference in performance.
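One way to hunt for such differences is to copy the same configuration file from two hosts and diff them. The following is only a sketch using Python's standard difflib module; the function name and the host labels are hypothetical, and it assumes you already have the contents of both files as strings (for example, dispatcher.cnf fetched from mx1 and mx3).

```python
import difflib


def diff_configs(text_a, text_b, name_a="host-a", name_b="host-b"):
    """Return unified-diff lines for two config file contents.

    Trailing whitespace and blank lines are ignored so that only
    meaningful differences show up. An empty list means no difference.
    """
    def normalize(text):
        return [line.rstrip() for line in text.splitlines() if line.strip()]

    return list(
        difflib.unified_diff(
            normalize(text_a),
            normalize(text_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

Printing the returned lines one per row gives the familiar "-" and "+" markers for settings that differ between the two machines. The same approach works for any text-based settings you can dump on both hosts.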

> 3) I found a link and follow the workaround, but

> still it cannot resolv the issue.

> http://sunsolve.sun.com/search/document.do?assetkey=1-26-57684-1

This addresses just the alarm itself, not the underlying problem. It's a bit like taking the battery out of a smoke alarm: sure, the beeping has stopped, but the smoke hasn't gone away.

> 4)Do you know why the telnet session got hung?

If the dispatcher already has its configured limit of connections, it will not respond to any new connections (return an SMTP banner) until existing connections free up. Chances are the telnet session would have eventually returned a banner, but it could have taken seconds or even minutes.

There are different connection pools/limits for the connection on port 25 compared to the connection on port 587, which is why telnet localhost 587 worked but telnet 25 didn't.
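If you want to repeat the telnet check without a session that hangs indefinitely, a small script with a timeout can help. This is a minimal sketch using only Python's standard socket module; the function name and the default timeout are my own choices, not anything shipped with Messaging Server. It connects to a port, waits for the first line the server sends (the SMTP banner), and gives up after the timeout.

```python
import socket


def read_smtp_banner(host, port, timeout=10.0):
    """Return the first line sent by the server, or None.

    None is returned if the connection fails, the server closes without
    sending anything, or no complete line arrives before the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = b""
            # Keep reading until a full line (the banner) has arrived.
            while b"\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
            line = data.split(b"\n", 1)[0].strip()
            return line.decode("ascii", errors="replace") or None
    except OSError:
        # Covers refused connections as well as connect/read timeouts.
        return None
```

Calling read_smtp_banner("localhost", 25) and read_smtp_banner("localhost", 587) reproduces the two telnet tests. A None result on port 25 alongside a banner on 587 would match the behaviour described above.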

Regards,

Shane.

shane_hjortha at 2007-7-12 17:44:13