msg submit latency
I have a JES 6.2-7.05 msg instance running that is set to find it's U/G info on a separate host. All is working fine, though I am seeing intermittent
latency when a user submits an email eg imap client either via port 25 or a port 587 submission. When there is latency there can be a delay of up to
5-10 seconds for the client to submit the mail to the MTA. I have ruled out the network between the client > server as both are isolated to the same switch
I also see the behavior on a variety of imap clients, I have disabled SSL as well as a spamassassin trigger set on the tcp_local channel and no AV is set for the MTA either yet the behavior persists.
Is it possible that the latency is due to directory lookup between the msg-instance's host and the DS host ie the SMTP AUTH negotiation ?
Are their some simple tests I can run to isolate this ? Once again due to the problem being intermittent it has been difficult to pinpoint the source.
[985 byte] By [
goubeauxa] at [2007-11-26 22:21:09]

# 1
Hi,
> I have a JES 6.2-7.05 msg instance running that is
> set to find it's U/G info on a separate host. All is
> working fine, though I am seeing intermittent
> latency when a user submits an email eg imap client
> either via port 25 or a port 587 submission.
A common cause of such delays is RBL lookups and/or DNS lookups in general especially in the PORT_ACCESS mapping table. Are you using such lookups? Are you using a DNS caching name-server on the server?
> When
> there is latency there can be a delay of up to
> 5-10 seconds for the client to submit the mail to the
> MTA.
Do you have LOG_CONNECTION=3 enabled in option.dat? This will be useful to determine when the client connected, and then you can see how long between the connection and each rcpt_to: reply. This may help to work out if there are any particular email addresses that are hanging etc.
> I have ruled out the network between the client
> > server as both are isolated to the same switch
That doesn't necessarily rule out the network... It just reduces the chance that this is the problem. I had a server for example that had a dodgy network cable, so every so often it would start dropping packets and run slow - was a pain to track down.
> I also see the behavior on a variety of imap clients,
> I have disabled SSL as well as a spamassassin trigger
> set on the tcp_local channel and no AV is set for the
> MTA either yet the behavior persists.
I wouldn't expect the spamassassin plugin to be the problem -- this occurs after the email has been received from the client.
> Is it possible that the latency is due to directory
> lookup between the msg-instance's host and the DS
> host ie the SMTP AUTH negotiation ?
Very possible. Have a look at the access logs, see if you have any several-second etimes.
> Are their some simple tests I can run to isolate this
> ? Once again due to the problem being intermittent it
> has been difficult to pinpoint the source.
Being intermittent, it is very difficult. As I noted though, increasing connection logging may help you to isolate if there is a particular user who is most affected, or if a particular destination address is causing the delay, or it if happens at a particular time of day etc.
Regards,
Shane.
# 2
Shane thanks for the ideas, I have checked on several items you mention.
>
> A common cause of such delays is RBL lookups and/or
> DNS lookups in general especially in the PORT_ACCESS
> mapping table. Are you using such lookups? Are you
> using a DNS caching name-server on the server?
>
> > When
> > there is latency there can be a delay of up to
> > 5-10 seconds for the client to submit the mail to
> the
> > MTA.
I am doing all my RBL/Network spam checks via SA
My port_access entry reads accordingly and is the default intsall's entry:
*|*|*|*|* $C$|INTERNAL_IP;$3|$Y$E
* $YEXTERNAL
How can I confirm if a DNS caching name-server is set for the instance? I did not deviate from defaults
that I know of for this.
>
> That doesn't necessarily rule out the network... It
> just reduces the chance that this is the problem. I
> had a server for example that had a dodgy network
> cable, so every so often it would start dropping
> packets and run slow - was a pain to track down.
I will double check the network connections.
> > Is it possible that the latency is due to
> directory
> > lookup between the msg-instance's host and the DS
> > host ie the SMTP AUTH negotiation ?
>
> Very possible. Have a look at the access logs, see if
> you have any several-second etimes.
>
I will look into this as it looks like symptoms point to this as a likely
area of contention.
# 3
Hi,
> How can I confirm if a DNS caching name-server is set
> for the instance? I did not deviate from defaults
> that I know of for this.
A DNS caching name-server is software you install on the server itself and then you point your resolv.conf to localhost; it is separate from messaging server. For example bind can be configured in caching mode only, search google for "dns caching nameserver"
> > That doesn't necessarily rule out the network...
> It
> > just reduces the chance that this is the problem.
> I
> > had a server for example that had a dodgy network
> > cable, so every so often it would start dropping
> > packets and run slow - was a pain to track down.
>
>
> I will double check the network connections.
I wasn't necessarily saying this was your problem, just that there are other ways that the network can be a factor apart from being a routing issue. If you want to check for packet/network issues, try the command:
netstat -in
Ideally you should have 0 in the 0errs/Ierrs/Collis columns e.g.
bash-3.00# netstat -in
Name Mtu Net/DestAddressIpkts Ierrs Opkts Oerrs Collis Queue
lo08232 127.0.0.0127.0.0.11968554 01968554 000
e1000g0 1500 1.2.3.4 1.2.3.8 10929192 0958761 000
> > > Is it possible that the latency is due to
> > directory
> > > lookup between the msg-instance's host and the
> DS
> > > host ie the SMTP AUTH negotiation ?
> >
> > Very possible. Have a look at the access logs, see
> if
> > you have any several-second etimes.
> >
>
>
> I will look into this as it looks like symptoms point
> to this as a likely
> area of contention.
Always a good place to start given that the directory response time is critical to the speed of email processing.
Regards,
Shane.
# 4
> > > Very possible. Have a look at the access logs,
> see if you have any several-second etimes.
Ok, so over a 12 hr period I am seeing a handful of etimes of 3 and greater, where the majority are empty or 1's If these times correspond to lookups done from the remote mailserv instance (hitting that directory instance) does this then clearly indicate that send latency could in fact be due to a lookup delay?
Are there then some tunning parameters that I might employ to alleviate this? Eg is this a directory resource issue? I am wondering also if you see this issue very often with mailserv instances that use an external directory for lookups?
bash-2.03# cat access | grep etime=\[3-5\]
[23/Mar/2007:01:06:05 -0700] conn=346437 op=98 msgId=101 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:02:47:44 -0700] conn=346771 op=831 msgId=836 - RESULT err=0 tag=101 nentries=72 etime=4
[23/Mar/2007:03:29:14 -0700] conn=346883 op=1077 msgId=1086 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:03:31:20 -0700] conn=346883 op=1655 msgId=1668 - RESULT err=0 tag=101 nentries=72 etime=4
[23/Mar/2007:03:33:58 -0700] conn=346883 op=1893 msgId=1908 - RESULT err=0 tag=101 nentries=72 etime=4
[23/Mar/2007:03:34:03 -0700] conn=346883 op=1968 msgId=1983 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:03:34:43 -0700] conn=346883 op=2234 msgId=2251 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:04:10:09 -0700] conn=347019 op=1307 msgId=1320 - RESULT err=0 tag=101 nentries=72 etime=4
[23/Mar/2007:04:10:14 -0700] conn=347019 op=1380 msgId=1393 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:04:46:06 -0700] conn=347267 op=160 msgId=161 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:06:45:08 -0700] conn=347682 op=1 msgId=2 - RESULT err=0 tag=101 nentries=0 etime=4
[23/Mar/2007:07:00:07 -0700] conn=347710 op=1010 msgId=1019 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:08:56:03 -0700] conn=348808 op=622 msgId=629 - RESULT err=0 tag=101 nentries=29 etime=4
[23/Mar/2007:08:56:38 -0700] conn=348808 op=1152 msgId=1159 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:09:41:36 -0700] conn=351021 op=537 msgId=542 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:09:53:04 -0700] conn=351021 op=1316 msgId=1337 - RESULT err=0 tag=101 nentries=80 etime=3
[23/Mar/2007:09:56:29 -0700] conn=351021 op=1982 msgId=2009 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:09:57:24 -0700] conn=351021 op=2221 msgId=2250 - RESULT err=0 tag=101 nentries=71 etime=5
[23/Mar/2007:09:57:29 -0700] conn=351021 op=2279 msgId=2308 - RESULT err=0 tag=101 nentries=72 etime=4
[23/Mar/2007:09:57:33 -0700] conn=351021 op=2338 msgId=2367 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:09:57:56 -0700] conn=351021 op=2466 msgId=2497 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:09:58:02 -0700] conn=351021 op=2585 msgId=2616 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:10:27:35 -0700] conn=42 op=22088 msgId=22089 - RESULT err=0 tag=101 nentries=90 etime=3
[23/Mar/2007:10:37:24 -0700] conn=353123 op=98 msgId=99 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:10:47:06 -0700] conn=353123 op=1815 msgId=1828 - RESULT err=0 tag=101 nentries=71 etime=4
[23/Mar/2007:11:18:19 -0700] conn=354170 op=114 msgId=115 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:11:19:17 -0700] conn=354170 op=908 msgId=913 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:11:19:21 -0700] conn=354170 op=968 msgId=973 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:11:38:03 -0700] conn=355359 op=500 msgId=501 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:11:40:45 -0700] conn=355359 op=719 msgId=724 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:11:46:00 -0700] conn=355359 op=910 msgId=919 - RESULT err=0 tag=101 nentries=31 etime=5
[23/Mar/2007:11:48:23 -0700] conn=42 op=22265 msgId=22266 - RESULT err=0 tag=101 nentries=492 etime=3
[23/Mar/2007:11:49:41 -0700] conn=355359 op=1062 msgId=1073 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:11:50:24 -0700] conn=355359 op=1351 msgId=1366 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:11:51:23 -0700] conn=355359 op=2077 msgId=2098 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:11:52:35 -0700] conn=355359 op=2217 msgId=2240 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:12:08:06 -0700] conn=356392 op=752 msgId=757 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:12:09:01 -0700] conn=356392 op=1047 msgId=1054 - RESULT err=0 tag=101 nentries=72 etime=3
[23/Mar/2007:13:15:55 -0700] conn=358466 op=125 msgId=126 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:13:16:04 -0700] conn=358466 op=319 msgId=320 - RESULT err=0 tag=101 nentries=71 etime=3
[23/Mar/2007:13:31:38 -0700] conn=42 op=22468 msgId=22469 - RESULT err=0 tag=101 nentries=40 etime=3
bash-2.03#
# 5
These all appear to be fetches of config data, and all return 70 or a little more entries. Not likely the problem.
However, I do believe Shane may be incorrect in how SpamAssassin works with Messaging Server.
I believe we send the message over to SA WHILE we're receiving it. If SA is slow, then we're going to be slow receiving the message, too.
Can you test sending a message via direct telnet to the port? If so, does it take a long time to get a banner? When exactly does it seem slow?
Have you had a look at the tuning guide at:
http://ims.balius.com/resources/downloads/files/iMS-Tuning-Guide.21.pdf
# 6
Thanks Jay,
In the case of a SA trigger being the cause for the latency I am wondering how SA configured to check mail on the ims_ms channel could be involved with mail submitted by a client, especially if the mail is destined for a non local user. My understanding was that the ims_ms channel wa acting on mail destined for the mailstore for a local user.
! ims-ms
ims-ms defragment subdirs 20 notices 1 7 14 21 28 backoff "pt5m" "pt10m" "pt30m" "pt1h" "pt2h" "pt4h" maxjobs 2 pool IMS
_POOL fileinto $U+$S@$D destinationspamfilter1optin spam
ims-ms-daemon
!
# 7
If you don't have a sourcespamfilter option set on tcp_local, then you've eliminated SA as a cause. I encourage you to check the tuning guide, and to test via direct telnet, to see where the delay is.