SpamAssassin / Amavisd tuning - TCP active open: Failed connect() errors
I started this post because we were seeing occasional delays in delivering mail to our message store. By the time I finished writing it, I think I had resolved the problem. I'm posting it anyway: if I'm doing things right, it might help someone else, and if I'm doing something wrong, someone can correct me. :) For reference, we have a four-CPU Sun Fire V440 and process about 500,000 messages a day.
We were seeing messages like this in our log files:
09-Nov-2006 08:28:09.37 tcp_scanQ 1 user1@example.com rfc822;user2@example.com @tcp_scan-daemon:user2@example.com TCP active open: Failed connect() Error: Connection timed out
example.com stands for our local domain. The errors scaled with incoming mail volume: the heavier the load, the more of them we saw.
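To check that the failures really track load, one option is to bucket them by hour and compare the counts with message volume. A minimal Python sketch, assuming each MTA log line starts with a timestamp in the `09-Nov-2006 08:28:09.37` form shown above and that the log file path is passed on the command line (the function and script names are just illustrative):

```python
import re
import sys
from collections import Counter

# Captures the "DD-Mon-YYYY HH" part of a timestamp such as
# "09-Nov-2006 08:28:09.37" -> "09-Nov-2006 08".
TS_HOUR = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}):\d{2}:\d{2}")
NEEDLE = "TCP active open: Failed connect()"

def failed_connects_per_hour(lines):
    """Count log lines containing NEEDLE, grouped by the hour of their timestamp."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = TS_HOUR.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failed_connects.py /path/to/mta/logfile
    with open(sys.argv[1], errors="replace") as fh:
        # Keys sort as strings, which is chronological within a single day's log.
        for hour, n in sorted(failed_connects_per_hour(fh).items()):
            print(f"{hour}:00  {n}")
```

Running it over a busy day's log and lining the counts up with hourly traffic makes the "more errors under load" observation concrete.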
We are running SpamAssassin, amavisd with ClamAV, and:
Sun Java(tm) System Messaging Server 6.2-5.01 (built Nov 22 2005)
libimta.so 6.2-5.01 (built 11:57:57, Nov 22 2005)
SunOS hostname 5.9 Generic_118558-11 sun4u sparc SUNW,Sun-Fire-V440
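This line seems to say that the message is queued in the tcp_scan channel and the MTA could not open a connection to amavisd on 127.0.0.1:10024. Because the error is a timeout rather than "Connection refused", amavisd was most likely listening but not accepting new connections quickly enough, rather than not listening at all.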
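To tell a closed port ("Connection refused") apart from one that is listening but not accepting connections in time ("timed out"), a small Python probe can repeat the same kind of TCP active open the MTA makes. This is a sketch; the default host, port, and 5-second timeout are illustrative choices, not values taken from Messaging Server:

```python
import socket

def probe(host: str = "127.0.0.1", port: int = 10024, timeout: float = 5.0) -> str:
    """Return 'open', 'refused', 'timeout', or 'error: ...' for one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing listening: the peer answered immediately with a reset.
        return "refused"
    except socket.timeout:
        # No answer in time: e.g. listen backlog full, or traffic filtered.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"

if __name__ == "__main__":
    print(probe())
```

Running it in a loop during a busy period would show whether port 10024 flips from "open" to "timeout" under load, which points at amavisd capacity rather than a dead listener.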
My amavisd max_servers was set to 15. After reading http://www.ijs.si/software/amavisd/amavisd-new-magdeburg-20050519.pdf and making a wild guess, I raised it to 30. The errors became less frequent but did not stop.
The output of 'sar -d' showed that my local disk was more than 80% busy. I moved the amavisd temp directory and the SpamAssassin Bayes database to a SAN volume, and that sped everything up dramatically: the connect errors went away, and ClamAV's average time per message dropped from 10 seconds to 4. However, spamd is now logging messages like this:
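To spot this kind of disk bottleneck without reading `sar -d` output by eye, here is a sketch that flags devices whose %busy reaches a threshold. It assumes the Solaris column layout (time, device, %busy, avque, r+w/s, blks/s, avwait, avserv), where only the first row of each sample carries the timestamp; the 80% default simply mirrors the figure above, and the script name is hypothetical:

```python
import re
import sys

TIME = re.compile(r"^\d{2}:\d{2}:\d{2}$")

def busy_devices(sar_text: str, threshold: int = 80):
    """Yield (sample_time, device, pct_busy) for rows at or above threshold.

    Assumes the Solaris `sar -d` layout: an HH:MM:SS (or "Average") column on
    the first row of each sample, then device, %busy, avque, r+w/s, blks/s,
    avwait, avserv. Later rows of the same sample omit the time column.
    """
    sample_time = None
    for line in sar_text.splitlines():
        fields = line.split()
        if fields and (TIME.match(fields[0]) or fields[0] == "Average"):
            sample_time = fields.pop(0)
        # Skip banners, blank lines and the "device %busy ..." header rows.
        if len(fields) < 7 or not fields[1].isdigit():
            continue
        busy = int(fields[1])
        if busy >= threshold:
            yield sample_time, fields[0], busy

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.  sar -d 300 12 > sar.out ; python busy_disks.py sar.out
    with open(sys.argv[1]) as fh:
        for when, device, busy in busy_devices(fh.read()):
            print(f"{when}  {device:<10} {busy}% busy")
```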
Nov 9 10:27:27 hostname.example.com spamd[21601]: prefork: server reached --max-children setting, consider raising it
My --max-children is currently set to 25, and the server is already CPU-bound under heavy load, so I see no reason to raise it: more children would just compete for the same CPUs.
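A back-of-the-envelope way to reason about pool sizes like amavisd's max_servers and spamd's --max-children is Little's law: the average number of messages in service equals the arrival rate times the time spent per message. The sketch below plugs in the 500,000 messages/day figure and the 10 s and 4 s ClamAV averages from this post. It assumes, loudly, that every message is scanned, that arrivals are spread evenly over the day, and that ClamAV time stands in for the whole amavisd time per message; real peaks will be higher. The function name is illustrative:

```python
def busy_workers(messages_per_day: float, seconds_per_message: float) -> float:
    """Little's law: mean concurrency L = arrival rate (msg/s) * time in service (s)."""
    arrival_rate = messages_per_day / 86_400  # seconds in a day
    return arrival_rate * seconds_per_message

if __name__ == "__main__":
    for secs in (10, 4):
        print(f"{secs:>2} s/message -> ~{busy_workers(500_000, secs):.0f} busy children on average")
```

On these assumptions, 10 s per message implies roughly 58 concurrent scans on average, well above both 15 and 30 servers, while 4 s implies about 23, which fits inside 30. That is consistent with the connect errors disappearing once disk I/O stopped inflating per-message time, though it is only an average-rate estimate.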
So this is my setup. I am no longer having an immediate problem, but comments/questions are welcome.
Config files related to my setup are below.
-
Excerpts from imta.cnf:
! tcp_scan
[] $E$R${tcp_scan,$L}$U%[$L]@tcp_scan-daemon
!
! ims-ms
ims-ms defragment subdirs 20 notices 1 7 14 21 28 backoff "pt5m" "pt10m" "pt30m" "pt1h" "pt2h" "pt4h" maxjobs 2 pool IMS_POOL destinationspamfilter1 fileinto $U+$S@$D
ims-ms-daemon
!
! tcp_local
tcp_local smtp mx single_sys remotehost inner switchchannel identnonenumeric subdirs 20 maxjobs 7 pool SMTP_POOL saslswitchchannel tcp_auth maytlsserver maysaslserver missingrecipientpolicy 0 aliasdetourhost tcp_scan-daemon
tcp-daemon
!
! tcp_intranet
tcp_intranet smtp mx single_sys subdirs 20 dequeue_removeroute maxjobs 7 pool SMTP_POOL allowswitchchannel saslswitchchannel tcp_auth maytlsserver maysaslserver missingrecipientpolicy 4 aliasdetourhost tcp_scan-daemon
tcp_intranet-daemon
!
! tcp_scan
tcp_scan smtp single_sys subdirs 5 notices 1 backoff "pt10m" "pt30m" "pt2h" "pt4h" dequeue_removeroute maxjobs 7 pool SMTP_POOL daemon [127.0.0.1] port 10024
tcp_scan-daemon
-
option.dat:
SPAMFILTER1_LIBRARY=/opt/sunjes/SUNWmsgsr/lib/libspamass.so
SPAMFILTER1_CONFIG_FILE=/opt/sunjes/SUNWmsgsr/config/SpamAssassin
SPAMFILTER1_STRING_ACTION=data:,require ["addheader"]; addheader "Spam-test: $U"; require "fileinto"; fileinto "Junk";
SPAMFILTER1_OPTIONAL=1
-
dispatcher.cnf:
[SERVICE=SMTP-SCAN]
DEBUG=-1
PARAMETER=CHANNEL=tcp_scan
PORT=10025
IMAGE=IMTA_BIN:tcp_smtp_server
LOGFILE=IMTA_LOG:tcp_scan-server.log
STACKSIZE=2048000
INTERFACE_ADDRESS=127.0.0.1
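The SMTP-SCAN service above listens on 127.0.0.1:10025 for the tcp_scan channel; presumably that is where amavisd hands scanned mail back to the MTA (amavisd.conf isn't shown here, so that is an assumption). A quick Python check that the service returns a 220 greeting and answers NOOP, using the standard-library smtplib; the function name and defaults are illustrative:

```python
import smtplib

def smtp_alive(host: str = "127.0.0.1", port: int = 10025, timeout: float = 10.0):
    """Connect, read the 220 greeting, send NOOP; return (ok, detail)."""
    try:
        # smtplib.SMTP connects and checks the greeting in its constructor;
        # local_hostname avoids a possibly slow FQDN lookup. The context
        # manager sends QUIT and closes the socket on exit.
        with smtplib.SMTP(host, port, local_hostname="localhost", timeout=timeout) as conn:
            code, reply = conn.noop()
            return code == 250, reply.decode(errors="replace")
    except (OSError, smtplib.SMTPException) as exc:
        return False, str(exc)

if __name__ == "__main__":
    print(smtp_alive())
```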
-
SpamAssassin (the file named by SPAMFILTER1_CONFIG_FILE in option.dat):
host=127.0.0.1
port=783
debug=0
mode=1
field=
verdict=Junk
USE_CHECK=0
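Since libspamass.so talks to spamd at the host and port in this file, a liveness check can help when spamd starts logging max-children warnings. Timing the round trip also gives a rough idea of how long a new connection waits for a free child. This sketch sends a PING using the spamc/spamd text protocol as I understand it (request line `PING SPAMC/1.2`, reply of the form `SPAMD/<version> 0 PONG`). Check the protocol notes shipped with your SpamAssassin version before relying on it; the function name is hypothetical:

```python
import socket
import time

def spamd_ping(host: str = "127.0.0.1", port: int = 783, timeout: float = 10.0):
    """Send a spamd PING; return (ok, seconds_elapsed, reply_or_error)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Request line followed by an empty header block.
            sock.sendall(b"PING SPAMC/1.2\r\n\r\n")
            with sock.makefile("rb") as reply:
                line = reply.readline().decode("ascii", errors="replace").strip()
    except OSError as exc:  # includes refused connections and timeouts
        return False, time.monotonic() - start, str(exc)
    # Expected shape: "SPAMD/<version> 0 PONG"; the version number varies.
    parts = line.split()
    ok = (len(parts) == 3 and parts[0].startswith("SPAMD/")
          and parts[1] == "0" and parts[2] == "PONG")
    return ok, time.monotonic() - start, line

if __name__ == "__main__":
    print(spamd_ping())
```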

