OpenSSH 4.2p1, dropped connections on Solaris 8

All of a sudden, some of our hosts are randomly dropping about 1 out of every 5 to 10 SSH connection attempts. We are running Solaris 8, way down-rev on kernel patches (February 2005, I'm embarrassed to say), with OpenSSH 4.2p1 from the Sunfreeware.com package.

What happens is that, shortly after the parent forks off a child, the new sshd exits after a failed read() on fd 4, the main communication socket (I trussed it), with "Read from socket failed: Resource temporarily unavailable" (EAGAIN). The client shows "Connection closed by xx.xx.xx.xx".
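For reference, "Resource temporarily unavailable" is the standard strerror() text for EAGAIN (EWOULDBLOCK on many systems), whereas EACCES reads "Permission denied". EAGAIN from read() typically means the descriptor is in non-blocking mode and no data is waiting yet. The sketch below is a hypothetical Python reproduction on a local socketpair, not sshd's own code; it only shows which errno produces that message, not why sshd's socket hit it.

```python
import errno
import os
import socket


def read_with_no_data_waiting() -> int:
    """Return the errno from reading an empty, non-blocking socket."""
    a, b = socket.socketpair()  # connected pair; nothing has been written yet
    try:
        a.setblocking(False)  # same effect as setting O_NONBLOCK on the fd
        a.recv(1024)          # no data queued, so the read cannot complete now
    except BlockingIOError as exc:  # Python raises this for EAGAIN / EWOULDBLOCK
        return exc.errno
    finally:
        a.close()
        b.close()
    raise AssertionError("recv unexpectedly returned data")


code = read_with_no_data_waiting()
print(os.strerror(code))                           # message for the errno we got
print("EACCES would read:", os.strerror(errno.EACCES))
```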

What's weird is that it started happening out of the blue on a variety of clients and servers last Thursday at noon. Only old, slow servers are affected (many 450 MHz E220Rs and 1 or 2 E420Rs); none of the V100s, V210s, or V440s are. The only change we made on the affected networks was upgrading the switch connecting them the previous Friday. No errors show up in netstat -i or on the switch.
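To put a number on "1 out of every 5 to 10" and compare affected hosts with unaffected ones (or before and after a change), one option is to script repeated non-interactive logins. This is a hypothetical Python sketch: it assumes key-based authentication is already set up, because `BatchMode=yes` disables password prompts, and the host name in the usage line is made up.

```python
import subprocess
from typing import Callable


def ssh_attempt(host: str, timeout: float = 30.0) -> bool:
    """Make one non-interactive SSH login and report whether it succeeded.

    BatchMode=yes stops ssh from prompting for a password, so this
    assumes key-based authentication to the host is already set up.
    """
    argv = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    # ssh exits with the remote command's status (0 for `true`),
    # or 255 if the connection itself failed.
    return result.returncode == 0


def failure_rate(attempt: Callable[[], bool], n: int = 50) -> float:
    """Call `attempt` n times and return the fraction of calls that failed."""
    failures = sum(1 for _ in range(n) if not attempt())
    return failures / n
```

Usage would be something like `failure_rate(lambda: ssh_attempt("e220r-01"), n=100)`, where `e220r-01` is a placeholder host. A result of roughly 0.1 to 0.2 would match the symptom described above. Passing the attempt in as a callable keeps the counting logic separate from ssh, so the same loop works for any other probe.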

I'm inclined to blame a bug in either OpenSSH or the Sunfreeware build, since I haven't been able to find any recent descriptions of this problem. But I also wouldn't rule out a kernel patch fixing it.

If anyone has encountered this and found a workaround, please post. Thanks,

-w

By W_Sandersa at 2007-11-26 17:02:12
# 1

After the switch upgrade, did you verify that the switch port settings were the same as before?

Older servers come with hme interfaces, and hme interfaces have a hard time with switch ports set to autonegotiate. It was common practice to hardcode both the interfaces and the switch ports to 100fd (100 Mbit/s full duplex), or to whatever the switch port allowed.

Newer interfaces, like bge, have no issues with autonegotiation, or at least I have not found any. Still, I hardcode them whenever possible :)

Codename47a at 2007-7-8 23:29:58
# 2

On the contrary, our hme interfaces have an excellent time with autonegotiation. All my hosts have been set to autonegotiate for years, and the only errors I've seen were when switch administrators set ports to something other than autonegotiate.

That being said, I do recall Cisco Catalysts having this problem way back when, around 1999. All our switches are fairly modern Alpines and Junipers.

To answer my original question: I just upgraded all the sshds to the latest and greatest (the latest SSH packages from Sunfreeware for Solaris and the latest Red Hat RPMs for Fedora), and the problem is gone.

I think there were also some issues with the check_by_ssh Nagios plugin I was using. I looked into it and rewrote the plugin as a simple Perl wrapper. No problems since.
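The Perl wrapper itself isn't shown, so here is a hypothetical sketch of the same idea in Python rather than the author's script. It runs a remote plugin over ssh and turns the result into the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Treating ssh's own failure status (255), timeouts, and a missing ssh binary as UNKNOWN is a design choice of this sketch; reporting them as CRITICAL would be equally defensible.

```python
import subprocess
from typing import List, Sequence, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def ssh_command(host: str, remote_cmd: str) -> List[str]:
    """Build a non-interactive ssh command line (assumes key-based auth)."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, remote_cmd]


def run_check(argv: Sequence[str], timeout: float = 30.0) -> Tuple[int, str]:
    """Run argv and return (nagios_status, status line for Nagios)."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN - command timed out after {timeout}s"
    except OSError as exc:  # e.g. the ssh binary is missing
        return UNKNOWN, f"UNKNOWN - could not run {argv[0]}: {exc}"

    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    if proc.returncode in (OK, WARNING, CRITICAL, UNKNOWN):
        # The remote plugin already speaks Nagios: pass its status through.
        return proc.returncode, first_line
    # Anything else (ssh itself exits 255 on connection errors) is UNKNOWN,
    # with ssh's stderr (e.g. "Connection closed by ...") as the message.
    detail = first_line or proc.stderr.strip()
    return UNKNOWN, f"UNKNOWN - exit status {proc.returncode}: {detail}"
```

A command definition would call something like `run_check(ssh_command(host, "/usr/local/nagios/libexec/check_load -w 5 -c 10"))`, print the message, and exit with the status; the plugin path and thresholds there are placeholders.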

wsandersa at 2007-7-8 23:29:58
# 3
Goes to show you... I had issues with hme interfaces autonegotiating 9 out of 10 times, and have really had no issues with bge. Strange world, this computer stuff... Glad you solved your issue.
Codename47a at 2007-7-8 23:29:58