x2200 Networking Problems

Hello,

I run a computational cluster made up of a mix of different machines, with all the compute nodes netbooted off the head node.

We were very pleased with our V20zs, so when the time came to order more, we ordered x2200s.

However, we're seeing random network dropouts on these machines, during which we can reach neither the server nor the service processor. When we can get to the machines, we see the following errors in dmesg:

tg3: eth1: transmit timed out, resetting

tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2

tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2

tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

tg3: eth1: Link is down.

tg3: eth1: Link is up at 1000 Mbps, full duplex.

tg3: eth1: Flow control is off for TX and off for RX.

The cluster is running Fedora Core 4, if that helps any.
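
To get a rough idea of how often each interface resets, the dmesg output could be scanned for the "transmit timed out" line shown above. The following is only a minimal sketch, assuming Python 3 and that the log text has already been captured as a string; the function name `count_tg3_timeouts` and the regular expression are illustrative and are based solely on the message format quoted in this post.

```python
import re
from collections import Counter

# Matches the tg3 reset message exactly as it appears in the dmesg excerpt above.
# Assumption: other driver versions may word this message differently.
TIMEOUT_RE = re.compile(r"tg3: (eth\d+): transmit timed out, resetting")


def count_tg3_timeouts(log_text):
    """Return a Counter mapping interface name to its number of transmit timeouts."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the saved output of dmesg from each node would show whether the timeouts are confined to eth1 or also hit other interfaces.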

[937 byte] By [peawee] at [2007-11-26 12:02:06]
# 1
Sounds like the tg3 driver in the Fedora distribution doesn't handle the x2200's NIC properly. You'll probably have to load an up-to-date driver. You might check the FedoraForum.org support forums: http://forums.fedoraforum.org/
truly64 at 2007-7-7 12:26:21 > top of Java-index,Sun Hardware,Servers - General Discussion...
# 2

I've run into the same problem with my X2200 server, but I'm running Windows Server 2003 R2.

I have one Ethernet cable plugged into port 1 on the server, giving access to both the RKVM and the server itself. Both share the same network jack.

After some time, about 12-24 hours, the server suddenly stops responding to all network requests. Attempting to ping or access the server does not work, and attempting to ping or access the service processor does not work either. It's as if the power or network cable had been unplugged from the server.

If this is a driver problem with Fedora, could this also be a driver problem for Windows?

--MP

Tigereye at 2007-7-7 12:26:21