Solaris 10 crash after numerous ARP warnings

We have been repatching the switch that our Solaris host is connected to. During that time we kept receiving ARP kernel warnings.

There were literally hundreds of them. These warnings are fairly understandable and documented. Can they cause the box to reboot though? See messages below:

Jul 8 13:04:35 <hostname> ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!

Jul 8 13:04:35 <hostname> ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!

Jul 8 13:04:35 <hostname> ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!

Jul 8 13:04:35 <hostname> ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!

Jul 8 13:04:35 <hostname> ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!

Jul 8 13:05:34 <hostname> unix: [ID 836849 kern.notice]

Jul 8 13:05:34 <hostname> ^Mpanic[cpu1]/thread=2a100405d20:

Jul 8 13:05:34 <hostname> unix: [ID 879351 kern.notice] sync initiated

Jul 8 13:05:34 <hostname> unix: [ID 100000 kern.notice]

Jul 8 13:05:34 <hostname> unix: [ID 839527 kern.notice] sched:

Jul 8 13:05:34 <hostname> unix: [ID 520581 kern.notice] trap type = 0x3

Jul 8 13:05:34 <hostname> unix: [ID 101969 kern.notice] pid=0, pc=0x10001340, sp=0x2a100405091, tstate=0x4400001503, context=0x0

Jul 8 13:05:34 <hostname> unix: [ID 743441 kern.notice] g1-g7: 10473400, 0, 104735b0, 300034e1bf0, 1, 0, 2a100405d20

Jul 8 13:05:34 <hostname> unix: [ID 100000 kern.notice]

Jul 8 13:05:34 <hostname> genunix: [ID 723222 kern.notice] 00000000fedc7cc0 unix:sync_handler+150 (1041bc18, 10000000, 0, 30019936340, 0, 0)

Jul 8 13:05:34 <hostname> genunix: [ID 179002 kern.notice]%l0-3: 0000000000000035 0000000000000034 0000000000000000 0000030001c05810

Jul 8 13:05:34 <hostname>%l4-7: 0000000000000001 0000000000000bf0 0000000000000001 0000000000000000

Jul 8 13:05:34 <hostname> genunix: [ID 723222 kern.notice] 00000000fedc7da0 unix:vx_handler+8c (f0000000, 10418850, 10418748, fedb8bb8, f005b7a9, 0)

Jul 8 13:05:34 <hostname> genunix: [ID 179002 kern.notice]%l0-3: 00000000100293f4 00000300002b4000 0000000000000001 0000000000000000

Jul 8 13:05:34 <hostname>%l4-7: 000000000a304066 0000000000000000 0000030002a327f8 000002a1000077b0

Jul 8 13:05:34 <hostname> genunix: [ID 723222 kern.notice] 00000000fedc7e50 unix:callback_handler+20 (fedb8bb8, fff5e280, 0, 0, 0, 0)

Jul 8 13:05:34 <hostname> genunix: [ID 179002 kern.notice]%l0-3: 0000000000000016 00000000fedc7701 0000030002948000 0000030002948510

Jul 8 13:05:34 <hostname>%l4-7: 0000030002a32018 0000000000000000 0000000000000000 000002a100245510

Jul 8 13:05:34 <hostname> unix: [ID 100000 kern.notice]

Jul 8 13:05:34 <hostname> genunix: [ID 672855 kern.notice] syncing file systems...

Jul 8 13:05:34 <hostname> genunix: [ID 733762 kern.notice] 4

Jul 8 13:05:34 <hostname> genunix: [ID 904073 kern.notice] done

Jul 8 13:05:34 <hostname> genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c1t0d0s1, offset 2147614720

Jul 8 13:05:34 <hostname> genunix: [ID 409368 kern.notice] ^M100% done: 90656 pages dumped, compression ratio 3.05,

Jul 8 13:05:34 <hostname> genunix: [ID 851671 kern.notice] dump succeeded

Jul 8 13:15:28 <hostname> genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 Version Generic_117350-08 64-bit

[3921 byte] By [Iznajar] at [2007-11-26 8:39:48]
# 1

Of course it could.

However, you had a filesystem issue, not necessarily a networking issue.

Quoting from your excerpt:

Jul 8 13:05:34 <hostname> genunix: [ID 723222 kern.notice] 00000000fedc7da0 unix:vx_handler+8c (f0000000, 10418850, 10418748, fedb8bb8, f005b7a9, 0)

Note the "unix:vx_handler" in there? Your messages also had "unix:sync_handler" in there.

Any perception of a corrupt Veritas file system, particularly on an essential filesystem like /usr, can stop normal functionality. The system reset itself to try to sort everything out.

Perhaps you might choose not to do maintenance on the network infrastructure while live systems need it to communicate with their network mounts.

rukbat at 2007-7-6 22:15:35 > top of Java-index,Solaris Operating System,Solaris Essentials - General Technical Questions...
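
For anyone reading a trace like this, it can help to pull out just the stack frames before drawing conclusions from the function names. The following is a minimal Python sketch, assuming the frame lines look exactly like the ones in the excerpt above (a 16-digit hex frame pointer, then module:function+offset, then the arguments in parentheses). The name extract_frames and the sample list are made up for illustration, and the hostname is a placeholder.

```python
import re

# Matches panic stack-frame lines as they appear in the excerpt, e.g.
# "... 00000000fedc7da0 unix:vx_handler+8c (f0000000, ...)".
# Assumption: 16 hex digits, then module:function+hexoffset, then "(".
FRAME_RE = re.compile(
    r"\b[0-9a-f]{16}\s+(?P<module>\w+):(?P<func>\w+)\+(?P<offset>[0-9a-f]+)\s+\("
)

def extract_frames(lines):
    """Return (module, function, offset) for each stack-frame line, in log order."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("module"), m.group("func"), int(m.group("offset"), 16)))
    return frames

# Lines copied from the excerpt, with "host" standing in for the hostname.
sample = [
    "Jul 8 13:05:34 host genunix: [ID 723222 kern.notice] 00000000fedc7cc0 unix:sync_handler+150 (1041bc18, 10000000, 0, 30019936340, 0, 0)",
    "Jul 8 13:05:34 host genunix: [ID 179002 kern.notice]%l0-3: 0000000000000035 0000000000000034 0000000000000000 0000030001c05810",
    "Jul 8 13:05:34 host genunix: [ID 723222 kern.notice] 00000000fedc7da0 unix:vx_handler+8c (f0000000, 10418850, 10418748, fedb8bb8, f005b7a9, 0)",
    "Jul 8 13:05:34 host genunix: [ID 723222 kern.notice] 00000000fedc7e50 unix:callback_handler+20 (fedb8bb8, fff5e280, 0, 0, 0, 0)",
]

for module, func, offset in extract_frames(sample):
    print(f"{module}:{func}+{offset:#x}")
```

On the excerpt above this lists three frames, all with the unix module prefix: sync_handler, vx_handler and callback_handler. The register dump lines (%l0-3, %l4-7) are skipped because they contain no module:function pair.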
# 2
Thanks very much, that explains a lot.
Iznajar at 2007-7-6 22:15:35
# 3

That stack trace is the result of the machine receiving an XIR (externally initiated reset, trap type 3) and the PROM calling back into Solaris's PROM handler for it, forcing a crash dump.

This is used on machines like the 440s to reset them when they are slow to respond. I guess it is possible that the large number of messages made the machine fail to respond to the watchdog heartbeats from the service processor.

The action of the handler on some machines can be controlled by the eeprom setting

error-reset-recovery=sync

and the heartbeat can be seen to be operational by looking in /var/adm/messages* for lines like

rmclomv: [ID 758372 kern.notice] Hardware watchdog enabled

tim

timuglow at 2007-7-6 22:15:35
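
The check described in the last reply (look for the trap type and for the watchdog notice in the messages files) could be sketched roughly as below. This is only an illustration in Python, assuming the log lines have the same wording as those quoted in this thread. The function name scan_messages and the sample lines are hypothetical, and the timestamp on the sample rmclomv line is invented because the thread quotes that line without one.

```python
import re

# Patterns taken from the wording of the log lines quoted in this thread.
TRAP_RE = re.compile(r"trap type = (0x[0-9a-f]+)")
WATCHDOG_RE = re.compile(r"rmclomv: .*Hardware watchdog enabled")
ARP_RE = re.compile(r"trying to be our address")

def scan_messages(lines):
    """Summarise a messages log: ARP conflict count, trap types, watchdog notice."""
    summary = {"arp_warnings": 0, "trap_types": [], "watchdog_enabled": False}
    for line in lines:
        if ARP_RE.search(line):
            summary["arp_warnings"] += 1
        m = TRAP_RE.search(line)
        if m:
            summary["trap_types"].append(int(m.group(1), 16))
        if WATCHDOG_RE.search(line):
            summary["watchdog_enabled"] = True
    return summary

# Illustrative input: "host" and the timestamp on the rmclomv line are placeholders.
sample = [
    "Jul 8 13:04:35 host ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!",
    "Jul 8 13:04:35 host ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '<MAC address>' trying to be our address 010.048.023.022!",
    "Jul 8 13:05:34 host unix: [ID 520581 kern.notice] trap type = 0x3",
    "Jul 8 13:15:40 host rmclomv: [ID 758372 kern.notice] Hardware watchdog enabled",
]

print(scan_messages(sample))
```

To run it against a real host, the lines would come from reading the /var/adm/messages* files instead of the sample list. A trap type of 3 together with a watchdog notice is consistent with the explanation above, but it does not by itself prove that the watchdog triggered the reset.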