Any hints why this automatic reboot was triggered?

Hi,

Today I experienced an automatic reboot on my E220R running Solaris 8. The OS is reasonably up to date; the latest patches were installed 5 weeks ago.

uname -a says

SunOS server 5.8 Generic_117350-23 sun4u sparc SUNW,Ultra-60

prtdiag -v says

System Configuration: Sun Microsystems sun4u (2 X UltraSPARC-II 450MHz)

System clock frequency: 112 MHz

Memory size: 2048 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      450     4.0   US-II    10.0
 0     2     2      450     4.0   US-II    10.0

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot         Name                             Model
---  ----  ----  -----------  -------------------------------  -------------------
 0   PCI    33   On-Board     network-SUNW,hme
 0   PCI    33   On-Board     scsi-glm/disk (block)            Symbios,53C875
 0   PCI    33   On-Board     scsi-glm/disk (block)            Symbios,53C875
 0   PCI    33   pcib slot 2  IntraServer,Ultra2-scsi-ithp+    IntraServer,ITI62xx
 0   PCI    33   pcib slot 2  IntraServer,Ultra2-scsi-ithp+    IntraServer,ITI62xx
 0   PCI    33   -            TSI,gfxp                         GFXP
 0   PCI    66   pcia slot 1  network-pci108e,2bad             SUNW,pci-gem

No failures found in System

===========================

========================= HW Revisions =========================

ASIC Revisions:

PCI: pci Rev 4

PCI: pci Rev 4

Cheerio: ebus Rev 1

System PROM revisions:

-

OBP 3.23.1 1999/07/16 12:08   POST 2.0.2 1998/10/19 10:46

After the reboot I found the following lines in /var/adm/messages:

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 986306 kern.warning] WARNING: [AFT1] EDP event on CPU2 Data access at TL=0, errID 0x00048c85.64eb07f9

May 24 11:55:24 server AFSR 0x00000000.00400100<EDP> AFAR 0x00000000.191791e0

May 24 11:55:24 server AFSR.PSYND 0x0100(Score 95) AFSR.ETS 0x00 Fault_PC 0xfdea05d8

May 24 11:55:24 server UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 766630 kern.info] [AFT2] errID 0x00048c85.64eb07f9 PA=0x00000000.191791e0

May 24 11:55:24 server E$tag 0x00000000.0fc00322 E$State: Modified E$parity 0x07

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0xf7f7f9f7.f6530400

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000189.8fa798bf

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0xf7f7f7f7.f7f7f7f7

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x20): 0xf7f6f8f7.f8f8f710 *Bad* PSYND=0x0100

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0xd2f1eeec.f1eee8ee

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0xeeeeeeee.eeeeeeec

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 364520 kern.info] [AFT2] errID 0x00048c85.64eb07f9 AFAR was derived from E$Tag

May 24 11:55:24 server unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.19178000

May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 767476 kern.info] [AFT3] errID 0x00048c85.64eb07f9 Above Error is in User Mode

May 24 11:55:24 server and is fatal: will reboot

May 24 11:55:24 server unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 7785 (tcpif)

May 24 11:55:29 server unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x00000000.19178000 cleared

May 24 11:56:51 server srvsrv[408]: [ID 702911 daemon.error] srvutil stop all -g 0

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: java.io.IOException: Stream closed

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at sun.nio.cs.StreamEncoder.ensureOpen(StreamEncoder.java:38)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:129)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:146)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at java.io.OutputStreamWriter.write(OutputStreamWriter.java:204)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at java.io.Writer.write(Writer.java:126)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.HeliosFileLogger.writeLine(HeliosFileLogger.java:147)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.Logger.writeLine(Logger.java:125)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.HeliosSynchronizer.writeLogLine(HeliosSynchronizer.java:1081)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.HeliosWatcher.stopWatcher(HeliosWatcher.java:126)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.SynchronizerOptions.stopWatchers(SynchronizerOptions.java:469)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.HeliosSynchronizer.shutdown(HeliosSynchronizer.java:409)

May 24 11:56:52 server srvsrv[408]: [ID 702911 daemon.error] companion2: at com.canto.cumulus.helios.HeliosSynchronizer$1.run(HeliosSynchronizer.java:255)

May 24 11:56:59 server srvsrv[408]: [ID 702911 daemon.error] exiting

May 24 11:57:00 server syslogd: going down on signal 15

May 24 11:57:03 server samfs: [ID 110226 kern.notice] NOTICE: SAM-FS: Initiated unmount filesystem: samfs1, vers 2

May 24 11:57:03 server samfs: [ID 110226 kern.notice] NOTICE: SAM-FS: Initiated unmount filesystem: samfs3, vers 2

May 24 11:57:04 server samfs: [ID 110226 kern.notice] NOTICE: SAM-FS: Initiated unmount filesystem: samfs4, vers 2

May 24 11:57:05 server samfs: [ID 363137 kern.notice] NOTICE: SAM-FS: Completed unmount filesystem: samfs1, vers 2

May 24 11:57:08 server samfs: [ID 110226 kern.notice] NOTICE: SAM-FS: Initiated unmount filesystem: samfs2, vers 2

May 24 11:57:13 server samfs: [ID 363137 kern.notice] NOTICE: SAM-FS: Completed unmount filesystem: samfs3, vers 2

May 24 11:57:13 server samfs: [ID 363137 kern.notice] NOTICE: SAM-FS: Completed unmount filesystem: samfs4, vers 2

May 24 11:57:13 server samfs: [ID 363137 kern.notice] NOTICE: SAM-FS: Completed unmount filesystem: samfs2, vers 2

May 24 11:57:34 server genunix: [ID 672855 kern.notice] syncing file systems...

May 24 11:57:34 server genunix: [ID 904073 kern.notice] done

May 24 11:58:50 server genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 Version Generic_117350-23 64-bit

From what I've read in quite a few threads on this forum, I'm fairly sure this is not a memory issue, but it could be an impending CPU fault.
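
As a sanity check on that reading, the AFSR value from the log can be decoded by hand. The sketch below is only illustrative: the bit positions in AFSR_FLAGS are an assumption based on the UltraSPARC-I/II AFSR layout and should be checked against the processor manual, and decode_afsr is a hypothetical helper name, not part of any Solaris tool.

```python
# Assumed AFSR flag bit positions for UltraSPARC-I/II (verify against the CPU manual).
AFSR_FLAGS = {
    32: "ME", 31: "PRIV", 30: "ISAP", 29: "ETP", 28: "IVUE", 27: "TO",
    26: "BERR", 25: "LDP", 24: "CP", 23: "WP", 22: "EDP", 21: "UE", 20: "CE",
}


def parse_reg(text):
    """Convert a logged register such as '0x00000000.00400100' to an int."""
    return int(text.replace(".", ""), 16)


def decode_afsr(text):
    """Split an AFSR value into its set flags, the ETS field and the P_SYND field."""
    value = parse_reg(text)
    flags = [name for bit, name in sorted(AFSR_FLAGS.items(), reverse=True)
             if (value >> bit) & 1]
    return {
        "flags": flags,
        "ets": (value >> 16) & 0xF,   # assumed bits 19:16
        "psynd": value & 0xFFFF,      # assumed bits 15:0
    }


# AFSR value taken from the /var/adm/messages excerpt above.
print(decode_afsr("0x00000000.00400100"))
```

Under those assumptions the only flag set is EDP (E-cache data parity error), with ETS 0x00 and P_SYND 0x0100, which agrees with what the kernel printed. Neither UE nor CE, the flags that would point at a memory ECC error, is set.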

[7330 byte] By [GVE] at [2007-11-25 23:36:25]
# 1
Looks like a CPU fault, but check the temperature. I would suggest sending the messages output to Sun and having them verify it.
balaji_iii at 2007-7-5 18:20:43
# 2
Overtemperature is not an issue. My E220R is running fine in an air-conditioned room at 20 degrees Celsius.
GVE at 2007-7-5 18:20:43
# 3

It is an eCache error.

UltraSPARC-II CPUs get this error from time to time. There is no known cure for it, other than moving to an UltraSPARC-III chip.

FYI, Sun's policy is to not replace the CPU when this happens unless it happens more than once. Still, that is no guarantee it will not happen again; it is an inherent flaw in the UltraSPARC-II.

Codename47 at 2007-7-5 18:20:43
# 4
Just to clarify, I should have said Sun's best-practices recommendation, not Sun's policy.
Codename47 at 2007-7-5 18:20:43
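
Since the recommendation in reply #3 depends on whether the error recurs, it may help to count how often each CPU has logged an [AFT1] event. The following is a minimal sketch, assuming the message format shown in the original post; count_events and the regular expression are illustrative names, not part of any Solaris tool.

```python
import re
from collections import Counter

# Matches e.g. "[AFT1] EDP event on CPU2"; the format is assumed from the log above.
AFT1_EVENT = re.compile(r"\[AFT1\] (\w+) event on CPU(\d+)")


def count_events(lines):
    """Count [AFT1] error events per (cpu, event type) pair."""
    counts = Counter()
    for line in lines:
        match = AFT1_EVENT.search(line)
        if match:
            counts[(int(match.group(2)), match.group(1))] += 1
    return counts


sample = [
    "May 24 11:55:24 server SUNW,UltraSPARC-II: [ID 986306 kern.warning] "
    "WARNING: [AFT1] EDP event on CPU2 Data access at TL=0, "
    "errID 0x00048c85.64eb07f9",
    "May 24 11:55:24 server unix: [ID 855177 kern.warning] WARNING: [AFT1] "
    "initiating reboot due to above error in pid 7785 (tcpif)",
]

for (cpu, event), n in count_events(sample).items():
    print(f"CPU{cpu}: {n} x {event}")
```

To run it against a real log, pass the open file instead of the sample list, for example `count_events(open("/var/adm/messages"))`, and include any rotated copies of the file so that earlier events are not missed.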