Sun Enterprise E4500/E5500 Crash
Can someone shed some light on what caused the system to panic?
This is a database server running Oracle 8i.
I have gone over the Oracle alert and trace logs and found that Oracle registered an error at 4:49:40 pm while trying to write to disk. No other Oracle errors were recorded leading up to that time, and last night's monitoring log reported none.
At 4:49 pm, Oracle's Database Writer process (DBWR) terminated abnormally because of an event (or events) that we hope to learn more about soon. When Oracle's background Process Monitor (PMON) registered the failure in DBWR, it terminated the Oracle instance, killing all of the Oracle background processes in memory. This is a necessary step to preserve database integrity and allow the database to recover at the next startup. Oracle managed this just before the server went down, but only barely: the trace file and alert log both end abruptly at this point. Oracle did not produce a core file, which it might have done had something gone seriously wrong within Oracle itself.
I also noticed that the system did not create a core file when it came back up, so I don't have any core information to send to Sun.
Thanks
Steve
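Even without a crash dump, the [AFT] kernel messages in the log below record which CPU took the error, the failing physical address (AFAR), the memory board the kernel names, and the process that was running. As a minimal sketch, assuming the messages come from the usual Solaris syslog file /var/adm/messages and are not line-wrapped, the hypothetical `summarize_aft` helper pulls those fields out with regular expressions. The patterns are written against the lines in this post only and may not match every Solaris release's wording.

```python
import re

# Hypothetical patterns, written against the [AFT] messages in this post.
PATTERNS = {
    "cpu": re.compile(r"Uncorrectable Memory Error on CPU(\d+)"),
    "afar": re.compile(r"AFAR (0x[0-9a-f]{8}\.[0-9a-f]{8})"),
    "board": re.compile(r"Memory Module Board (\d+)"),
    "pid": re.compile(r"error in pid (\d+) \((\w+)\)"),
}


def summarize_aft(lines):
    """Return the first match of each pattern found in the given syslog lines."""
    found = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            if key in found:
                continue  # keep only the first occurrence of each field
            m = pattern.search(line)
            if m:
                # The pid pattern captures (pid, process name); the others one value.
                found[key] = m.groups() if key == "pid" else m.group(1)
    return found
```

Usage, on a Solaris host: `with open("/var/adm/messages") as f: print(summarize_aft(f))`. A summary like this, together with the full message block, is the kind of evidence that can still go to Sun when no core was saved.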
Log files:
System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 8192Mb
========================= CPUs =========================
                  Run  Ecache  CPU    CPU
Brd  CPU  Module  MHz  MB      Impl.  Mask
---  ---  ------  ---  ------  -----  ----
0    0    0       400  8.0     US-II  10.0
0    1    1       400  8.0     US-II  10.0
2    4    0       400  8.0     US-II  10.0
2    5    1       400  8.0     US-II  10.0
4    8    0       400  8.0     US-II  10.0
4    9    1       400  8.0     US-II  10.0
6    12   0       400  8.0     US-II  10.0
6    13   1       400  8.0     US-II  10.0
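In the prtdiag listing above, each CPU board carries two CPU modules, and every row satisfies CPU ID = 2 * board + module. On that reading, CPU13, which reports the uncorrectable error in the messages below, is module 1 on board 6. A minimal sketch that checks the rule against all eight rows; `board_for_cpu` is a hypothetical helper, and the rule is inferred from this table rather than taken from Sun documentation.

```python
# Rows copied from the prtdiag CPU table: (board, cpu_id, module).
CPU_TABLE = [
    (0, 0, 0), (0, 1, 1),
    (2, 4, 0), (2, 5, 1),
    (4, 8, 0), (4, 9, 1),
    (6, 12, 0), (6, 13, 1),
]


def board_for_cpu(cpu_id):
    """Infer (board, module) for a CPU ID, assuming cpu_id == 2 * board + module."""
    return divmod(cpu_id, 2)


# Check the assumed numbering rule against every row of the table.
for board, cpu, module in CPU_TABLE:
    assert board_for_cpu(cpu) == (board, module), (board, cpu, module)
```

The check only confirms the pattern for this particular configuration; on a system with different boards installed, prtdiag itself remains the authority.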
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 827727 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU13 Data access at TL=0, errID 0x00074c3c.601593d1
Jan 7 16:49:38 dbprod02 AFSR 0x00000000.00200000<UE> AFAR 0x00000001.c90127f8
Jan 7 16:49:38 dbprod02 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x100832760
Jan 7 16:49:38 dbprod02 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203<UE> UDBL.ESYND 0x03
Jan 7 16:49:38 dbprod02 UDBL Syndrome 0x3 Memory Module Board 6 J3101 J3201 J3301 J3401 J3501 J3601 J3701 J3801
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 851579 kern.warning] WARNING: [AFT1] errID 0x00074c3c.601593d1 Syndrome 0x3 indicates that this may not be a memory module problem
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 304836 kern.info] [AFT2] errID 0x00074c3c.601593d1 PA=0x00000001.c90127f8
Jan 7 16:49:38 dbprod02 E$tag 0x00000000.0c403920 E$State: Shared E$parity 0x06
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000020
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000 *Bad* PSYND=0x00ff
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 953641 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU0 (caused Data access error on CPU13), errID 0x00074c3c.601593d1
Jan 7 16:49:38 dbprod02 AFSR 0x00000000.01000010<CP> AFAR 0x00000001.c90127f8
Jan 7 16:49:38 dbprod02 AFSR.PSYND 0x0010(Score 95) AFSR.ETS 0x00
Jan 7 16:49:38 dbprod02 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 304836 kern.info] [AFT2] errID 0x00074c3c.601593d1 PA=0x00000001.c90127f8
Jan 7 16:49:38 dbprod02 E$tag 0x00000000.0c403920 E$State: Shared E$parity 0x06
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000020
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000 *Bad* PSYND=0x0010
Jan 7 16:49:38 dbprod02 unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000001.c9012000
Jan 7 16:49:38 dbprod02 SUNW,UltraSPARC-II: [ID 929370 kern.info] [AFT3] errID 0x00074c3c.601593d1 Above Error is in User Mode
Jan 7 16:49:38 dbprod02 and is fatal: will reboot
Jan 7 16:49:38 dbprod02 unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 913 (oracle)
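The two AFSR values in the messages are tagged <UE> (uncorrectable error) and <CP> (which, as I understand the UltraSPARC-II documentation, flags a parity error on an E-cache copyout). A minimal sketch that decodes them, assuming the UltraSPARC-II AFSR layout with P_SYND in bits 15:0, ETS in bits 19:16, and the error flags from CE (bit 20) through ME (bit 32); these bit positions are an assumption to check against the processor manual. Under that layout the raw values decode to the same flags and the same P_SYND values (0x0000 and 0x0010) the kernel printed, and rounding the AFAR down to an 8 KB page boundary gives 0x1c9012000, the page named in the "Scheduling clearing of error" notice. The helper names are hypothetical.

```python
# Hypothetical helpers for decoding UltraSPARC-II AFSR/AFAR values.
# Assumed AFSR layout: bits 15:0 P_SYND, 19:16 ETS, 20 CE, 21 UE, 22 EDP,
# 23 WP, 24 CP, 25 LDP, 26 BERR, 27 TO, 28 IVUE, 29 ETP, 30 ISAP,
# 31 PRIV, 32 ME. Verify against the processor manual.
AFSR_FLAGS = {
    20: "CE", 21: "UE", 22: "EDP", 23: "WP", 24: "CP", 25: "LDP",
    26: "BERR", 27: "TO", 28: "IVUE", 29: "ETP", 30: "ISAP",
    31: "PRIV", 32: "ME",
}

PAGE_SIZE = 8 * 1024  # 8 KB base page, as used by Solaris on sun4u


def parse_hex_pair(text):
    """Convert Solaris 'high.low' notation, e.g. 0x00000001.c90127f8, to an int."""
    high, low = text.split(".")
    return (int(high, 16) << 32) | int(low, 16)


def decode_afsr(afsr):
    """Split an AFSR value into its set error flags, ETS and P_SYND fields."""
    flags = [name for bit, name in sorted(AFSR_FLAGS.items())
             if (afsr >> bit) & 1]
    return {"flags": flags, "ets": (afsr >> 16) & 0xF, "psynd": afsr & 0xFFFF}


def page_base(pa):
    """Round a physical address down to the start of its 8 KB page."""
    return pa & ~(PAGE_SIZE - 1)


if __name__ == "__main__":
    for raw in ("0x00000000.00200000", "0x00000000.01000010"):
        print(raw, decode_afsr(parse_hex_pair(raw)))
    print(hex(page_base(parse_hex_pair("0x00000001.c90127f8"))))
```

Decoding both reports side by side makes it easy to see that the UE was signalled on CPU13 while the CP event was raised on CPU0, which is the relationship the kernel's "AFAR was derived from UE report" line describes.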

