panic[cpu11]/thread=2a100449cc0: pcisch-0: Fatal PCI bus error(s)

I got this panic while doing some testing on Sun Cluster. Could anybody give me some suggestions?

Any comments are welcome.

Environment information:

SunOS arcsun48kh232ed1 5.10 Generic_118833-24 sun4u sparc SUNW,Sun-Fire

Sun Microsystems sun4u Sun Fire 4800

Solaris 10 (SPARC), patched with the latest 10_recommended_patch_cluster

scate version: 1.4.1

Sun Cluster 3.1 u4 (patch 120500-12)

Solaris Volume Manager (as shipped with Solaris 10)

Crash information:

bash-3.00# Dec 5 20:18:03 arcsun48kh232ed1 ip: WARNING: The <if>:ip*_forwarding ndd variables are obsolete and may be removed in a future release of Solaris. Use ifconfig(1M) to manipulate the forwarding status of an interface.

Dec 5 20:18:03 arcsun48kh232ed1 last message repeated 3 times

Dec 5 20:19:32 arcsun48kh232ed1 DEV_STRESS: dev_stress_start: INFO: dev_stress successfully started

Dec 5 20:19:33 arcsun48kh232ed1 DEV_STRESS: : INFO: dev_stress_monitor successfully started

Notifying cluster that this node is panicking

SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major

EVENT-TIME: 0x457644ee.0x63a71d (0x1a01a2d86e0)

PLATFORM: SUNW,Sun-Fire, CSN: -, HOSTNAME: arcsun48kh232ed1

SOURCE: SunOS, REV: 5.10 Generic_118833-24

DESC: Errors have been detected that require a reboot to ensure system

integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.

AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry

IMPACT: The system will sync files, save a crash dump if needed, and reboot

REC-ACTION: Save the error summary below in case telemetry cannot be saved

ereport.io.pci.sserr ena=1a01a29b42002c01 detector=[ version=0 scheme="dev"

device-path="/ssm@0,0/pci@18,700000" ] pci-status=4280 pci-command=146 pci-pa=

0

ereport.io.pci.rserr ena=1a01a29b42002c01 detector=[ version=0 scheme="dev"

device-path="/ssm@0,0/pci@18,700000" ] pci-status=4280 pci-command=146 pci-pa=

0

ereport.io.pci.sserr ena=1a01a29b42002c01 detector=[ version=0 scheme="dev"

device-path="/ssm@0,0/pci@18,700000/pci@2" ] pci-status=4290 pci-command=147

ereport.io.pci.sec-sta ena=1a01a29b42002c01 detector=[ version=0 scheme="dev"

device-path="/ssm@0,0/pci@18,700000/pci@2" ] pci-sec-status=a80 pci-bdg-ctrl=

23

panic[cpu11]/thread=2a100449cc0: pcisch-0: Fatal PCI bus error(s)

000002a100471e70 pcisch:pbm_error_intr+158 (300001d9bc0, 134cc00, 30000bbfdc8, 30000bbfdc8, 0, 300001d9580)

%l0-3: 00000300001c95b8 0000000000000000 000000000196a800 000000000196a800

%l4-7: 0000000000000001 000000000196a800 0000030000205830 0000000000000000

000002a100471f50 unix:current_thread+170 (0, 300031e8a98, 0, ffffffffffffffff, 300001c95b8, 8)

%l0-3: 00000000010076e4 000002a100449021 000000000000000e 0000000000000633

%l4-7: ffffffffffffffff 00000300061793d8 000000000000000b 000002a1004498d0

000002a100449970 unix:disp_getwork+38 (30003b74000, 18ab7c8, 180c000, 180c000, 0, 0)

%l0-3: 00000300001c95b8 ffffffffffffffff 00000300001c9580 00000300001c9680

%l4-7: 00000300001c9680 0000000000000000 00000300001c9680 0000000000000000

000002a100449a20 unix:idle+d4 (182bc00, 0, 30003b74000, ffffffffffffffff, 4, 182ac00)

%l0-3: 00000300031e8a98 000000000000001b 0000000000000000 ffffffffffffffff

%l4-7: 00000300031e8a98 ffffffffffffffff 00000000018ab7c8 00000000010554bc

panic: entering debugger (continue to save dump)

Welcome to kmdb

kmdb: unable to determine terminal type: assuming `vt100'

Loaded modules: [ crypto cpc sd ptm ufs unix krtld s1394 sppp wrsmd ipc nca

genunix ip sgsbbc logindmux wrsm isp usba specfs ssd nfs md random sctp ]

[11]>

[3926 byte] By [sky.ma] at [2007-11-26 11:57:44]
# 1
Looks like a hardware error to me. This isn't a panic directly caused by Sun Cluster.

Tim
TimRead at 2007-7-7 12:18:13
# 2

Agreed, seems like a PCI bus error.

Data flowing through SB4 didn't like something, so cpu11 barked out a complaint.

Perhaps it was something passing to or from your storage, or to and from the other partner pair.
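
For what it's worth, the register values in the ereport summary can be decoded by hand. Below is a rough Python sketch, assuming the values are hexadecimal (pci-sec-status=a80 suggests so) and that the registers follow the standard PCI configuration-space layout. The table and function names are my own and do not come from any Solaris tool.

```python
# Bit positions per the standard PCI configuration-space registers.
# All names below are illustrative, not taken from a Solaris utility.
PCI_COMMAND_BITS = {
    0: "I/O space enable",
    1: "memory space enable",
    2: "bus master enable",
    6: "parity error response",
    8: "SERR# enable",
}
PCI_STATUS_BITS = {
    4: "capabilities list",
    7: "fast back-to-back capable",
    8: "master data parity error",
    11: "signaled target abort",
    12: "received target abort",
    13: "received master abort",
    14: "signaled system error",
    15: "detected parity error",
}
# A bridge's secondary status has the same layout, except that
# bit 14 means "received system error" on the secondary bus.
PCI_SEC_STATUS_BITS = dict(PCI_STATUS_BITS)
PCI_SEC_STATUS_BITS[14] = "received system error"
PCI_BRIDGE_CTRL_BITS = {
    0: "parity error response enable",
    1: "SERR# enable",
    5: "master abort mode",
    6: "secondary bus reset",
}


def decode(value, bit_names):
    """Return the names of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(bit_names.items())
            if value & (1 << bit)]


def devsel_timing(status):
    """Decode the 2-bit DEVSEL timing field (bits 9-10) of a status word."""
    return ("fast", "medium", "slow", "reserved")[(status >> 9) & 0x3]


if __name__ == "__main__":
    # Values copied from the ereport summary in the original post.
    print("pci@18,700000 status :", decode(0x4280, PCI_STATUS_BITS))
    print("pci@18,700000 command:", decode(0x146, PCI_COMMAND_BITS))
    print("pci@2 status         :", decode(0x4290, PCI_STATUS_BITS))
    print("pci@2 sec-status     :", decode(0xA80, PCI_SEC_STATUS_BITS))
    print("pci@2 bridge control :", decode(0x23, PCI_BRIDGE_CTRL_BITS))
```

On the values posted, this reports "signaled system error" in both pci-status words and "signaled target abort" in the secondary status of pci@2, which would point at something on or behind that bridge rather than at the cluster software. Treat that as a hint, not a diagnosis.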

Install Explorer, as per Public Infodoc 82329, run it to gather a system configuration snapshot, and then open a support case with Sun.
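
Since the panic message itself says to save the error summary in case the telemetry cannot be saved, it may help to pull the ereport records out of a saved console capture before opening the case. A minimal Python sketch, assuming the console text was saved as plain text and that records look like the ones in the original post; `parse_ereports` and `SAMPLE` are hypothetical names, and the parsing is a simplification that only handles this wrapped key=value layout.

```python
import re

# One record: the ereport class, then everything up to the next record,
# the panic line, or the end of the text.
RECORD = re.compile(r"(ereport\.[\w.-]+)\s+(.*?)(?=ereport\.|panic\[|\Z)", re.S)
# key=value pairs; the value may be quoted, and may sit on the next
# line when the console wrapped right after the "=".
FIELD = re.compile(r'([\w-]+)=\s*("[^"]*"|[^\s\[\]]+)')


def parse_ereports(console_text):
    """Return one dict per ereport record found in console_text."""
    reports = []
    for cls, body in RECORD.findall(console_text):
        fields = {key: value.strip('"') for key, value in FIELD.findall(body)}
        fields["class"] = cls
        reports.append(fields)
    return reports


# Two records copied from the original post, wrapped as on the console.
SAMPLE = '''
ereport.io.pci.sserr ena=1a01a29b42002c01 detector=[ version=0 scheme="dev"
device-path="/ssm@0,0/pci@18,700000" ] pci-status=4280 pci-command=146 pci-pa=
0
ereport.io.pci.sec-sta ena=1a01a29b42002c01 detector=[ version=0 scheme="dev"
device-path="/ssm@0,0/pci@18,700000/pci@2" ] pci-sec-status=a80 pci-bdg-ctrl=
23
panic[cpu11]/thread=2a100449cc0: pcisch-0: Fatal PCI bus error(s)
'''

if __name__ == "__main__":
    for report in parse_ereports(SAMPLE):
        print(report["class"], report["device-path"])
```

Grouping the records by device-path makes it easier to see which bridge reported what when you paste the summary into the case notes.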

In the best case, you'll find it was simply the result of a typo in the modifications you were making to your configuration.

Best to let Sun Techsupport figure this out for you.

rukbat at 2007-7-7 12:18:13