280R keeps crashing

Hello

Some help required, please. My 280R keeps crashing; it stays up for about 3 minutes. I used psradm to take processor 1 offline first and it still crashed, then disabled/offlined processor 0 and it crashed again. It has been working OK for 2 years. diag-switch? is set to true and the diags pass. Details below.

There are panics on both cpu1 and cpu0, so it is not cpu1 every time.
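If you keep the panic header from each crash (for example from the console log), a small script can pull out the CPU, inode and filesystem so the crashes can be compared side by side. This is a minimal Python sketch, assuming every header has the same shape as the one quoted below once its wrapped lines are rejoined; `parse_panic_header` and `panics_per_cpu` are hypothetical helpers, not Solaris tools.

```python
import re
from collections import Counter

# Assumed header shape, taken from the one panic quoted in this post (wrapped lines rejoined):
#   panic[cpu1]/thread=2a100411d20: ufs_ifree: freeing free inode, mode:0, ino:2799184, fs:/opt
PANIC_RE = re.compile(r"panic\[cpu(?P<cpu>\d+)\]/thread=(?P<thread>[0-9a-f]+):\s*(?P<msg>.*)")
INO_RE = re.compile(r"\bino:(?P<ino>\d+)")
FS_RE = re.compile(r"\bfs:(?P<fs>\S+)")


def parse_panic_header(line):
    """Split one panic header into CPU, thread, panicking function, inode and filesystem."""
    m = PANIC_RE.search(line)
    if m is None:
        return None  # not a panic header
    msg = m.group("msg").strip()
    ino = INO_RE.search(msg)
    fs = FS_RE.search(msg)
    return {
        "cpu": int(m.group("cpu")),
        "thread": m.group("thread"),
        "function": msg.split(":", 1)[0],  # text before the first colon, e.g. "ufs_ifree"
        "ino": int(ino.group("ino")) if ino else None,
        "fs": fs.group("fs") if fs else None,
    }


def panics_per_cpu(lines):
    """Count panic headers per CPU number, ignoring lines that are not panic headers."""
    parsed = (parse_panic_header(line) for line in lines)
    return Counter(p["cpu"] for p in parsed if p is not None)


if __name__ == "__main__":
    header = ("panic[cpu1]/thread=2a100411d20: ufs_ifree: freeing free inode, "
              "mode:0, ino:2799184, fs:/opt")
    info = parse_panic_header(header)
    print(info["cpu"], info["function"], info["ino"], info["fs"])  # 1 ufs_ifree 2799184 /opt
```

Feeding it the header from every crash, including the ones on cpu0, would show whether the same ufs_ifree message on /opt keeps coming back regardless of which CPU took the panic; that pattern would point at the filesystem rather than at one processor.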

panic[cpu1]/thread=2a100411d20: ufs_ifree: freeing free inode, mode:0, ino:2799184, fs:/opt

000002a1004116b0 ufs:real_panic_v+70 (0, 104660a8, 2a100411950, 0, 1045d968, 3000395c660)
  %l0-3: 000000001010c420 0000000000000000 0000000000000000 000002a1004117f8
  %l4-7: 000000057a3da000 000000057a3da000 0000000000000000 0000000000000000
000002a100411760 ufs:ufs_fault_v+48 (30004d70f50, 104660a8, 2a100411950, 30004d70f50, 5b, 104660a8)
  %l0-3: 00000300016b43b8 000003000395c5a0 0000005500000022 0000000000001000
  %l4-7: 0000000000002000 0000030000332f78 0000030000332f78 0000000002bd1ec0
000002a100411810 ufs:ufs_fault+1c (30004d70f50, 104660a8, 0, 2ab650, 300021620d4, 300016b43b8)
  %l0-3: 0000000000002f78 000000001045ac00 0000030001f58a58 000000057a3dc880
  %l4-7: 0000000000000080 0000000000000080 000000057a3dc880 0000000081010100
000002a1004118c0 ufs:ufs_ifree+1bc (300016b4468, 90255, 3000395c5a0, 1f1, 2ab650, 30004d70ec0)
  %l0-3: 0000000000003e2f 00000300016b43b8 0000030002162000 00000300027e4000
  %l4-7: 0000000000000000 0000030000a29f28 0000000000000000 0000030004d70ec0
000002a100411970 ufs:ufs_delete+1e4 (30004d71000, 300016b4400, 3e2f, 0, 30004d70f50, 30004d70ec0)
  %l0-3: 000000001034459c 00000300016b43b8 0000000000000001 0000000000002270
  %l4-7: 00000300020d53b0 0000000000000000 0000000000000000 000002a10001f910
000002a100411a40 ufs:ufs_thread_delete+c4 (3000026b368, 0, 10423a50, 300016b43b8, 300016b4428, 0)
  %l0-3: 00000300016b4408 0000030004d70ec0 0000000000000001 000002a10001fd20
  %l4-7: 0000000000000000 000003000026bea8 0000000000000000 000002a10001f9c0

syncing file systems...WARNING: md: d24: write error on /dev/dsk/c1t1d0s4

_

System Configuration: Sun Microsystems sun4u Sun Fire 280R (2 X UltraSPARC-III)
System clock frequency: 150 MHz
Memory size: 2048 Megabytes

========================= CPUs ===============================================

          Run   E$   CPU     CPU
Brd  CPU  MHz   MB   Impl.   Mask
---  ---  ----  ---  ------  ----
 A    0   750   8.0  US-III  5.4
 B    1   750   8.0  US-III  5.4

========================= Memory Configuration ===============================

                Logical  Logical  Logical
      MC        Bank     Bank     Bank       DIMM   Interleave  Interleaved
Brd   ID        num      size     Status     Size   Factor      with
----  --------  -------  -------  ---------  -----  ----------  -----------
CA    0         0        1024MB   no_status  512MB  2-way       0
CA    0         2        1024MB   no_status  512MB  2-way       0

========================= IO Cards =========================

                              Bus  Max
     IO    Port  Bus          Freq Bus  Dev,
Brd  Type  ID    Side  Slot   MHz  Freq Func  State  Name                             Model
---  ----  ----  ----  ----   ---- ---- ----  -----  -------------------------------  -----------------------
I/O  PCI   8     A     1      33   66   1,0   ok     pci-pci8086,b154.0/pci108e,1000  PCI-BRIDGE
I/O  PCI   8     A     1      33   66   0,0   ok     pci108e,1000-pci108e,1000.1      device on pci-bridge
I/O  PCI   8     A     1      33   66   0,1   ok     SUNW,qfe-pci108e,1001            SUNW,pci-qfe/pci-bridg+
I/O  PCI   8     A     1      33   66   1,0   ok     pci108e,1000-pci108e,1000.1      device on pci-bridge
I/O  PCI   8     A     1      33   66   1,1   ok     SUNW,qfe-pci108e,1001            SUNW,pci-qfe/pci-bridg+
I/O  PCI   8     A     1      33   66   2,0   ok     pci108e,1000-pci108e,1000.1      device on pci-bridge
I/O  PCI   8     A     1      33   66   2,1   ok     SUNW,qfe-pci108e,1001            SUNW,pci-qfe/pci-bridg+
I/O  PCI   8     A     1      33   66   3,0   ok     pci108e,1000-pci108e,1000.1      device on pci-bridge
I/O  PCI   8     A     1      33   66   3,1   ok     SUNW,qfe-pci108e,1001            SUNW,pci-qfe/pci-bridg+

========================= Environmental Status =========================

System Temperatures (Celsius):
cpu0  cpu1
 55    54

=================================

Front Status Panel:
-------------------
Keyswitch position: LOCKED

System LED Status:   POWER   GEN FAULT
                     [ ON]   [OFF]

=================================

Disk Status:
         Presence    Fault Value
         ---------   -----------
DISK0:   [PRESENT]   [NO_FAULT]
DISK1:   [PRESENT]   [NO_FAULT]

=================================

Fan Bank :
----------
Bank    Status
----    ----------
FAN     [NO_FAULT]

=================================

Power Supplies:
Supply  Status      PS Type
------  ----------  ---------------
PS0     [NO_FAULT]  [Sun-Fire-280R]
PS1     [NO_FAULT]  [Sun-Fire-280R]

=================================

========================= HW Revisions =======================================

System PROM revisions:
----------------------
OBP 4.5.10 2002/02/11 10:39

IO ASIC revisions:
        Port
Model   ID    Status  Version
------  ----  ------  -------
Schizo  8     ok      4

CRASHED AGAIN HERE WHILE RUNNING PRTDIAG

> %l4-7: 000000057a3da000 000000057a3da000 0000000000000000 0000000000000000
> 000002a100411760 ufs:ufs_fault_v+48 (30004d70f50, 104660a8, 2a100411950, 30004d70f50, 5b, 104660a8)
> %l0-3: 00000300016b43b8 000003000395c5a0 0000005500000022 0000000000001000
> %l4-7: 0000000000002000 0000030000332f78 0000030000332f78

_

@(#)OBP 4.5.10 2002/02/11 10:39 Sun Fire 280R
BBC AID Register 0000.0000.0000.0000
Reset: 0000.0000.0000.8010 SPOR PLL
Loading Configuration
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.8000.0000
Clearing TLBs Done
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Block Scrubbing Done
Copy Done
PC = 0000.07ff.f008.5970
PC = 0000.0000.0000.59e8
Decompressing Done
Size = 0000.0000.0006.f8b0
ttya initialized
Start Reason: Soft Reset
System Reset: (SPOR) (PLL)
Probing gptwo at 0,0 SUNW,UltraSPARC-III (750 MHz @ 5:1, 8 MB) memory-controller
Probing gptwo at 1,0 SUNW,UltraSPARC-III (750 MHz @ 5:1, 8 MB) memory-controller
Probing gptwo at 8,0 pci pci
Loading Support Packages: kbd-translator
Loading onboard drivers: ebus flashprom bbc power i2c dimm-fru dimm-fru
dimm-fru dimm-fru nvram idprom i2c cpu-fru temperature cpu-fru
temperature fan-control motherboard-fru ioexp ioexp ioexp
fcal-backplane remote-system-console power-distribution-board
power-supply power-supply rscrtc beep rtc gpio pmc parallel
rsc-control rsc-console serial
CPU 0 set ambient power off temperature to 70 degrees C
CPU 0 set junction power off temperature to 110 degrees C
CPU 1 set ambient power off temperature to 70 degrees C
CPU 1 set junction power off temperature to 110 degrees C
Memory Configuration:
Segment @ Base:0 Size: 2048 MB ( 2-Way)

Probing /pci@8,600000 Device 4 SUNW,qlc fp disk
Probing /pci@8,600000 Device 1 pci
Probing /pci@8,600000/pci@1 Device 0 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 1 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 2 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 3 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 4 Nothing there
Probing /pci@8,600000/pci@1 Device 5 Nothing there
Probing /pci@8,600000/pci@1 Device 6 Nothing there
Probing /pci@8,600000/pci@1 Device 7 Nothing there
Probing /pci@8,600000/pci@1 Device 8 Nothing there
Probing /pci@8,600000/pci@1 Device 9 Nothing there
Probing /pci@8,600000/pci@1 Device a Nothing there
Probing /pci@8,600000/pci@1 Device b Nothing there
Probing /pci@8,600000/pci@1 Device c Nothing there
Probing /pci@8,600000/pci@1 Device d Nothing there
Probing /pci@8,600000/pci@1 Device e Nothing there
Probing /pci@8,600000/pci@1 Device f Nothing there
Probing /pci@8,700000 Device 5 network usb
Probing /pci@8,700000 Device 6 scsi disk tape scsi disk tape
Probing /pci@8,700000 Device 1 Nothing there
Probing /pci@8,700000 Device 2 Nothing there
Probing /pci@8,700000 Device 3 Nothing there
Probing /pci@8,700000 Device 4 Nothing there
screen not found.
keyboard not found.
Keyboard not present. Using ttya for input and output.

Start Reason: Soft Reset
System Reset: (SPOR) (PLL)
Probing gptwo at 0,0 SUNW,UltraSPARC-III (750 MHz @ 5:1, 8 MB) memory-controller
Probing gptwo at 1,0 SUNW,UltraSPARC-III (750 MHz @ 5:1, 8 MB) memory-controller
Probing gptwo at 8,0 pci pci
Loading Support Packages: kbd-translator
Loading onboard drivers: ebus flashprom bbc power i2c dimm-fru dimm-fru
dimm-fru dimm-fru nvram idprom i2c cpu-fru temperature cpu-fru
temperature fan-control motherboard-fru ioexp ioexp ioexp
fcal-backplane remote-system-console power-distribution-board
power-supply power-supply rscrtc beep rtc gpio pmc parallel
rsc-control rsc-console serial
CPU 0 set ambient power off temperature to 70 degrees C
CPU 0 set junction power off temperature to 110 degrees C
CPU 1 set ambient power off temperature to 70 degrees C
CPU 1 set junction power off temperature to 110 degrees C
Memory Configuration:
Segment @ Base:0 Size: 2048 MB ( 2-Way)
Probing /pci@8,600000 Device 4 SUNW,qlc fp disk
Probing /pci@8,600000 Device 1 pci
Probing /pci@8,600000/pci@1 Device 0 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 1 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 2 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 3 pci108e,1000 SUNW,qfe
Probing /pci@8,600000/pci@1 Device 4 Nothing there
Probing /pci@8,600000/pci@1 Device 5 Nothing there
Probing /pci@8,600000/pci@1 Device 6 Nothing there
Probing /pci@8,600000/pci@1 Device 7 Nothing there
Probing /pci@8,600000/pci@1 Device 8 Nothing there
Probing /pci@8,600000/pci@1 Device 9 Nothing there
Probing /pci@8,600000/pci@1 Device a Nothing there
Probing /pci@8,600000/pci@1 Device b Nothing there
Probing /pci@8,600000/pci@1 Device c Nothing there
Probing /pci@8,600000/pci@1 Device d Nothing there
Probing /pci@8,600000/pci@1 Device e Nothing there
Probing /pci@8,600000/pci@1 Device f Nothing there
Probing /pci@8,700000 Device 5 network usb
Probing /pci@8,700000 Device 6 scsi disk tape scsi disk tape
Probing /pci@8,700000 Device 1 Nothing there
Probing /pci@8,700000 Device 2 Nothing there
Probing /pci@8,700000 Device 3 Nothing there
Probing /pci@8,700000 Device 4 Nothing there

Sun Fire 280R (2 X UltraSPARC-III) , No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.5, 2048 MB memory installed, Serial #51571441.
Ethernet address 0:3:ba:12:ea:f1, Host ID: 8312eaf1.
Rebooting with command: boot
Boot device: disk File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.11 97/07/10 16:19:15.
Loading: /platform/SUNW,Sun-Fire-280R/ufsboot
Loading: /platform/sun4u/ufsboot
SunOS Release 5.8 Version Generic_108528-14 64-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
configuring IPv4 interfaces: eri0 qfe0 qfe0:1 qfe0:10 qfe0:11 qfe0:12
qfe0:13 qfe0:14 qfe0:15 qfe0:16 qfe0:2 qfe0:24 qfe0:25 qfe0:26 qfe0:3
qfe0:4 qfe0:5 qfe0:6 qfe0:7 qfe0:8 qfe0:9 qfe1 qfe1:1 qfe1:10 qfe1:11
qfe1:12 qfe1:13 qfe1:14 qfe1:15 qfe1:16 qfe1:2 qfe1:24 qfe1:25 qfe1:26
qfe1:3 qfe1:4 qfe1:5 qfe1:6 qfe1:7 qfe1:8 qfe1:9
ifconfig: e3:
qfe3.
TNG startup complete.
******************************************************
awservices already running
The system is ready.

_

c1t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336605FSUN36G Revision: 0438 Serial No: 0205P156XV
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c1t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336605FSUN36G Revision: 0438 Serial No: 0205P14VR6
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Thanks in advance.

[18218 byte] By [sunspot] at [2007-11-25 22:49:16]
# 1

Sunspot,

I don't think you're going to like this.

But first off, thank you for providing so much info. It did help.

You have a corrupt filesystem: mangled inodes, and more.
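The panic message itself says what went wrong: UFS was asked to free an inode that its allocation map already showed as free. A toy Python model makes the idea concrete. It is only an illustration under stated assumptions: `InodeFreeMap` is a hypothetical class, far simpler than the real UFS on-disk structures, and it raises an exception where the kernel panicked.

```python
class InodeFreeMap:
    """Toy inode allocation map (one flag per inode).

    Hypothetical and heavily simplified: real UFS keeps inode bitmaps per
    cylinder group on disk, which this class does not model.
    """

    def __init__(self, ninodes):
        self.in_use = [False] * ninodes

    def alloc(self, ino):
        if self.in_use[ino]:
            raise RuntimeError(f"allocating in-use inode, ino:{ino}")
        self.in_use[ino] = True

    def free(self, ino):
        # On a consistent filesystem an inode being deleted is always marked
        # in use. Finding it already free means the map and the inode
        # disagree, i.e. the metadata is damaged; that is the condition the
        # "freeing free inode" panic message reports.
        if not self.in_use[ino]:
            raise RuntimeError(f"freeing free inode, ino:{ino}")
        self.in_use[ino] = False


if __name__ == "__main__":
    fmap = InodeFreeMap(8)
    fmap.alloc(5)
    fmap.free(5)
    try:
        fmap.free(5)  # second free of the same inode
    except RuntimeError as err:
        print(err)  # freeing free inode, ino:5
```

Because the inconsistency lives in the on-disk metadata, it survives reboots and CPU changes, which fits the crashes continuing after each processor was taken offline.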

The clues were in the excerpts from the panic.

I am guessing that if you boot from CDROM, the system will sit there 'fat and happy' without any resets.

Among all that hexadecimal hash in your excerpts, note the text that starts with "ufs".

For example:

ufs:real_panic

ufs:ufs_fault

ufs:ufs_ifree

... and there were problems when the system tried to sync the file systems during the panic (write error on /dev/dsk/c1t1d0s4).
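Picking the module names out of the hexadecimal by eye is error-prone, so here is a short Python sketch that lists the module:function of each stack frame instead. It assumes each frame line has the form seen in the trace in the question (a 16-digit stack address, then module:function+offset, then an argument list in parentheses); `stack_frames` and `modules_in_trace` are hypothetical helpers written for this post.

```python
import re

# Assumed frame shape, e.g.:
#   000002a1004118c0 ufs:ufs_ifree+1bc (300016b4468, 90255, ...)
# Register lines such as "%l0-3: ..." do not match and are skipped.
FRAME_RE = re.compile(
    r"^\s*[0-9a-f]{16}\s+(?P<module>[\w.]+):(?P<func>\w+)\+(?P<off>[0-9a-f]+)\s*\("
)


def stack_frames(trace_text):
    """Return (module, function) for every frame line in a panic stack trace."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((m.group("module"), m.group("func")))
    return frames


def modules_in_trace(trace_text):
    """Set of kernel module names that appear in the trace."""
    return {module for module, _ in stack_frames(trace_text)}


if __name__ == "__main__":
    trace = """\
000002a1004118c0 ufs:ufs_ifree+1bc (300016b4468, 90255, 3000395c5a0, 1f1, 2ab650, 30004d70ec0)
  %l0-3: 0000000000003e2f 00000300016b43b8 0000030002162000 00000300027e4000
000002a100411970 ufs:ufs_delete+1e4 (30004d71000, 300016b4400, 3e2f, 0, 30004d70f50, 30004d70ec0)
"""
    print(stack_frames(trace))  # [('ufs', 'ufs_ifree'), ('ufs', 'ufs_delete')]
```

For the full trace in the question, `modules_in_trace` would return only `{'ufs'}`: every frame, from ufs_thread_delete up to real_panic_v, is inside the UFS module.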

Next, search these Hardware forums for these keywords, in quotes:

"freeing free"

Read the postings from contributor jds2n.

--

You would probably have avoided these recent events if there had been some preventative maintenance over those two years of operation.

Your OBP is about 18 patch revisions out of date.

I can only guess about where the software patching might be ...

Hate to be brutal, but periodic preventative maintenance such as planned patching forces you to jump through certain hoops.

That pushes the system through such things as reconfiguration reboots (required after patch clusters) and spontaneous fscks.

Here is the 280R in the SSH (Sun System Handbook): http://sunsolve.sun.com/handbook_pub/Systems/SunFire280R/SunFire280R.html

You can find a link to the 280R OBP patch on that page: http://sunsolve.sun.com/search/advsearch.do?collection=PATCH&type=collections&queryKey5=118323&toDocument=yes

Click on that link and see the README file for it.

In that text file is a link to the previous cumulative version of the OBP patch.

READ THAT OLDER TEXT FILE as well. There are some gotchas in that older file that do not appear in the newer one.

Use the newest firmware patch; it includes every update from every earlier one.

If you cannot repair the filesystems, you may have to reinstall everything and restore your data from backup.

rukbat at 2007-7-5 17:04:44