x4200 & x4100 disk failures

In the last 6 months I've had 4 hard disks fail in these servers, and at no time has anything shown up in the SP. The only time a disk failure has been noticed is when I reboot the box with a console attached to it.

They have all been Fujitsu MAY2073RC drives.

Is anyone else having the same problem?

I've upgraded the firmware on an affected server, but it still seems to think that the hard drive is OK.

Property                   Value
SP Firmware Version        1.1.1
SP Firmware Build Number   16618
SP Firmware Date           Tue Feb 27 19:44:15 PST 2007
SP Filesystem Version      0.1.14

The firmware on the LSI controller is also 6.10

Is there something I need to change within the controller?

I'm starting to wonder if buying these servers may have been a mistake.

By lsba at 2007-11-27 3:06:16
# 1
So far I've discovered that on Linux you need to install mpt-status to query the controller, but I can't get it to make the LEDs blink automatically.
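Since nothing shows up on the LEDs or in the SP, one option is to run mpt-status from cron and look at the state fields yourself. Here is a minimal sketch in Python. It assumes that mpt-status prints one line per volume and per physical disk containing "state <WORD>", and that OPTIMAL and ONLINE are the healthy states. The sample text is illustrative only and was not captured from these servers, and the function names are my own. It only reports problems; it does not touch the LEDs.

```python
import re
import subprocess

# Assumption: these are the healthy state keywords; anything else is flagged.
HEALTHY_STATES = {"OPTIMAL", "ONLINE"}

# Matches the "state <WORD>" field assumed to appear in each status line.
STATE_RE = re.compile(r"\bstate\s+([A-Z_]+)")


def unhealthy_lines(status_text):
    """Return the status lines whose state field is not a healthy state."""
    bad = []
    for line in status_text.splitlines():
        match = STATE_RE.search(line)
        if match and match.group(1) not in HEALTHY_STATES:
            bad.append(line.strip())
    return bad


def check_controller():
    """Run mpt-status (must be installed, usually needs root) and return bad lines."""
    result = subprocess.run(["mpt-status"], capture_output=True, text=True)
    return unhealthy_lines(result.stdout)


# Illustrative sample only; the exact format may differ on your system.
SAMPLE = """\
ioc0 vol_id 0 type IM, 2 phy, 68 GB, state DEGRADED, flags ENABLED
ioc0 phy 0 scsi_id 0 FUJITSU MAY2073RC, 68 GB, state ONLINE, flags NONE
ioc0 phy 1 scsi_id 1 FUJITSU MAY2073RC, 68 GB, state MISSING, flags OUT_OF_SYNC
"""

if __name__ == "__main__":
    for line in unhealthy_lines(SAMPLE):
        print("PROBLEM:", line)
```

From cron you would call check_controller() instead of using the sample, and send a mail if the returned list is not empty.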
my_utop1aa at 2007-7-12 3:52:29 > top of Java-index,Sun Hardware,Servers - General Discussion...
# 2
After discussing the matter with Sun's support, they say the LEDs and the SP will only see a failure if there is a physical failure of the hard disk. This is apparently different from the hard disk failing and being marked so by the controller.
lsba at 2007-7-12 3:52:29
# 3

Got something I didn't really expect.

At first I was thinking it was something environmental like ambient temp etc.

In the end this turned out to be a bug in Oracle (BUG ID: 5713547).

For those who don't have access to MetaLink, here is the full report. The problem has been replicated on both x4100s and x4200s.

Bug No.                    5713547
Filed                      13-DEC-2006
Updated                    27-APR-2007
Product                    Enterprise Manager Grid Control
Product Version            10.2.0.1.0
Platform                   Linux x86
Platform Version           No Data
Database Version           10.1.0.4.0
Affects Platforms          Generic
Severity                   Severe Loss of Service
Status                     Development to Q/A
Base Bug                   N/A
Fixed in Product Version   11.0.0.0

Problem statement:

STORAGE_REPORT_METRICS.PL IS CORRUPTING RAID DISK MIRRORING

*** 12/13/06 03:51 am ***

TAR

6024671.994

.

Problem Description

-

Whenever the agent fires off the script $ORACLE_HOME/sysman/admin/scripts/storage_report_metrics.pl, a number of errors appear in the file $ORACLE_HOME/sysman/sysman/log/emagent_perl.trc. The disks then report that there is a problem, and the LSI RAID controller splits the mirroring.

.

Environment Information

--

Red Hat Enterprise Linux 3 Update 5 (32 bit)

2005

Grid Agent 10.2.0.1

.

LSI information:

SP (LOM device; LSI RAID controller firmware tends to be updated at same time as this)

-> version

SP firmware 1.0.1

SP firmware build number: 10664

SP firmware date: Tue May 2 15:58:21 PDT 2006

SP filesystem version: 0.1.13

->

LSI (Hardware Level) RAID Controller

* LSI Logic MPT Setup Utility v6.02.00.00 (2005.07.08)
*
* Adapter List  Global Properties
*
* Adapter   PCI  PCI  PCI  PCI   FW Revision     Status   Boot
*           Bus  Dev  Fnc  Slot                           Order
*
* SAS1064   02   03   00   00    1.04.00.00-IR   Enabled  0
**

.

Test Case Step-by-Step Instructions

--

n/a

.

Test Case Location

n/a

.

Diagnostic Analysis

-

Sun engineer at customer site has investigated and supplied the following:

Command to break raidset

.

strace output of above command attached.

Raid failure report for /var/log/messages will be uploaded

.

emagent_perl.trc shows:

.

storage_report_metrics.pl: Wed Dec 6 13:44:15 2006: WARN: STORAGE_REPORTS:size for /dev/vol0/lv01 is partition not numeric, this value will be defaulted to 0
storage_report_metrics.pl: Wed Dec 6 13:44:15 2006: WARN: STORAGE_REPORTS:used for /dev/vol0/lv01 is partition not numeric, this value will be defaulted to 0
storage_report_metrics.pl: Wed Dec 6 13:44:15 2006: WARN: STORAGE_REPORTS:free for /dev/vol0/lv01 is undefined , this value will be defaulted to 0
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:No MetaDisks configured
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:Use of uninitialized value in concatenation (.) or string at /u01/app/oracle/product/agent10g/sysman/admin/scripts/storage/sRawmetrics.pm line 325.
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:driver with major# 232 used for disk device /dev/emcpowera is not a supported disk, skipping disk
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:Use of uninitialized value in concatenation (.) or string at /u01/app/oracle/product/agent10g/sysman/admin/scripts/storage/sRawmetrics.pm line 325.
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:driver with major# 232 used for disk device /dev/emcpowera1 is not a supported disk, skipping disk
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:Use of uninitialized value in concatenation (.) or string at /u01/app/oracle/product/agent10g/sysman/admin/scripts/storage/sRawmetrics.pm line 325.
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:driver with major# 232 used for disk device /dev/emcpowerb is not a supported disk, skipping disk
storage_report_metrics.pl: Wed Dec 6 13:44:17 2006: WARN: STORAGE_REPORTS:Use of uninitialized value in concatenation (.) or string at /u01/app/oracle/product/agent10g/sysman/admin/scripts/storage/sRawmetrics.pm line 325.

.

Performance

--

.

NLS Information

.

Patches

-

.

Log Files Location

.

Reproducibility

Customer in SR 6034193.992 reports the same issue.

.

URL

n/a

.

Did you test with the latest version?

-

10.2.0.1

.

Available Workarounds

Rename storage_report_metrics.pl; this will generate metric collection errors, however.

.

Related Bugs

.

Severity 1 Information

-

.

Additional Information

-

.

*** 12/13/06 03:57 am ***

*** 12/13/06 07:25 am *** (CHG: Sta->16)

*** 12/14/06 07:47 am *** (CHG: Asg->NEW OWNER)

*** 12/18/06 08:03 am *** (CHG: Asg->NEW OWNER)

*** 12/18/06 08:03 am ***

*** 12/18/06 08:03 am *** (CHG: Asg->NEW OWNER)

*** 12/18/06 08:43 am ***

*** 12/19/06 12:59 am *** (CHG: Sta->10)

*** 12/19/06 12:59 am ***

*** 12/20/06 06:34 am *** (CHG: Sta->16)

*** 12/20/06 06:34 am ***

*** 12/20/06 06:54 am ***

*** 12/20/06 07:37 am ***

*** 12/20/06 07:39 am *** (CHG: Sta->11)

*** 12/20/06 07:40 am *** (CHG: Asg->NEW OWNER)

*** 12/26/06 01:52 pm *** (CHG: DevPri->2)

*** 12/26/06 01:52 pm *** (CHG: Confirmed Flag->Y)

*** 12/26/06 01:52 pm *** (CHG: Asg->NEW OWNER)

*** 12/26/06 01:52 pm ***

*** 12/26/06 01:52 pm *** (CHG: Asg->NEW OWNER)

*** 12/27/06 01:22 am ***

*** 01/17/07 06:39 am *** (CHG: Asg->NEW OWNER)

*** 01/25/07 04:59 am ***

*** 01/29/07 12:38 pm ***

*** 01/29/07 12:39 pm ***

*** 01/31/07 03:20 am ***

Hi Dragos,

If you have a different customer hitting the same issue, I suggest you file a new bug and reference it in this bug.

*** 02/02/07 05:45 pm ***

*** 02/06/07 02:14 am *** (CHG: Asg->NEW OWNER)

*** 02/06/07 02:14 am ***

*** 02/06/07 05:53 am ***

*** 02/06/07 05:55 am ***

*** 02/06/07 01:09 pm ***

*** 02/06/07 11:22 pm *** (CHG: Sta->30)

*** 02/06/07 11:22 pm ***

*** 02/07/07 05:24 am *** (CHG: DevPri->1)

*** 02/08/07 05:37 am ***

*** 02/08/07 05:54 am *** ESCALATED

*** 02/08/07 05:54 am ***

*** 02/08/07 05:58 am ***

*** 02/08/07 05:59 am ***

*** 02/08/07 06:41 am *** (CHG: Sta->11)

*** 02/08/07 06:41 am ***

*** 02/09/07 03:16 am *** (CHG: Sta->30)

*** 02/09/07 03:16 am ***

*** 02/09/07 06:04 am ***

*** 02/12/07 04:17 am *** (CHG: Sta->11)

*** 02/12/07 04:17 am ***

*** 02/12/07 03:51 pm ***

*** 02/12/07 11:38 pm *** (CHG: Sta->52 Asg->NEW OWNER)

*** 02/12/07 11:38 pm ***

*** 02/13/07 12:16 am *** (CHG: Sta->11 Asg->NEW OWNER)

*** 02/13/07 12:31 am ***

*** 02/13/07 12:32 am *** (CHG: Fixed->10.2.0.2)

*** 02/13/07 12:32 am *** (CHG: Sta->30 Asg->NEW OWNER)

*** 02/13/07 12:32 am ***

*** 02/13/07 06:30 am ***

*** 02/13/07 06:30 am *** (CHG: Sta->11)

*** 02/13/07 08:10 am *** (CHG: Sta->52 Asg->NEW OWNER)

*** 02/13/07 08:10 am ***

*** 02/13/07 07:57 pm *** (CHG: Sta->11 Asg->NEW OWNER)

*** 02/13/07 09:41 pm ***

*** 02/13/07 09:43 pm *** (CHG: Fixed->10.2.0.2)

*** 02/13/07 09:43 pm *** (CHG: Sta->30 Asg->NEW OWNER)

*** 02/13/07 09:43 pm ***

*** 02/14/07 09:08 am ***

*** 02/14/07 12:33 pm ***

*** 02/14/07 06:29 pm ***

*** 02/20/07 04:40 am *** (CHG: Sta->11)

*** 02/20/07 04:40 am ***

*** 02/20/07 04:51 am *** (CHG: Sta->30)

*** 02/20/07 04:51 am ***

*** 02/26/07 12:45 am ***

*** 02/28/07 12:04 am ***

*** 02/28/07 02:25 pm ***

REDISCOVERY INFORMATION:

If you execute the nmhs as follows and RAID failures are reported in /var/log/messages, you are facing this bug.

WORKAROUND:

None

RELEASE NOTES:

*** 02/28/07 02:31 pm *** (CHG: Fixed->11.0.0.0)

*** 02/28/07 02:31 pm *** (CHG: Sta->80)

*** 02/28/07 02:49 pm ***

*** 03/01/07 12:39 am ***

*** 03/16/07 03:59 am ***

*** 03/16/07 04:00 am ***

*** 03/16/07 04:00 am ***

*** 03/19/07 01:24 am ***

*** 03/21/07 01:18 am *** ESCALATION -> CLOSED

*** 03/27/07 08:58 am ***

*** 03/28/07 05:53 am ***

*** 03/29/07 12:28 am ***

*** 03/29/07 12:30 am ***

*** 03/29/07 12:31 am ***

*** 04/03/07 07:18 pm ***

*** 04/23/07 10:45 pm ***

*** 04/27/07 01:18 am ***
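The workaround listed in the report (renaming storage_report_metrics.pl) can be scripted if you have a lot of boxes to go through. A rough sketch in Python, assuming ORACLE_HOME points at the agent home and the script sits under sysman/admin/scripts as shown in the report. The ".disabled" suffix and the function name are my own choices, not anything Oracle specifies. As the report says, expect metric collection errors from the agent afterwards.

```python
import os

# Relative path taken from the bug report; it is joined onto ORACLE_HOME.
SCRIPT_RELATIVE_PATH = os.path.join(
    "sysman", "admin", "scripts", "storage_report_metrics.pl"
)


def disable_storage_metrics(oracle_home, suffix=".disabled"):
    """Rename storage_report_metrics.pl so the agent can no longer run it.

    Returns the new path, or None if the script was not found
    (for example because it has already been renamed).
    """
    script = os.path.join(oracle_home, SCRIPT_RELATIVE_PATH)
    if not os.path.isfile(script):
        return None
    disabled = script + suffix
    os.rename(script, disabled)
    return disabled


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME")
    if home is None:
        print("ORACLE_HOME is not set")
    else:
        print(disable_storage_metrics(home) or "script not found")
```

To undo it, rename the file back to its original name.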

my_utop1aa at 2007-7-12 3:52:29