Fiber Channel failover time

We have a V890 with 2 single-channel HBAs, Solaris 8, Veritas 4.1 (DMP), connected to a Clariion CX700 (active/passive).

We are still in test mode for this configuration.

When we disconnect one of the fibers at the HBA to simulate a failure, it takes almost 2 minutes before failover occurs and the other path is used.

Is this the expected timeout for failover to occur? I suspect this is controlled by DMP. Is there a way to make it happen more quickly? Eventually this will be a DB server, and I'm concerned about the amount of time it takes to fail over.

[584 byte] By [mytmozilla] at [2007-11-26 6:17:42]
# 1
Which HBAs are you using? Sun or other?
torreysun at 2007-7-6 13:59:21 > top of Java-index,Storage Forums,Storage General Discussion...
# 2
They are Sun. SG-XPCI1FC-QF2
mytmozilla at 2007-7-6 13:59:21
# 3
Do you have the /var/adm/messages entries from the test? Pretty sure this is DMP but it could be the sd driver timing out as well.
torreysun at 2007-7-6 13:59:21
# 4

unplugged fiber....

Apr 6 13:00:19 dbserv02 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Link OFFLINE

Apr 6 13:00:19 dbserv02 scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@1/fp@0,0/ssd@w50060162306007c7,5 (ssd14):

Apr 6 13:00:19 dbserv02 SCSI transport failed: reason 'tran_err': retrying command

Apr 6 13:00:54 dbserv02 scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@1/fp@0,0/ssd@w50060162306007c7,5 (ssd14):

Apr 6 13:00:54 dbserv02 SCSI transport failed: reason 'timeout': retrying command

Apr 6 13:01:49 dbserv02 fctl: [ID 517869 kern.warning] WARNING: 186=>fp(1)::OFFLINE timeout

Apr 6 13:01:52 dbserv02 vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x70 belonging to the dmpnode 239/0x38

Apr 6 13:02:08 dbserv02 scsi: [ID 243001 kern.info] /pci@9,600000/SUNW,qlc@1/fp@0,0 (fcp1):

Apr 6 13:02:08 dbserv02 offlining lun=5 (trace=0), target=611113 (trace=2800004)

repeating stuff about offlining of LUNs......

just plugged in fiber....

Apr 6 13:03:43 dbserv02 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Link ONLINE

mytmozilla at 2007-7-6 13:59:21
# 5

You're seeing the default 90-second timer. If you pull the cable from the array itself, you should see a faster notification and failover. However, you can short-circuit some of these timers by changing certain variables in /etc/system. I wouldn't do this unless you really need to, but a Sun tech could help with which settings to change.
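
The 90-second figure can be checked against the timestamps in the log posted above. This is only a rough sketch in Python: the event labels are shortened by hand, the function names are made up for illustration, and a year is assumed because the syslog excerpt does not carry one.

```python
from datetime import datetime

# Timestamps copied from the /var/adm/messages excerpt earlier in the thread.
# The excerpt has no year; one is assumed here only so the values parse.
ASSUMED_YEAR = 2007

# (shortened label, timestamp) pairs; labels are paraphrased, not raw log text.
EVENTS = [
    ("qlc: Link OFFLINE", "Apr 6 13:00:19"),
    ("fctl: fp(1) OFFLINE timeout", "Apr 6 13:01:49"),
    ("vxdmp: disabled path 118/0x70", "Apr 6 13:01:52"),
    ("fcp1: offlining lun=5", "Apr 6 13:02:08"),
]


def parse(stamp):
    """Parse a syslog-style 'Mon D HH:MM:SS' timestamp using the assumed year."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_since_first(events):
    """Return (label, seconds elapsed since the first event) for each event."""
    start = parse(events[0][1])
    return [
        (label, int((parse(stamp) - start).total_seconds()))
        for label, stamp in events
    ]


if __name__ == "__main__":
    for label, elapsed in seconds_since_first(EVENTS):
        print(f"{elapsed:4d} s  {label}")
```

On these timestamps the fp OFFLINE timeout lands 90 seconds after the link drop, and DMP disables the path 3 seconds after that, so most of the wait appears to be spent below DMP rather than in it.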

torreysun at 2007-7-6 13:59:21
# 6

Ok thanks.

Not so sure we want to stay with this active/passive SAN. We are concerned Oracle won't like waiting for a failover to happen. I wonder how short a failover has to be so Oracle won't notice?

We were told at time of purchase that there are "1,000's" of these installed, so they will work fine. Is anyone here running EMC Clariions (CX-700) successfully as DB storage? And is failover an issue for you?

mytmozilla at 2007-7-6 13:59:21
# 7
Keep in mind that the timers are different depending on where the cable drops, or if the LUN fails over at the controller. Those happen more frequently and should be a little faster.
torreysun at 2007-7-6 13:59:21