General Solaris 10 Discussion - Strange SAN Problem
I have a configuration with two servers and two SAN storage arrays. Each server is connected to both arrays and we use host-based mirroring; it is just a plain two-node cluster setup.
I regularly get warnings in the messages file, and now one disk has been offlined on one path from one node. All other disks are online, and the same disk is still online on the other host.
I have the following entries in messages:
Mar 24 08:53:26 MyHostA scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Mar 24 08:53:26 MyHostA /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aa4 (ssd14): Command Timeout on path /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fp0)
Mar 24 08:53:26 MyHostA scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aa4 (ssd14):
Mar 24 08:53:26 MyHostA SCSI transport failed: reason 'timeout': retrying command
Mar 24 08:54:10 MyHostA scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Mar 24 08:54:10 MyHostA /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aae (ssd10): Command Timeout on path /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fp0)
Mar 24 08:54:41 MyHostA scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Mar 24 08:54:41 MyHostA /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aa4 (ssd14): Command Timeout on path /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fp0)
Mar 24 08:55:22 MyHostA scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Mar 24 08:55:22 MyHostA /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aaa (ssd11): Command Timeout on path /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fp0)
Mar 24 08:57:58 MyHostA scsi: [ID 243001 kern.warning] WARNING: /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fcp0):
Mar 24 08:57:58 MyHostA INQUIRY to D_ID=0x1f1400 lun=0x1 failed: State:Packet Transport error, Reason:Undefined. Giving up
Mar 24 08:57:58 MyHostA scsi: [ID 243001 kern.info] /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fcp0):
Mar 24 08:57:58 MyHostA offlining lun=1 (trace=0), target=1f1400 (trace=b90101)
Mar 24 08:57:58 MyHostA genunix: [ID 834635 kern.info] /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aa6 (ssd13) multipath status: degraded, path /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fp0) to target address: w50060e8004ebc058,1 is offline Load balancing: round-robin
Mar 24 08:57:59 MyHostA genunix: [ID 834635 kern.info] /scsi_vhci/ssd@g60060e8004ebc0000000ebc000001aa6 (ssd13) multipath status: optimal, path /pci@7c0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0 (fp0) to target address: w50060e8004ebc058,1 is online Load balancing: round-robin
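To get an overview of which disks and which path the timeouts hit, the warnings can be tallied with a small script. This is only a rough sketch, not a proper diagnostic tool: the function name count_timeouts and the regular expression are my own, it assumes the "Command Timeout on path" wording shown above, and it assumes the messages file has been copied to a machine with Python 3.

```python
import re
from collections import Counter

# Matches e.g. "(ssd14): Command Timeout on path /pci@7c0/... (fp0)".
# [^()]* also spans a line break, so entries that were wrapped
# in the middle of the device path still match.
TIMEOUT_RE = re.compile(
    r"\((ssd\d+)\): Command Timeout on path\s+[^()]*\((\w+)\)"
)


def count_timeouts(log_text):
    """Count timeouts per (disk instance, path instance) pair.

    log_text is the full content of the messages file as one string.
    Returns a Counter keyed by tuples such as ("ssd14", "fp0").
    """
    return Counter(TIMEOUT_RE.findall(log_text))


def format_report(counts):
    """Return one text line per pair, most frequent first."""
    return [
        "%s via %s: %d timeout(s)" % (disk, port, n)
        for (disk, port), n in counts.most_common()
    ]
```

Feeding it the content of /var/adm/messages and printing the lines from format_report gives one line per disk and path. For the excerpt above it counts two timeouts for ssd14 and one each for ssd10 and ssd11, all on fp0.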
I suspect a bad element in the transmission path between the servers and the SAN storage, which occasionally causes these timeouts, but I have no idea how to diagnose such a problem.
Thanks for any help

