System Hangs after adding Storage :(
First, don't jump on me for mentioning a 4800; this is storage related.
Solaris 8
Brocade 3850 (SUN branded)
SUN Samfs (4.2.8, running 5 months)
EMC CLARiiON (old, existing, and always seen in config)
Infortrend FC-SATA array (newly added)
I have added some FC-SATA drives to my 4800s. The process of adding these drives was done by a 'cfgadm -c configure <Controller #>'. Following this the disks are seen, striped and shared with SAMFS. All is good at this point.
During reboot the system hangs just after the boot message "System is coming up". I think it is at the point of finding mount points and doing an 'fsck', but the boot never completes (45+ minutes).
When I drop back to the original configuration prior to the 'cfgadm' the system then boots just fine, but I then need to do 'cfgadm -c .....' to see the drives and mount my SAMFS partitions.
Any ideas?
[951 byte] By [error] at [2007-11-25 22:59:56]

# 1
error, if I were you, I would want to know where it is 'hanging' first, so I'd put some 'set -xv' statements in a few startup scripts. Start with S01MOUNTFSYS if you have a feeling it is a mount problem. Come back with your samfs config so we get a view of what it looks like, and we'll go from there.
How are your patches, by the way? ;-)
# 2
Hey Bannana,
I knew you would be on the case. :)
Where do I place the "set -xv"? And what should I expect to see?
I was already starting to place "echo" statements in some of the scripts just to find out which script was having issues.
Patches were applied about a month and a half ago, so they are not "up to date", but this is not a base install of Solaris 8 2/02 either.
error at 2007-7-5 17:49:03 >

# 3
You would normally place set -x on the first or second line of the /etc/rcS and /etc/rc2 files to see which startup scripts complete and which one hangs. Then you can place a set -x in that script to see where it hangs within the script. If you can boot -sv then you don't need the set -x in rcS, which would require booting from CD or net to put in place.
jds2n at 2007-7-5 17:49:03 >

# 4
thanks jds
error... always after the shell has been specified within the script (the first line), so just put it on the second line. You should see some more output when the scripts run, which may give you a clue as to which FS is giving you a problem. See how you get on and let us all know :-)
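As a rough illustration of the advice above, here is a minimal sketch of what the tracing looks like. The script `/tmp/demo_rc.sh` and its two `echo` commands are made up for the example; it is a stand-in, not one of the real Solaris startup scripts. With `set -x` on the line after the interpreter line, the shell writes each command to stderr, prefixed with `+`, before running it, so the last command traced before a hang is the one to look at.

```shell
#!/bin/sh
# Hypothetical stand-in for a startup script; NOT a real Solaris rc file.
cat > /tmp/demo_rc.sh <<'EOF'
#!/bin/sh
set -x
echo "checking filesystems"
echo "mounting filesystems"
EOF

# Run it, keeping normal output and the trace (stderr) in separate files.
sh /tmp/demo_rc.sh > /tmp/demo_rc.out 2> /tmp/demo_rc.trace

# Show the trace: one "+ ..." line per command the script executed.
cat /tmp/demo_rc.trace
```

During a real boot the trace goes to the console rather than a file, so it is probably worth having the console logged or being ready to note the last line shown.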
# 5
Thanks for the info....This will be an on-going process since I only get access to these machines once a month.
I will keep you all posted.
Bannana BTW..I am thinking of flying over the pond. Know any good pubs I might find a pint?
error at 2007-7-5 17:49:03 >

# 6
Feel free to tell me I am a plonker but, if the storage has just been added, how could it have a file system (or more) on it to cause problems? The fscks are fired up in response, so I believe, to the settings in vfstab - which means a file system and mount point.
# 7
SimonM
The disks can be seen when I attach the storage with a 'cfgadm -c configure <controller #>' with the system up and running. After dynamically attaching the storage I can create filesystems, use SAMFS to share the filesystem out, and all is good across all machines.
I only have an issue later in life when a reboot (I'd hate for it to be a panic) occurs. Some SunSolve/Google searching turned up some very dated documents that pointed to path_to_inst and fiber not playing well together.
I am not 100% sure whether the system has hit the mount/fsck processes or not, but it is the next sequence in the boot process. I know it is storage/config related because (1) adding the new disks is the only thing that changed on the system, and (2) returning the OS to its original configuration from before the disks were added allows everything to boot fine. I was just looking for some pointers on where to begin troubleshooting.
I will keep everyone updated. Like I said I only get a couple of hours every month if I am lucky.
error at 2007-7-5 17:49:03 >

# 8
error wrote on Fri, 23 September 2005 15:47:
> Thanks for the info....This will be an on-going process since I only get access to these machines once a month.
> I will keep you all posted.
> Bannana BTW..I am thinking of flying over the pond. Know any good pubs I might find a pint?
Sorry Error, didn't see this post until now; it's the UK, man, there are no rubbish pubs over here ;-)
# 9
Latest update.
Last night I was given a couple of hours for troubleshooting. It appears I have some sort of a race condition happening with the drives/driver initializing, switch initialization and SAMFS.
Workaround: With all the drives cabled and configured I can get the system to boot without a problem as long as I do not mount them at boot (commented out of the vfstab). I then mount the volumes by hand or by script after the system is up.
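For anyone wanting to script the workaround above, here is a minimal sketch of one possible variant. Instead of commenting the entries out, it sets the "mount at boot" column (field 6 of vfstab) to "no" for every samfs entry, which keeps the entry available for a later manual mount. This is a swapped-in alternative, not what was actually done on these machines, and whether it avoids the hang in the same way is untested. The function name `vfstab_no_boot_mount` and the sample entries (`samfs1`, `/samfs1`, the `shared` option) are hypothetical.

```shell
#!/bin/sh
# Read a vfstab on stdin and print a copy in which every samfs entry has
# its "mount at boot" column (field 6) set to "no".
# Comment lines, blank lines and non-samfs entries pass through unchanged.
# Rewritten samfs lines come out tab-separated.
vfstab_no_boot_mount() {
    awk 'BEGIN { OFS = "\t" }
         $1 ~ /^#/ || NF == 0     { print; next }  # comments and blank lines
         $4 == "samfs" && NF >= 6 { $6 = "no" }    # skip the boot-time mount
                                  { print }'
}

# Hypothetical sample input; device and mount point names are made up.
vfstab_no_boot_mount <<'EOF'
#device to mount  device to fsck  mount point  FS type  fsck pass  mount at boot  options
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -
samfs1 - /samfs1 samfs - yes shared
EOF
```

The sketch only prints the result; it never touches the real /etc/vfstab, so the output can be reviewed before anything is copied into place. With the entries still present, the volumes can then be mounted by mount point once the system is up.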
At this point updating the driver, updating the switch firmware or upgrading SAMFS (which will never happen) might correct the issue.
I need to note, to Bill's dislike, that this is only happening on my 4800s; I do not have the same experience with my 880s. I am opening a call with SUN for analysis.
error at 2007-7-5 17:49:03 >

# 10
... same HBA's in both platforms?
Bill at 2007-7-5 17:49:03 >

# 11
All the same QLogic QLA2342-SUN PCI/PCI-X.
error at 2007-7-5 17:49:03 >

# 12
... just checking ... :)
Punch an Explorer through on the F4800 [ scextended if possible ]
and let the Storage support group take a peek.
Insist that USA engineers work it, though the platform type should ensure that happens, no matter what the contract level.
Bill at 2007-7-5 17:49:03 >

# 13
No can do...Explorers are not permitted to leave, if you understand.
error at 2007-7-5 17:49:03 >

# 14
I do understand. Now that you've mentioned it... If I recall correctly, I may have handled a number of your site's issues in the past, before they took me off Sunfire support for other business needs.
Bill at 2007-7-5 17:49:03 >
