/globaldevice whackiness

I've configured my JumpStart server to install my clusters from a pre-configured installer state file (thanks, Tim, for the info on how to do this).

Now I'm seeing a repeatable error: the global device does not get added properly to the second node of the cluster.

If I run 'scinstall' on nodeA, then after the reboot nodeB doesn't mount its global devices. If I re-JumpStart and run scinstall on nodeB instead, then nodeA doesn't mount its global devices correctly.

Here's the syslog output. sccheck and /var/cluster/logs are clean.

--

Mar 22 23:42:23 lab-15k-c Cluster.CCR: [ID 914260 daemon.warning] Failed to retrieve global fencing status from the global name server
Mar 22 23:42:23 lab-15k-c Cluster.CCR: [ID 485680 daemon.warning] reservation warning(node_join) - Unable to lookup local_only flag for device dsk/d11.
Mar 22 23:42:23 lab-15k-c Cluster.CCR: [ID 485680 daemon.warning] reservation warning(node_join) - Unable to lookup local_only flag for device dsk/d10.
Mar 22 23:42:25 lab-15k-c svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/globaldevices:default: Method "/usr/cluster/lib/svc/method/globaldevices start" failed with exit status 96.
Mar 22 23:42:25 lab-15k-c svc.startd[8]: [ID 748625 daemon.error] system/cluster/globaldevices:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
Mar 22 23:42:25 lab-15k-c Cluster.CCR: [ID 795553 daemon.error] /usr/cluster/bin/scgdevs: Filesystem /global/.devices/node@2 is not available in /etc/mnttab.
Mar 22 23:42:25 lab-15k-c last message repeated 1 time
Mar 22 23:43:10 lab-15k-c Cluster.GCHB_resd: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
Mar 22 23:43:10 lab-15k-c Cluster.GCHB_resd: [ID 625214 daemon.error] GCHB system error: scha_cluster_open failed with 18
Mar 22 23:43:10 lab-15k-c : Bad file number
Mar 22 23:43:10 lab-15k-c Cluster.PMF.pmfd: [ID 819736 daemon.notice] PMF is restarting process that died: tag=gchb_resd, cmd_path=/usr/cluster/lib/geo/lib/gchb_resd, max_retries=-1, num_retries=0
Mar 22 23:43:10 lab-15k-c Cluster.GCHB_resd: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
Mar 22 23:43:10 lab-15k-c Cluster.GCHB_resd: [ID 625214 daemon.error] GCHB system error: scha_cluster_open failed with 18
Mar 22 23:43:10 lab-15k-c : Bad file number
Mar 22 23:43:10 lab-15k-c Cluster.PMF.pmfd: [ID 534408 daemon.notice] "gchb_resd" restarting too often ... sleeping 1 seconds.
Mar 22 23:43:11 lab-15k-c Cluster.PMF.pmfd: [ID 819736 daemon.notice] PMF is restarting process that died: tag=gchb_resd, cmd_path=/usr/cluster/lib/geo/lib/gchb_resd, max_retries=-1, num_retries=1
Mar 22 23:43:11 lab-15k-c Cluster.GCHB_resd: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
Mar 22 23:43:11 lab-15k-c Cluster.GCHB_resd: [ID 625214 daemon.error] GCHB system error: scha_cluster_open failed with 18
Mar 22 23:43:11 lab-15k-c : Bad file number
Mar 22 23:43:11 lab-15k-c Cluster.PMF.pmfd: [ID 534408 daemon.notice] "gchb_resd" restarting too often ... sleeping 2 seconds.
Mar 22 23:43:13 lab-15k-c Cluster.PMF.pmfd: [ID 819736 daemon.notice] PMF is restarting process that died: tag=gchb_resd, cmd_path=/usr/cluster/lib/geo/lib/gchb_resd, max_retries=-1, num_retries=2
Mar 22 23:43:13 lab-15k-c Cluster.GCHB_resd: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
Mar 22 23:43:13 lab-15k-c Cluster.GCHB_resd: [ID 625214 daemon.error] GCHB system error: scha_cluster_open failed with 18
Mar 22 23:43:13 lab-15k-c : Bad file number
Mar 22 23:43:13 lab-15k-c Cluster.PMF.pmfd: [ID 534408 daemon.notice] "gchb_resd" restarting too often ... sleeping 4 seconds.
Mar 22 23:43:16 lab-15k-c Cluster.RGM.rgmd: [ID 537175 daemon.notice] CMM: Node lab-15k-b (nodeid: 1, incarnation #: 1174621221) has become reachable.
Mar 22 23:43:16 lab-15k-c Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
Mar 22 23:43:16 lab-15k-c Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node lab-15k-b (nodeid = 1) is up; new incarnation number = 1174621396.
Mar 22 23:43:16 lab-15k-c Cluster.RGM.rgmd: [ID 670814 daemon.notice] Blocking in RGM
Mar 22 23:43:17 lab-15k-c Cluster.PMF.pmfd: [ID 819736 daemon.notice] PMF is restarting process that died: tag=gchb_resd, cmd_path=/usr/cluster/lib/geo/lib/gchb_resd, max_retries=-1, num_retries=3
Mar 22 23:43:17 lab-15k-c svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
Mar 22 23:43:17 lab-15k-c svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
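The message that stands out in that log is scgdevs reporting that /global/.devices/node@2 is not available in /etc/mnttab, at the same moment the globaldevices SMF service goes to maintenance. A quick way to confirm that state on each node after the reboot is to look for the per-node mount points in /etc/mnttab directly. This is a minimal Python 3 sketch, not a Sun Cluster tool: the function names are hypothetical, and it assumes the standard Solaris mnttab layout (special, mount point, fstype, options, time, separated by tabs) and a two-node cluster with node IDs 1 and 2.

```python
"""Hypothetical helper: report which /global/.devices/node@N mount points
are missing from a Solaris mnttab (the condition scgdevs complains about)."""
from typing import Iterable, List, Set


def mounted_points(mnttab_text: str) -> Set[str]:
    """Return the mount_point field (2nd column) of every mnttab entry."""
    points: Set[str] = set()
    for line in mnttab_text.splitlines():
        # Assumed layout: special, mount_point, fstype, options, time.
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def missing_global_devices(mnttab_text: str, node_ids: Iterable[int]) -> List[str]:
    """List the /global/.devices/node@N paths that are not currently mounted."""
    points = mounted_points(mnttab_text)
    wanted = [f"/global/.devices/node@{n}" for n in node_ids]
    return [p for p in wanted if p not in points]
```

On a node you could run `missing_global_devices(open("/etc/mnttab").read(), [1, 2])`; an empty list means both global devices file systems are mounted, while `['/global/.devices/node@2']` would match the scgdevs error above. Reading the text in and passing it as a string keeps the parsing testable off the cluster.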

By JPMCKharea at 2007-11-26 22:38:50
# 1

Are you sure your JumpStart script doesn't have something in it that conflicts with the installs? I know from bitter experience how difficult it is to debug some of these complex stack installs!

I would suggest building a vanilla Solaris system via JumpStart and then manually running the JES installer silently with your state file to see whether that works on its own. If it doesn't, you've found your problem. If it does, I'm not sure what to recommend beyond making sure that the Solaris image is fully patched before it reaches the JES installer step.

Tim

Tim.Reada at 2007-7-10 11:51:13