Errors after initial Sun Cluster install

- SunOS conch 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V210

- Sun Cluster 3.2

I've gone through the scinstall process using the standard answers to questions. The only exception is that when it came to quorum, I answered that I would set it up later, as I want to try the quorum server. There's no shared storage - I'm seeing if it's possible to create a cluster using IP-based replication.

I'm getting these error messages every 30 seconds (they look like a result of this legacy service):

# svcs lrc:/etc/rc3_d/S91initgchb_resd
STATE          STIME    FMRI
legacy_run     16:19:29 lrc:/etc/rc3_d/S91initgchb_resd
#

Feb 8 16:38:59 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
Feb 8 16:38:59 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
Feb 8 16:38:59 conch : Bad file number
Feb 8 16:39:29 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
Feb 8 16:39:29 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
Feb 8 16:39:29 conch : Bad file number
Feb 8 16:39:59 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
Feb 8 16:39:59 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
Feb 8 16:39:59 conch : Bad file number
Feb 8 16:40:29 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
Feb 8 16:40:29 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
Feb 8 16:40:29 conch : Bad file number

There are no file system errors, and I'm at a complete loss as to why there appears to be this problem. Can anyone offer any advice?
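When the same lines repeat every 30 seconds, it can help to collapse them into counts before reading further. The following is a minimal sketch, not a Sun-supplied tool: the function name summarise_cluster_msgs is made up for illustration, and it assumes syslog lines in the usual "Mon DD HH:MM:SS host tag: text" layout shown above.

```shell
# Summarise repeated Sun Cluster syslog lines read from stdin.
# The function name is hypothetical; only standard grep/sed/sort/uniq are used.
summarise_cluster_msgs() {
    # Keep only lines tagged by a Cluster.* component, strip the
    # "Mon DD HH:MM:SS host " prefix, then count identical messages.
    grep 'Cluster\.' |
        sed 's/^[A-Z][a-z][a-z]  *[0-9][0-9]*  *[0-9:]*  *[^ ]*  *//' |
        sort | uniq -c | sort -rn
}

# Typical use on Solaris, where syslog normally writes to /var/adm/messages:
#   summarise_cluster_msgs < /var/adm/messages
```

Each output line is a count followed by the message text, most frequent first, which makes it easier to see whether one component is responsible for all of the noise.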

Cheers,

Iain

[1813 byte] By [iainfirkinsa] at [2007-11-26 17:48:49]
# 1

Just to check, since you only mention Sun Cluster 3.2: the error messages you cite are coming from components used within Sun Cluster Geo Edition, specifically from the geo heartbeat component.

So are you trying to set up a Geo cluster?

If so, you should first finish the Sun Cluster setup, and then give some details on what you did to set up the Geo Edition.

Greets

Thorsten

Thorsten.Frueaufa at 2007-7-9 5:01:13 > top of Java-index,Solaris Operating System,Solaris Essentials - General Technical Questions...
# 2

It's the Sun Cluster download from Sun's website, and I suspect the Geo component has been installed by virtue of installing the Availability Suite component (I'm thinking that might not actually be necessary now after all, if it's for IP-based disk replication between clusters as opposed to between nodes in a cluster). But in terms of the Sun Cluster install, I've literally run through the scinstall program and done nothing else.

Iain

iainfirkinsa at 2007-7-9 5:01:13
# 3

Iain,

Why would you need IP based replication inside a cluster? That doesn't make sense to me. Unless you chose to install Sun Cluster Geo Edition, it shouldn't get installed. You can use the prodreg command to browse the registry of installed programs and see what Solaris thinks is installed from the JES/JAS set.

If Geo Edition is installed, you probably want to remove it if it is just a single cluster that you need.

Regards,

Tim

Tim.Reada at 2007-7-9 5:01:13
# 4

Re: IP-based replication inside a cluster - it's for an experiment to see whether a cluster can be built without using shared storage, using the replication to ensure the data is kept up to date on the backup node. I'm seeing if you can build a cluster without spending loads of cash, especially since the actual data to be replicated is going to be a few megabytes and I don't really want to spend loads of cash on expensive (as in price per MB used) shared storage.

That said, the lack of shared storage probably breaks basic cluster design (!) and I know there will be other issues to do with cluster resiliency etc. This is all about seeing if it can be done or not, and I'm beginning to think that it *can't* be done ....

Iain

iainfirkinsa at 2007-7-9 5:01:13
# 5

OK, this now makes sense.

The way to achieve this is to use Sun Cluster 3.2. You will need your two primary cluster nodes and a 3rd node to act as a quorum server. The latter just needs to be a very cheap machine capable of running Solaris 10.

You can then set up Sun Network Data Replicator (SNDR), which is part of Availability Suite, to replicate the data between the cluster nodes. This should work without problems. No Sun Cluster Geo Edition is needed.

This is very much like what Sun's telco HA solution does.

Regards,

Tim

Tim.Reada at 2007-7-9 5:01:13
# 6

Hi,

There are 2 issues here.

1. The error messages that you see. I get them on my freshly installed cluster as well. What did I do? I used the JES installer and installed SC 3.2 and SC Geo 3.2, to be configured later. I think it should only install the packages and not configure any part of them, but it seems to do otherwise. To me, GCHB sounds like global cluster heartbeat. I'll follow up with the developers to get this clarified.

2. Replication within a cluster with no shared storage. This has several aspects. I, too, see more and more customer demand for this. If you get it to work, let us know. I am not sure, though, why you installed SC Geo Edition to achieve this, as I do not think it will help you here.

In any case, I can only recommend setting up the quorum server before proceeding; otherwise your whole cluster will panic as soon as you do a single reboot. That is by design.
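For reference, the cluster-side part of registering a quorum server can be sketched as below. This is a hedged outline rather than a tested procedure: it assumes the quorum server software is already installed and running on the third host, and the address 192.0.2.10, port 9000 and device name qs1 are placeholders. The helper quorum_server_cmds is hypothetical and only prints the commands so they can be reviewed before being run as root on one cluster node.

```shell
# Print, without running, the cluster-node commands for registering a
# quorum server. All three arguments are placeholders chosen by the admin.
quorum_server_cmds() {
    qshost=$1   # IP address of the quorum server host
    port=$2     # port the quorum server listens on
    name=$3     # name to give the new quorum device
    # Add the quorum server as a quorum device.
    echo "clquorum add -t quorum_server -p qshost=$qshost -p port=$port $name"
    # Reset the vote counts; this also clears install mode once a quorum
    # device exists.
    echo "clquorum reset"
    # Check the resulting quorum configuration.
    echo "clquorum status"
}

quorum_server_cmds 192.0.2.10 9000 qs1
```

Printing rather than executing is a deliberate choice here: the quorum configuration affects whether nodes panic on reboot, so it seems safer to read the commands first.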

Regards

Hartmut

HartmutSa at 2007-7-9 5:01:13
# 7

Thanks for the comments, Tim. I'm glad to hear this idea isn't unreasonable! This raises the question: how do you get Availability Suite? Is it a product in its own right, or is it Sun Java Availability Suite? Linking to Sun Java Availability Suite via http://www.sun.com/software/swportfolio/get.jsp (eventually) leads to a Sun Cluster 3.1 download. At a previous job, I remember having an Availability Suite 3.2 CD, but I'm hoping that it can be downloaded from somewhere. Any ideas?

Iain

iainfirkinsa at 2007-7-9 5:01:13
# 8
Iain,

I would guess it was in the availability suite bundle somewhere. As far as I can tell you cannot download it separately any more.

Tim
Tim.Reada at 2007-7-9 5:01:13
# 9

Hi,

You got confused, which is no surprise when the marketing folks use the same name for different products. What you are looking for is this:

http://www.sun.com/storagetek/management_software/data_protection/availability/

It is the StorageTek Availability Suite, consisting of a snapshot component and a replication component. I did not see it on the external download site, and I know that it is a product that has to be licensed separately. I am pretty sure that it is not part of the Java Availability Suite, which is a subset of the Java Enterprise System and covers Sun Cluster and the Sun Cluster Geographic Edition.

Availability Suite would replicate volumes. If you only have a couple of megabytes to replicate, could you think of another way of doing this?

HartmutSa at 2007-7-9 5:01:13
# 10

Is this Solaris 6/06?

I had the EXACT same message about rgmd_door.

I noticed that the cluster/rgm service wasn't starting (and wouldn't start).

I started it manually and that message went away, but the cluster was still fouled up.

I then noticed that the cluster wasn't booting to its milestone, and it was because system/pool isn't there... it's not available in pre-11/06 releases.

So now I am screwed... my copy of SC 3.1 only supports Solaris 8 and 9, I have a fouled-up 11/06 image and too slow an internet connection to download another one, and my SC 3.2 won't work with what I have now...

j2k4reala at 2007-7-9 5:01:13
# 11

I checked with engineering and got the explanation of where the GCHB messages are coming from.

1. They are harmless and do NOT indicate any problem with the cluster.

2. They will go away with a patch some time in the future. It is kind of a race condition between various services.

3. If you install the Sun Cluster Geographic Edition packages, which is what you have done and what I have done by explicitly checking the box in the JES installer, SC Geo Edition will start its own heartbeat. This is so that another cluster already running SC Geo Edition would be able to contact this cluster without any manual configuration in the beginning.

4. Manually starting any services does not solve this problem.

Hope that helped

Hartmut

HartmutSa at 2007-7-9 5:01:13
# 12

Instead of using Availability Suite, I guess there's rdist or ufsdump/ufsrestore. I think AVS is a nice neat way of replicating the data for the cluster in real time, but getting it is being a pain in the backside! From that page, there's no link to download the software (as far as I can tell), and even looking at http://www.sun.com/software/downloads/ there's nothing that stands out as being the actual package itself. According to the release notes, I should be getting:

SUNWscmr

SUNWscmu

SUNWspsvr

SUNWspsvu

SUNWiir

SUNWiiu

SUNWrdcr

SUNWrdcu

Bit of a long shot, but can these packages be downloaded individually?

Iain

iainfirkinsa at 2007-7-9 5:01:13
# 13
I don't think so. They are either in the larger bundles or they aren't there at all. I can't see them in the JAS suite, so maybe they aren't available for download.

Tim
Tim.Reada at 2007-7-9 5:01:13
# 14

Hmmm, looks like ?00 to purchase the media and around ?k for 12 months support (up to 1TB). Still cheaper than a shared array or HBAs to connect to the SAN, so it's definitely an option. I'll continue investigating other methods to see if there are other ways.

Thanks for all your comments on this.

Iain

iainfirkinsa at 2007-7-9 5:01:13
# 15

Thanks to j2k4real and HartmutS for their comments about the messages I've been seeing.

I've made some progress - I've got a two-node cluster set up with Sun Cluster 3.2 running on Sol 10 03/05 and fully patched - and I'm trying to set up a quorum server. According to the Sun Cluster Reference Manual, this is done by using 'clsetup'. However, when I run 'clsetup', I see the following:

# ./clsetup

>>> Initial Cluster Setup <<<

This program has detected that the cluster "installmode" attribute is
still enabled. As such, certain initial cluster setup steps will be
performed at this time. This includes adding any necessary quorum
devices, then resetting both the quorum vote counts and the
"installmode" property.

Please do not proceed if any additional nodes have yet to join the
cluster.

Is it okay to continue (yes/no) [yes]?

Unable to establish the list of cluster nodes.

Press Enter to continue:
#

Every time I run 'clsetup' I see the following on the console:

Feb 15 16:57:41 whelk Cluster.CCR: Unable to open door descriptor /var/run/rgmd_receptionist_door
Feb 15 16:57:41 whelk last message repeated 1 time
Feb 15 16:57:43 whelk Cluster.RGMPMF.lib: Unable to open door descriptor /var/run/rgmd_receptionist_door

I've read a little about the "installmode" attribute but I'm not sure how to change it or if that's even possible. I also noticed the following:

# svcs | grep cluster | grep offline
offline        15:33:39 svc:/system/cluster/scslm:default
offline        15:33:39 svc:/system/cluster/rgm:default
offline        15:33:39 svc:/system/cluster/cl-svc-cluster-milestone:default
offline        15:33:39 svc:/system/cluster/scsymon-srv:default
offline        15:33:39 svc:/system/cluster/scslmclean:default
offline        15:33:39 svc:/system/cluster/rpc-fed:default
offline        15:33:39 svc:/system/cluster/sckeysync:default
#
# ./scconf -p | grep install
Failed to get node zone list
Failed to get node zone list
Cluster install mode:    enabled
#

I don't know where the 'Failed to get node zone list' comes from, but it's present on standard error and I sometimes see it when I run clsetup.
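One way to dig into the offline services is to ask SMF why each one is not running. The sketch below is only an illustration: the helper name offline_cluster_services is made up, and it assumes the normal three-column svcs layout (STATE, STIME, FMRI separated by whitespace).

```shell
# Read `svcs` output on stdin and print the FMRI of every offline service
# under svc:/system/cluster/. The function name is hypothetical.
offline_cluster_services() {
    awk '$1 == "offline" && $3 ~ /^svc:\/system\/cluster\// { print $3 }'
}

# Typical use on a cluster node: ask SMF to explain each offline service.
#   svcs | offline_cluster_services | while read fmri; do
#       svcs -x "$fmri"
#   done
```

svcs -x reports the reason a service is not running, which usually points at an unmet dependency; svcs -d on the same FMRI lists the services it depends on.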

So I guess the next question is: given the above, how do I set up a quorum server for the cluster?

Iain

iainfirkinsa at 2007-7-9 5:01:15
# 16

Using Live Upgrade, I mounted a Sol 10 11/06 .iso and upgraded my install to the latest release, repatched, re-installed Sun Cluster, and now it's working fine. Since then, I've found documents on the web stating that Sun Cluster 3.2 only works with Sol 10 11/06 - which is probably why the error messages above were coming up!
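A quick pre-install check along these lines might have caught the mismatch earlier. This is a rough sketch under two assumptions: that the first line of /etc/release carries the update as an MM/YY token (for example "Solaris 10 11/06 ..."), and that 11/06 is the minimum, as reported in this thread. The function name release_ok is hypothetical.

```shell
# Succeed if the release file names Solaris update 11/06 or later.
# $1 is the file to inspect (defaults to /etc/release).
# Written for a POSIX awk.
release_ok() {
    file=${1:-/etc/release}
    awk 'NR == 1 {
        # Find the first MM/YY token on the first line.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\/[0-9][0-9]$/) {
                split($i, d, "/")
                # Compare as YY*100 + MM, so 11/06 becomes 611.
                if (d[2] * 100 + d[1] >= 611) ok = 1
                break
            }
        exit
    }
    END { exit !ok }' "$file"
}

# Typical use before running scinstall:
#   release_ok || echo "Solaris release is older than 11/06"
```

The comparison treats the two-digit year as the major part, so 3/05 and 6/06 fail while 11/06 and 8/07 pass.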

So now I'm quite happy, and impressed with how easily Live Upgrade worked!

Iain

iainfirkinsa at 2007-7-9 5:01:15
# 17

Hi,

The error messages you saw seemed to be harmless, but it's good that you solved most of the issues.

More information on the StorageTek Availability Suite for replicating data: I checked and it cannot be downloaded as a product. You can either buy it (and I think you checked that already) or go to the OpenSolaris pages: http://opensolaris.org/os/project/avs/

But the packages available there will not install on S10, for technical reasons. On the other hand, Sun Cluster will not work with OpenSolaris at the moment, only with S10U3.

Hartmut

HartmutSa at 2007-7-9 5:01:15