Clustering without shared disk?
I am thinking of setting up a cluster environment consisting of Solaris 10 machines.
But from my reading so far, it seems I need to have a shared disk in the cluster.
Doesn't this defeat the purpose of H.A.? A single disk (or single set of disks) is a single point of failure.
Can a DRBD-type structure be implemented using Sun Cluster software?
I searched and found a question similar to mine from 2005, but the thread has no follow-up.
http://forum.sun.com/jive/thread.jspa?forumID=295&threadID=76390
Thanks for any pointers,
tj
[546 byte] By [tj_yang] at [2007-11-26 10:00:09]

# 1
Hi,
A shared disk is typically used along with other techniques such as mirroring and multipathing. So, no, it does NOT defeat the purpose of HA clustering.
Coming to DRBD-style network-replicated storage, I suppose it can be made to work,
with some (severe) limitations. For customers who are willing to live
with those limitations, I suppose it is OK. Here are some thoughts on what those limitations can be.
1) Network-replicated storage cannot be used as a quorum device. I suppose that is
stating the obvious. So, for 2-node clusters, one would have to find other ways of
providing quorum. The options for that are covered elsewhere. One can also
go for a 3-node cluster instead.
2) Problem with data ordering. Consider the following sequence of events:
- Node1 is the DRBD master, replicating to node2.
- Node1 crashes and node2 becomes master. Applications are brought online on node2,
and node2's copy of the data begins to diverge from node1's.
- Before node1 can reboot, rejoin the cluster and fully sync up (this can take
some time), node2 goes down.
Now you are in a weird situation. Node2 has the more up-to-date data, but it is down.
Node1 is UP, but it has a stale copy of the data. Now what do you do? In most cases,
you don't even KNOW that the node which is currently UP has stale data (because
you cannot ask the other node, which is down). So, typically, you would have to
continue with the stale data and hope the applications running on top are OK.
This issue stems from the basic problem that you essentially have 2 copies of the
data which you are trying to keep in sync across failures of individual components.
If you think hard enough, you can probably come up with other failure scenarios which
exploit this same fundamental weakness of the system.
Dual-ported (or SAN/NAS) storage doesn't have this problem.
3) Bulk updates vs. filesystem-level consistency. In scenarios where one side of
the network mirror is doing a bulk transfer (say, when it is rejoining the cluster
after an outage), the updates are typically done at the storage block level, NOT at
the level of filesystem updates. What that means is that, while the sync is in progress,
the mirror is basically inconsistent with respect to the filesystem on top. If the sync-up
finishes and the mirrors truly get into synchronous mode, great. But until then, the cluster is
very vulnerable to failures. A failure could leave the mirror in an inconsistent state.
People typically say: "Oh well, wouldn't an fsck fix everything?". I say, maybe, maybe
not. It depends on the filesystem. In any case, do not run mission-critical stuff on
this software stack unless you have an express guarantee from your filesystem vendor
that it is supported with the network replication technology. You have
been warned! :-)
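
To make point 3 a bit more concrete, here is a toy Python sketch. It is NOT DRBD and NOT a real filesystem: the four-block "device", the invariant checked by consistent() and the stop_after parameter are all made up for illustration. It only shows why copying blocks in device order can leave the target in a state that the "filesystem" on top would consider broken if the sync is interrupted.

```python
def consistent(blocks):
    """Toy invariant: block 0 stores how many of the later blocks are in use."""
    return blocks[0] == sum(1 for b in blocks[1:] if b is not None)


def resync(source, target, stop_after=None):
    """Copy blocks in device order; stop early to simulate a failure mid-sync."""
    for i, block in enumerate(source):
        if stop_after is not None and i >= stop_after:
            return False  # interrupted before all blocks were copied
        target[i] = block
    return True


source = [3, "a", "b", "c"]      # up-to-date side of the mirror
target = [1, "a", None, None]    # stale side, but still self-consistent

before = consistent(target)                       # stale yet usable
finished = resync(source, target, stop_after=2)   # failure after two blocks
during = consistent(target)                       # metadata is ahead of the data
resync(source, target)                            # an uninterrupted sync
after = consistent(target)

print(before, finished, during, after)
```

The stale copy passes the check before the sync and the fully synced copy passes after it; only the half-synced copy fails. That window is the vulnerable period described above.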
My advice: Create a resource type for DRBD using Sun Cluster's scdsbuilder tool.
Play with various scenarios to see how DRBD behaves in a variety of automated
failover scenarios. Maybe you will find that the behavior is OK for the
kind of deployment YOU have in mind.
If you do play with DRBD, please do share your experience on this forum.
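
One scenario worth rehearsing is the double failure from point 2. Below is a minimal Python sketch of it, under loud assumptions: the Replica class, the generation counter and the is_stale() helper are hypothetical names invented for this post, and they do not model how DRBD or Sun Cluster actually track data versions. The point is only that the surviving node cannot judge its own staleness while its peer is unreachable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    generation: int = 0  # hypothetical counter: bumped when a node takes writes alone
    up: bool = True


def promote_alone(node: Replica) -> None:
    """The node becomes master without its peer, so its data moves ahead."""
    node.generation += 1


def is_stale(node: Replica, peer: Replica) -> Optional[bool]:
    """True/False if the peer can be asked; None if the peer is down."""
    if not peer.up:
        return None  # nobody to compare against
    return node.generation < peer.generation


node1 = Replica("node1")
node2 = Replica("node2")

node1.up = False        # node1 crashes
promote_alone(node2)    # node2 becomes master and its data diverges

node1.up = True         # node1 reboots ...
node2.up = False        # ... but node2 goes down before the resync completes

verdict_while_peer_down = is_stale(node1, node2)

node2.up = True         # only once node2 is back can the truth be known
verdict_after_peer_returns = is_stale(node1, node2)

print(verdict_while_peer_down, verdict_after_peer_returns)
```

While node2 is down, node1 gets no answer at all, which is the "you don't even KNOW" situation; the stale verdict only becomes available once the peer is reachable again.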
HTH,
-ashu
# 2
<snip>
> My advice: Create a resource type for DRBD using
> SunCluster's scdsbuilder tool.
> Play with various scenarios to see how DRBD behaves
> in a variety of automated
> failover scenarios. Maybe you would find that the
> scenarios are OK for the
> kind of deployment YOU have in mind.
I plan to build a Hobbit server with H.A. infrastructure on Solaris 10 (SPARC) so that the system monitoring software itself can be more reliable.
I am thankful that you took the time to write such a detailed reply. I have the Sun Cluster blueprint book, and I will read it with your suggestions in mind to see if I can create a two-node HA cluster without a shared disk.
> If you do play with DRBD, please do share your
> experience on this forum.
DRBD/Heartbeat for Linux is well understood and has an established user population. It works on Solaris too, but it is less proven there than on Linux.
I will post my testing later.
> HTH,
> -ashu