Session Replication Problem
I was trying to make session replication work, but the backup machine keeps giving me:
REPL0037: Got exception while connection to backup server
Right after that, it says:
REPL0076: Established connection to Backup Instance
After that I take the main instance down, hoping that the backup will kick in. And it doesn't. Any ideas how to make this work?
By [SergeySa] at [2007-11-27 5:17:24]

# 1
Sergey, what is the setup you have? How many nodes are in the cluster? Are there other errors reported in the log file?
# 2
I am just testing, so the configuration is as follows:
- one admin node, which has an instance of an app
- one backup node, which has the exact same instance
The idea is to make session replication work, so that as soon as I turn off the instance on, let's say, the admin node, the backup node will take over.
There are some errors saying that something is out of sequence, and a whole bunch of RMI exceptions. Do you happen to know of any solid example of how this is done properly, other than the one in the admin guide?
Thanks.
# 3
With a two-node configuration, having one node down is an abnormal situation: the surviving node will always complain about its inability to save sessions to a backup. In the current scheme, each node's sessions are saved on the other node, so with only one node running there is nowhere to back sessions up and no failover.
Still, when going from two nodes to one because of a failure, the surviving node should be able to recover the failed node's sessions from the copies it holds itself. That should work. After that, however, sessions won't be replicated until the failed node recovers.
I will try to write a blog entry on how to get a simple app working in a two-node configuration with session replication enabled, and post a link to it here.
# 4
I am not really sure why that is an abnormal situation. My understanding was that if, let's say, a hardware upgrade or some other procedure requires one of the nodes to be down, the other one should take over the current and potential sessions and work with them, and this is supposed to be transparent to the client. Am I wrong?
# 5
SergeyS, yes, it should be transparent to the client. The situation is "abnormal" in that a node is offline, so we'd expect the server to log those error messages. (I'd call it "degraded", not "abnormal".)
What, exactly, is the problem you're seeing? What device (e.g. load balancer) are you using to direct connections to the nodes in your cluster? Is that device correctly failing over connections?
# 6
I was using the reverse proxy that comes with the server, configured straight out of the manual. But I was also under the impression that once the nodes are in a cluster, the server can take care of rerouting traffic by itself as long as the admin node is up.
# 7
The admin node is unrelated to reverse proxy or session replication. It's only used for deploying configuration changes made by the administrator (you) using the admin GUI or CLI. It's possible to use the reverse proxy and session replication without an admin node.
Again, what, exactly, is the problem you're seeing?
Do you have two instances configured? Is the reverse proxy configured to send requests to both instances? Is that working?
# 8
Let me try to explain what needs to be done, and I guess we can deduce the flaw from that.
I have: a couple of web sites that need to be migrated to SJWS 7.0.
Need to have: session replication
I did everything that the manual (very poorly written) said. It does create the two copies, one on the admin node (done by me) and one on the node that is in the cluster with the admin node. I also enabled session replication in the Java tab. The serialized data is being created on the node, but when one of the instances is turned off, the session is lost as well.
# 9
I see you don't mention creating a reverse proxy for your cluster. If you have not created one, what you currently have is basic session replication. What you probably want is failover. There is a small difference between those two, and that is probably where the misunderstanding comes from.
Try the following scenario: simply stop and start the server instance that has been handling the requests. It should be able to keep the same sessions after the restart, provided all information in the session is serializable. In this case the same instance is capable of recovering its sessions from the backup. This is what basic session replication is.
Now what you probably want is session failover, the ability of the backup instance to handle requests as if it were the failed instance. In order for that to work, you need some type of reverse proxy or load balancing in front of the instances. This is because it must be transparent to the clients that the requests are being served by a different server.
Sun web server provides a built-in reverse proxy feature with load balancing. Please check the documentation for this here
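A quick way to check the "serializable" condition outside the server is to round-trip a session attribute through standard Java serialization. This is only a minimal sketch: `ShoppingCart` and `SessionSerializationCheck` are made-up names for illustration, and the assumption that the replication layer relies on ordinary `java.io` serialization is mine, not something confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SessionSerializationCheck {
    // Writes the value to a byte array and reads it back, roughly what any
    // container must be able to do before it can keep a copy elsewhere.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Returns true if the value survives Java serialization, false otherwise.
    static boolean isReplicable(Object value) {
        try {
            roundTrip(value);
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false; // includes NotSerializableException
        }
    }

    public static void main(String[] args) {
        System.out.println(isReplicable(new ShoppingCart("sergey", 3))); // true
        System.out.println(isReplicable(new Object()));                  // false
    }
}

// Hypothetical session attribute; the name and fields are illustrative only.
class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    final String owner;
    final int itemCount;

    ShoppingCart(String owner, int itemCount) {
        this.owner = owner;
        this.itemCount = itemCount;
    }
}
```

Running the same check against every object your app puts into the session should point out attributes that could not be replicated.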
http://docs.sun.com/app/docs/doc/819-2629/6n4tgd1uu?a=view
and here for cluster setup
http://docs.sun.com/app/docs/doc/819-2629/6n4tgd1ro?a=view
Why doesn't the session replication feature automatically enable the reverse proxy feature? I guess because you could use another reverse proxy or load balancing product, or hardware.
# 10
Sergey, please check this blog (http://blogs.sun.com/nsegura/entry/h2_session_replication_and_lightweight) for more guidance on how to get session failover going with your app. Please feel free to post any comments.
# 11
THANK YOU very much nsegura !!! I will try it at work tomorrow and see whether it works.
# 12
I keep getting this
[04/Jun/2007:15:14:14] warning ( 2988): REPL0037: Got exception while connection to backup server [REPL0055: Failed to establish a connection to any remote instance]
com.sun.web.replication.StoreException: REPL0055: Failed to establish a connection to any remote instance
at com.sun.web.replication.client.BackupClient.establishBackupConnection(BackupClient.java:613)
at com.sun.web.replication.client.BackupClient.checkClientTransports(BackupClient.java:693)
at com.sun.web.replication.client.BackupClient$ClientTransportWatchDog.run(BackupClient.java:763)
# 13
This is after you took down one of the instances, right?
As I explained above, you will keep getting these warnings until at least two instances in the cluster are running, because an instance will continually try to find a place to back up its sessions.
If I understand correctly, you are expecting the following. You set up two instances, one on the admin server host and another on a second host. You connect to the first instance (say, the primary), and everything works. Then you take down that instance and expect the backup to take over.
But this is not the way it works. There is no primary instance serving requests, and no secondary backup instance waiting for the first one to fail so it can take over. There is a cluster, and there are instances in the cluster. All of them are primary, and all of them serve requests, round-robin. The reverse proxy decides which instance receives a request.
The fact that one instance is on the admin server host has no bearing on how the cluster works. Also, other instances do not automatically take over for failed instances. They only keep copies of the sessions of another instance. Again, it is the reverse proxy (or load balancer) that reroutes a request to another instance when it cannot reach the instance that was originally handling the request. That other instance will then realize the request was meant for a different instance, find that instance's session in the cluster, and serve the request.
So unless you configure a reverse proxy server, have the clients connect to that reverse proxy, and let the proxy decide which instance to send each request to, you can't have failover.
I am interested to know whether you tried any of the scenarios and whether they worked or not, especially the one where you stop and start the instance.
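To make the division of labor concrete, here is a toy model of the flow described above: the proxy picks instances round-robin, each instance keeps copies of its peer's sessions, and on failure the proxy reroutes while the surviving instance adopts the copy. It is a sketch under stated assumptions, not how the web server is implemented: every class and method name is hypothetical, and the "stick to the last instance until it goes down" routing is a simplification of mine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FailoverDemo {
    public static void main(String[] args) {
        Instance a = new Instance("A");
        Instance b = new Instance("B");
        ToyProxy proxy = new ToyProxy(a, b);
        proxy.createSession("s1", "cart=3");    // lands on A, copied to B
        a.up = false;                            // stop the instance that owns s1
        System.out.println(proxy.handle("s1")); // B:cart=3
    }
}

// Toy model only: these names are hypothetical, not web server APIs.
class Instance {
    final String name;
    boolean up = true;
    final Map<String, String> sessions = new HashMap<>(); // sessions served here
    final Map<String, String> backups = new HashMap<>();  // copies kept for the peer

    Instance(String name) {
        this.name = name;
    }
}

class ToyProxy {
    private final List<Instance> cluster = new ArrayList<>();
    private final Map<String, Instance> sticky = new HashMap<>(); // session id -> last instance
    private int next = 0;

    ToyProxy(Instance first, Instance second) {
        cluster.add(first);
        cluster.add(second);
    }

    // Round-robin over the cluster, skipping instances that are down.
    private Instance pickRunning() {
        for (int i = 0; i < cluster.size(); i++) {
            Instance candidate = cluster.get(next);
            next = (next + 1) % cluster.size();
            if (candidate.up) {
                return candidate;
            }
        }
        throw new IllegalStateException("no instance is running");
    }

    private Instance peerOf(Instance instance) {
        return cluster.get(0) == instance ? cluster.get(1) : cluster.get(0);
    }

    // Starts a session on the next running instance; copies it to the peer if the peer is up.
    String createSession(String id, String data) {
        Instance owner = pickRunning();
        owner.sessions.put(id, data);
        Instance peer = peerOf(owner);
        if (peer.up) {
            peer.backups.put(id, data);
        }
        sticky.put(id, owner);
        return owner.name;
    }

    // Routes a request; returns "<instance>:<data>", or null if the session is lost.
    String handle(String id) {
        Instance target = sticky.get(id);
        if (target == null || !target.up) {
            target = pickRunning(); // the proxy, not the cluster, reroutes the request
            String copy = target.backups.remove(id);
            if (copy != null) {
                target.sessions.put(id, copy); // adopt the failed peer's session
            }
            sticky.put(id, target);
        }
        String data = target.sessions.get(id);
        return data == null ? null : target.name + ":" + data;
    }
}
```

Without the `ToyProxy` step, a client that keeps talking directly to the stopped instance never reaches the copy held by the other instance, which matches the behavior reported earlier in the thread.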
# 14
For me, session replication works for 1 out of 4 clients. It looks like failover works fine for the first client that connects, and the rest of the clients get dropped.
# 15
I tried this with two clients, and it does work for me. Here is what I did:
a ) Used Firefox to send a request, and noted down the instance that received the request.
b ) Used Netscape to send a second request, which was answered by another instance. Closed Netscape and sent another request. Repeated until the request was answered by the same instance as in (a).
Stopped the instance in (a) and tried continuing the session in Firefox; it recovered correctly. Tried continuing the session in Netscape; it worked correctly.
How are you emulating the different clients, and to which instance is each one of them initially connecting?