Session Replication Problem

I am trying to get session replication working, but the backup machine keeps logging:

REPL0037: Got exception while connection to backup server

Right after that it says:

REPL0076: Established connection to Backup Instance

After that I take the main instance down, hoping that the backup will kick in. And it doesn't. Any ideas how to make this work?

[419 byte] By [SergeySa] at [2007-11-27 5:17:24]
# 1
Sergey, what setup do you have? How many nodes are in the cluster? Are there other errors reported in the log file?
nseguraa at 2007-7-12 10:40:15 > top of Java-index,Web & Directory Servers,Web Servers...
# 2

I am just testing, so the configuration is as follows:

- one admin node, which has an instance of the app

- one backup node, which has the exact same instance

The idea is to make session replication work, so that as soon as I turn off the instance on, let's say, the admin node, the backup node will take over.

There are some errors saying something is out of sequence, and a whole bunch of RMI exceptions. Do you happen to know of any solid example of how this is done properly, other than the one in the admin guide?

Thanks.

SergeySa at 2007-7-12 10:40:15
# 3

With a two-node configuration, having one node down is an abnormal situation: the surviving node will always complain that it cannot save sessions to a backup. In the current scheme, each node's sessions are saved on the other node. With only one node running, there is no failover.

Still, when going from two nodes to one because of a failure, the surviving node should be able to recover the failed node's sessions from the copies it holds itself. That should work. After that, however, sessions won't be replicated until the failed node recovers.

I will try to write a blog post on how to get a simple app working in a two-node configuration with session replication enabled, and post a link to it here.

nseguraa at 2007-7-12 10:40:15
# 4

I am not really sure why that is an abnormal situation. My understanding was that if, say, one of the nodes has to go down for a hardware upgrade or some other procedure, the other node should take over the current and new sessions, and that this is supposed to be transparent to the client. Am I wrong?

SergeySa at 2007-7-12 10:40:15
# 5

SergeyS, yes, it should be transparent to the client. The situation is "abnormal" in that a node is offline, so we'd expect the server to log those error messages. (I'd call it "degraded", not "abnormal".)

What, exactly, is the problem you're seeing? What device (e.g. load balancer) are you using to direct connections to the nodes in your cluster? Is that device correctly failing over connections?

elvinga at 2007-7-12 10:40:15
# 6
I was using the reverse proxy that comes with the server, configured straight out of the manual. But I was also under the impression that once the nodes are in a cluster, the server can take care of rerouting traffic by itself as long as the admin node is up.
SergeySa at 2007-7-12 10:40:15
# 7

The admin node is unrelated to reverse proxy or session replication. It's only used for deploying configuration changes made by the administrator (you) using the admin GUI or CLI. It's possible to use the reverse proxy and session replication without an admin node.

Again, what, exactly, is the problem you're seeing?

Do you have two instances configured? Is the reverse proxy configured to send requests to both instances? Is that working?

elvinga at 2007-7-12 10:40:15
# 8

Let me try to explain what needs to be done, and I guess we can then deduce the flaw.

I have: a couple of web sites that need to be migrated to SJWS 7.0.

Need to have: session replication

I did everything that the manual (very poorly written) said. This creates two copies of the instance: one on the admin node (done by me) and one on the node that is in the cluster with the admin node. I also enabled session replication in the Java tab. The serialized data is being created on the node, but when one of the instances is turned off, the session is lost as well.

SergeySa at 2007-7-12 10:40:15
# 9

I see you don't mention creating a reverse proxy for your cluster. If you have not created one, what you currently have is basic session replication. What you probably want is failover. There is a small difference between the two, and that is probably where the misunderstanding comes from.

Try the following scenario: simply stop and start the server instance that has been handling the requests. It should keep the same sessions after the restart, provided all information in the session is serializable. In this case the same instance recovers its own sessions from the backup. This is what basic session replication is.
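
The stop/start scenario can be sketched as a toy model. This is only an illustration, not the server's actual implementation: the Node class, its method names, and the eager recovery on restart are all assumptions, and Python's pickle module stands in for Java serialization to show why session data that cannot be serialized cannot be replicated either.

```python
import pickle
import threading


class Node:
    """Hypothetical cluster node; not the web server's real API."""

    def __init__(self, name):
        self.name = name
        self.peer = None          # the other node, set after construction
        self.sessions = {}        # sessions this node serves (in memory)
        self.held_for_peer = {}   # serialized copies of the peer's sessions

    def save_session(self, session_id, data):
        # Serialize first: if this fails, nothing is stored or replicated.
        blob = pickle.dumps(data)
        self.sessions[session_id] = data
        self.peer.held_for_peer[session_id] = blob

    def restart(self):
        # Stopping the instance loses everything it held in memory.
        self.sessions = {}
        self.held_for_peer = {}
        # Assumption: on start, the node pulls its sessions back from the peer.
        for session_id, blob in self.peer.held_for_peer.items():
            self.sessions[session_id] = pickle.loads(blob)


def make_pair():
    """Build two nodes that back each other up."""
    a, b = Node("a"), Node("b")
    a.peer, b.peer = b, a
    return a, b


if __name__ == "__main__":
    a, b = make_pair()
    a.save_session("s1", {"cart": ["book"]})
    a.restart()
    print(a.sessions["s1"])  # the session survives the restart

    try:
        # A lock cannot be pickled, so this session cannot be backed up.
        a.save_session("s2", {"lock": threading.Lock()})
    except TypeError as err:
        print("cannot replicate:", err)
```

In this sketch the restarted node gets its sessions back only because the peer stayed up the whole time; nothing here redirects clients to the peer, which is the part that needs a proxy.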

Now what you probably want is session failover: the ability of the backup instance to handle requests as if it were the failed instance. For that to work, you need some type of reverse proxy or load balancing of the requests, because it must be transparent to the clients that their requests are being served by a different server.

Sun web server provides a built-in reverse proxy feature with load balancing. Please check the documentation for this here:

http://docs.sun.com/app/docs/doc/819-2629/6n4tgd1uu?a=view

and here for cluster setup:

http://docs.sun.com/app/docs/doc/819-2629/6n4tgd1ro?a=view

Why is the session replication feature not automatically enabling the reverse proxy feature? I guess because you could use another reverse proxy / load balancing product or hardware.

nseguraa at 2007-7-12 10:40:15
# 10
Sergey, please check this blog (http://blogs.sun.com/nsegura/entry/h2_session_replication_and_lightweight) for more guidance on how to get session failover going with your app. Please feel free to post any comments.
nseguraa at 2007-7-12 10:40:15
# 11
THANK YOU very much nsegura !!! I will try it at work tomorrow and see whether it works.
SergeySa at 2007-7-12 10:40:15
# 12

I keep getting this:

[04/Jun/2007:15:14:14] warning ( 2988): REPL0037: Got exception while connection to backup server [REPL0055: Failed to establish a connection to any remote instance]
com.sun.web.replication.StoreException: REPL0055: Failed to establish a connection to any remote instance
    at com.sun.web.replication.client.BackupClient.establishBackupConnection(BackupClient.java:613)
    at com.sun.web.replication.client.BackupClient.checkClientTransports(BackupClient.java:693)
    at com.sun.web.replication.client.BackupClient$ClientTransportWatchDog.run(BackupClient.java:763)

SergeySa at 2007-7-12 10:40:15
# 13

This is after you took down one of the instances, right?

As I explained above, you will keep getting these warnings until at least two instances in the cluster are running, because an instance continually tries to find a place to back up its sessions.

If I understand correctly, you are expecting the following. You set up two instances, one on the admin server host and another on a second host. You connect to the first instance (say, the primary), and everything works. Then you take down that instance and expect the backup to take over.

But this is not the way it works. There is no primary instance serving requests, nor a secondary backup instance waiting for the first one to fail so it can take over. There is a cluster, and there are instances in the cluster. All of them are primary, and all of them serve requests, round-robin. The reverse proxy decides which instance receives a request.

The fact that one instance is on the admin server has no bearing on how the cluster works. Also, other instances do not automatically take over for failed instances. They only keep copies of the sessions of another instance. Again, it is the reverse proxy (or load balancer) that reroutes a request to another instance when it cannot reach the instance that was originally handling the request. That other instance will then realize that the request was meant for a different instance, find that instance's session in the cluster, and serve the request.

So unless you configure a reverse proxy server, have the clients connect to that reverse proxy, and let the proxy decide which instance to send each request to, you can't have failover.

I am interested to know whether you tried any of the scenarios and whether they worked, especially the one where you stop and start the instance.
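
The routing described above can be sketched as a small simulation. This is a toy model under stated assumptions, not how the server is implemented: the Instance and ReverseProxy classes, the sticky routing table, and the "hits" counter are all made up for illustration.

```python
class Instance:
    """Hypothetical server instance; names are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.sessions = {}        # sessions this instance is serving
        self.backup_copies = {}   # copies of sessions served by other instances

    def stop(self):
        # Everything held in memory is lost when the instance goes down.
        self.up = False
        self.sessions = {}
        self.backup_copies = {}

    def start(self):
        self.up = True

    def handle(self, session_id, cluster):
        if session_id not in self.sessions:
            # The request was meant for another instance (or is new):
            # look for a replicated copy of the session in the cluster.
            self.sessions[session_id] = self._find_copy(session_id, cluster)
        session = self.sessions[session_id]
        session["hits"] += 1
        return self.name, session["hits"], self._replicate(session_id, cluster)

    def _find_copy(self, session_id, cluster):
        for inst in cluster:
            if inst.up and session_id in inst.backup_copies:
                return inst.backup_copies.pop(session_id)
        return {"hits": 0}  # no copy anywhere: start a fresh session

    def _replicate(self, session_id, cluster):
        for peer in cluster:
            if peer is not self and peer.up:
                peer.backup_copies[session_id] = dict(self.sessions[session_id])
                return True
        return False  # no peer running, comparable to the REPL0037 warning


class ReverseProxy:
    """Round-robin for new sessions; reroutes when the usual instance is down."""

    def __init__(self, instances):
        self.instances = instances
        self.sticky = {}   # session id -> instance currently serving it
        self._next = 0

    def _next_running(self):
        for _ in range(len(self.instances)):
            inst = self.instances[self._next % len(self.instances)]
            self._next += 1
            if inst.up:
                return inst
        raise RuntimeError("no running instance")

    def request(self, session_id):
        inst = self.sticky.get(session_id)
        if inst is None or not inst.up:
            inst = self._next_running()
            self.sticky[session_id] = inst
        return inst.handle(session_id, self.instances)


def run_failover_demo():
    a, b = Instance("a"), Instance("b")
    proxy = ReverseProxy([a, b])
    results = [proxy.request("s1"), proxy.request("s1")]
    a.stop()                              # the instance serving s1 fails
    results.append(proxy.request("s1"))   # rerouted to b, session recovered
    a.start()
    results.append(proxy.request("s1"))   # replication resumes with a back up
    return results


if __name__ == "__main__":
    for served_by, hits, replicated in run_failover_demo():
        print(served_by, hits, replicated)
```

Each result is (instance name, request count in the session, whether a backup copy could be stored). Without the ReverseProxy object in front, a client talking directly to instance a would simply get connection errors once a stops, even though b still holds a copy of its session.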

nseguraa at 2007-7-12 10:40:15
# 14
For me, session replication works for 1 out of 4 clients. It looks like failover works fine for the first client that connects; the rest of the clients get dropped.
SergeySa at 2007-7-12 10:40:15
# 15

I tried this with two clients, and it does work for me. Here is what I did:

a ) Used Firefox to send a request, and noted down the instance that received the request.

b ) Used Netscape to send a second request, which was answered by another instance. Closed Netscape and sent another request. Repeated this until the request was answered by the same instance as in (a).

Stopped the instance in (a) and tried continuing the session in Firefox: it recovered correctly. Tried continuing the session in Netscape: it worked correctly.

How are you emulating the different clients, and to which instance is each one of them initially connecting?
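
If the clients are emulated by a script rather than by separate browsers, each one needs its own cookie jar, otherwise they all share a single session. A minimal sketch using only the Python standard library; the URL is a hypothetical placeholder, and the helper names are made up for this example.

```python
import http.cookiejar
import urllib.request


def make_client():
    """Return (opener, jar): an HTTP client with its own private cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, timeout=5):
    """Send one GET request and return the response body as text."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def session_cookies(jar):
    """Return a dict of cookie name -> value currently held by this client."""
    return {cookie.name: cookie.value for cookie in jar}


if __name__ == "__main__":
    # Hypothetical address; replace with the URL served by your reverse proxy.
    url = "http://proxy.example.com/myapp/counter"
    clients = [make_client() for _ in range(4)]
    for number, (opener, jar) in enumerate(clients, start=1):
        fetch(opener, url)
        print(number, session_cookies(jar))
```

Printing the cookies after the first request shows which session each emulated client received; repeating the requests after stopping an instance shows whether each client keeps its session.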

nseguraa at 2007-7-21 21:23:51