Non-deterministic NoSuchObjectException
Hi,
In my source code I export a remote object using UnicastRemoteObject.exportObject(remoteObject, rmiPort) and bind it to an RMI service name using Naming.rebind(serviceName, remoteObject).
About 40 client instances successfully use this exported RMI object (they are running on two different hosts). But now and then a client can initially use the exported RMI object, and then after some time (1 minute) a java.rmi.NoSuchObjectException is thrown when a method of the remote object is called. If the client is closed and another client instance is started, everything is fine again.
The application exporting the remote object is not restarted or anything else in the meantime. In addition, I only unexport the remote object manually when the application is shut down.
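For reference, a minimal sketch of the setup described above. It assumes an in-process registry created with LocateRegistry.createRegistry (the real application may well use a separate rmiregistry with Naming.rebind instead), and EntryPoint, EntryPointImpl and ServerSketch are hypothetical names, not the actual application classes.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

/** Hypothetical remote interface standing in for the main entry point. */
interface EntryPoint extends Remote {
    String hello() throws RemoteException;
}

/** Hypothetical implementation; it does not extend UnicastRemoteObject. */
final class EntryPointImpl implements EntryPoint {
    @Override
    public String hello() {
        return "hello";
    }
}

final class ServerSketch {
    // The server keeps the implementation and the registry for its whole lifetime.
    private final EntryPointImpl impl = new EntryPointImpl();
    private final Registry registry;

    ServerSketch(int registryPort, int objectPort, String name) throws RemoteException {
        registry = LocateRegistry.createRegistry(registryPort);
        // exportObject returns the stub; the registry stores it under the given name.
        Remote stub = UnicastRemoteObject.exportObject(impl, objectPort);
        registry.rebind(name, stub);
    }

    Registry registry() {
        return registry;
    }

    /** Unbind and unexport only at shutdown; returns true if the unexport succeeded. */
    boolean shutdown(String name) {
        try {
            registry.unbind(name);
        } catch (Exception ignored) {
            // Already unbound or registry unavailable: nothing more to do here.
        }
        try {
            return UnicastRemoteObject.unexportObject(impl, true);
        } catch (NoSuchObjectException e) {
            return false; // the object was not exported
        }
    }
}
```

Passing 0 for a port lets the runtime pick a free one, which is convenient for experiments; a real server would normally use a fixed registry port so that clients can find it.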
Could it be a Distributed Garbage Collection problem? What are the possible reasons for a remote object becoming unavailable to just one single client?
thanks for any help
# 1
Does the server hold a reference to the exported object anywhere? Like in a static variable?
# 2
Thanks for the quick reply.
I just keep the reference of the object implementing the Remote Interface. But I don't keep the RemoteStub-Object that is returned when exporting the Remote object.
Anyway, if you bind a remote object to a service name at the RMI-Registry, the registry has a reference to this remote stub (see O'Reilly Java RMI, chapter 16.2.5). So it should not be garbage collected (even if no client uses the remote object).
Or does it make a difference if you use UnicastRemoteObject.exportObject() instead of having the remote object extend UnicastRemoteObject?
# 3
How about the server class itself? The whole JVM will exit if you don't hold a static reference to the server itself, even if you have exported the object.
# 4
The server itself is also referenced by other objects (although there is no static reference). The strange thing is that other clients can successfully access the remote object; just one client cannot. So I don't think that there is a problem with the exported remote object. I rather think that there is a problem in RMI client connection management (because a single client connection gets stale). But I don't know how the RMI runtime manages client connections and when they get closed.
# 5
Speaking for Sun's implementation, there is a pool of idle connections which live for 15 seconds and are then closed. Every time you make a call, if there is an idle connection to that target it is reused and then returned to the pool. If it stays there > 15 seconds it is closed.
But all this actually has nothing to do with NoSuchObjectException, which is an indication that the stub is stale or that by some other means it refers to a non-existent remote object ID, i.e. probably one that has been DGCd and then GCd locally and consequently unexported.
As it says in the RMI specification, if there are network segments (i.e. routers) between clients and servers, it is possible for DGC.dirty() calls from the client to fail due to a temporary condition at the router(s), and therefore for DGC at the server to kick in prematurely. If you are allocating per-client objects, e.g. sessions, to which there is only one remote reference, this might explain things.
ejpa at 2007-7-9 0:04:38 >

# 6
I have one main entry point (a remote object exported and bound to the registry) to my RMI application that all clients use. Every client then gets its own remote object instance from this main entry point. The problem is that the NoSuchObjectException is thrown when trying to access the main entry point (not the single remote object instances for the clients).
If the exported remote object gets DGCd then how is it possible that I can use the remote object again if I start a new client instance?
It seems as if the remote stub gets DGCd (although bound to an RMI service name). But after performing a new RMI service lookup (by starting a new client instance) a remote stub is available again. Does the RMI Registry implement a mechanism for generating new stub objects if none exists for an exported and bound remote object?
Info: I don't hold a reference to the exported stub object. The clients are running in a terminal server on a Windows 2000 machine.
# 7
> I have one main entry point (a remote object exported
> and bound to the registry) to my RMI application that
> all clients use. Every client then gets its own remote
> object instance from this main entry point. The
> problem is that the NoSuchObjectException is thrown
> when trying to access the main entry point (not the
> single remote object instances for the clients).
Ok, then you can recover from that by repeating the lookup and retrying using the new stub. If that fails, bail out.
> If the exported remote object gets DGCd then how is
> it possible that I can use the remote object again if
> I start a new client instance?
No idea.
> It seems as if the remote stub gets DGCd (although
> bound to an RMI service name).
Possible but unlikely given the Registry and the server JVM are coresident in the same host.
> But after performing a
> new RMI service lookup (by starting a new client
> instance) a remote stub is available again.
Exactly, see above.
> Does the RMI Registry implement a mechanism for generating new
> stub objects if none exists for an exported and bound
> remote object?
Definitely not.
> Info: I don't hold a reference to the exported stub object.
I don't know what the 'exported stub object' is. Do you mean that the server JVM doesn't hold local references to the per-client objects?
ejpa at 2007-7-9 0:04:38 >

# 8
> I don't know what the 'exported stub object' is. Do you mean that the server JVM doesn't hold local references to the per-client objects?
No, I meant that I don't hold a reference to the stub object that is returned when I export my remote object using UnicastRemoteObject.exportObject(). A reference to this stub object should normally be kept by the RMI registry because the corresponding remote object is bound to an RMI service name.
On client-side I will try to look up the stub again if I get a NoSuchObjectException. On server-side I will keep a reference to the stub object returned by UnicastRemoteObject.exportObject(). I hope this solves the problem but I still wonder why the stub got stale.
# 9
> No, I meant that I don't hold a reference to the stub
> object that is returned when I export my remote
> object using UnicastRemoteObject.exportObject(). A
> reference to this stub object should normally be kept
> by the RMI registry because the corresponding remote
> object is bound to an RMI service name.
Correct. There is no necessity for the server to retain its own stubs.
> On client-side I will try to lookup the stub again if
> I get a NoSuchObjectException.
Excellent
> On server-side I will
> keep a reference to the stub object returned by
> UnicastRemoteObject.exportObject().
Unnecessary, see above, and no reason it will make any difference if you do.
> I still wonder why the stub got stale.
Me too but DGC is funny sometimes. Are there routers in your network?
ejpa at 2007-7-9 0:04:38 >

# 10
> Me too but DGC is funny sometimes. Are there routers in your network?
The client and server hosts are connected via a switch. There is of course a router in the network but the communication should not go through this router.
The terminal server (on which the clients are running) synchronizes the system clock 2 times a day (the change is about 20 seconds). Could this be a possible reason for the DGC to collect the exported object?
# 11
Don't think so. DGC runs at five-minute intervals by default, renewing 10-minute leases.
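To put numbers on that, a small sketch of the lease arithmetic. It assumes the documented java.rmi.dgc.leaseValue system property (set on the server JVM, default 600000 ms) and that clients renew when roughly half the lease has expired, which is how the property documentation describes it. DgcLease is a hypothetical helper, not part of the RMI API.

```java
/** Hypothetical helper illustrating the default DGC lease timing. */
final class DgcLease {
    /** Documented default of java.rmi.dgc.leaseValue: 10 minutes, in milliseconds. */
    static final long DEFAULT_LEASE_MS = 600_000L;

    /** Lease duration this JVM would grant, honouring the system property if set. */
    static long leaseMillis() {
        return Long.getLong("java.rmi.dgc.leaseValue", DEFAULT_LEASE_MS);
    }

    /** Assumption: clients renew when about 50% of the lease has expired. */
    static long renewalMillis(long leaseMs) {
        return leaseMs / 2;
    }
}
```

With the defaults that leaves about five minutes between a renewal and the lease expiring, so a 20-second clock adjustment on its own looks too small to matter.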
ejpa at 2007-7-9 0:04:38 >

# 12
If you are not holding a static reference then the server will exit at non-deterministic times based on handed out references and other internal and RMI references to the server. Best to get that static reference to the server set, then find something else to waste time on.
# 13
What do you mean by server? Do you mean the exported remote object, the returned RemoteStub object when exporting the object by calling UnicastRemoteObject.exportObject(), or the RMI application as a whole? I'd love to waste my time on something else ;-)
# 14
The exported remote object.
ejpa at 2007-7-9 0:04:38 >

# 15
I don't see how a static reference to the exported remote object could solve my problem. After I have looked up the remote object again on client-side everything works fine. So the remote object is obviously still exported.
# 16
Then I can't explain it. NoSuchObjectException means that the ObjectId encoded in the stub and sent along with the marshalled call doesn't correspond to any currently exported remote object. This usually means that the stub is 'stale', i.e. that the corresponding remote object has been unexported, or possibly that the stub is left over from a previous 'life' of the server JVM. This in turn can be due to DGC.
However regardless of the cause, it is a fact that DGC can kick in prematurely on any RMI system, which means that any client can encounter this exception at any time. The solution is always to reacquire the stub.
So you already have a solution in the form of a piece of recovery code which should always be present.
ejpa at 2007-7-21 17:08:05 >

# 17
You need to hold a reference to the RMI application as a whole, and the only way to hold that is static. As for your exported objects, you need references to those too if you want the server to be in charge of their exported state. They do not have to be static. If not, they will drop away when RMI decides for them to drop.
Sure it could be something with that client VM. But I think first you need that server static reference. Else you will be dealing with the whole server shutting down at will.
What is so special about that one computer?
# 18
> Sure it could be something with that client VM. But I think first you need that server static reference. Else you will be dealing with the whole server shutting down at will.
The server is not shutting down at will for sure.
> What is so special about that one computer?
Nothing. The problem occurred on a terminal server and on a normal personal computer. No routers are between the RMI server and client hosts.
The only possible explanation that comes to my mind when going through all the facts about my problem is that it is a client problem: I think that the client had a stale stub from a previous RMI application run (i.e., in the meantime the RMI application was restarted). But I still can't explain this, because stubs are removed from the client cache when the RMI application is restarted or stopped. I think the most reasonable explanation is that there is a programming error when removing the stub from the client cache (although I have not found one yet).
Anyway, thanks for all your help!
# 19
Maybe the Registry still had the old stub which the client got hold of while meanwhile the remote object was being restarted ...
ejpa at 2007-7-21 17:08:05 >

# 20
By accident I found the reason for my problem:
The client and server applications both have a watchdog thread that checks if the communication partner is still alive (in intervals of 5 seconds). There was a short network failure. During this network failure the server watchdog checked if the client was still alive. Because of the network failure the server closed the client connection. The nasty thing about the whole problem was that the client did not notice the network failure. In consequence the client did not close its connection to the server. Then when the client tried to use the connection again the NoSuchObjectException occurred because the stub was not valid anymore (the connection was already closed by the server).
In my current solution I always check if the RMI connection (i.e. the remote objects) is still alive before using it. If it is not alive anymore I simply lookup the stubs again.
# 21
I would get rid of the server side of this watchdog and just let DGC take care of it.
ejpa at 2007-7-21 17:08:05 >

# 22
I would get rid of the client side of this watchdog since it is of no use, as it does not guarantee the connection will be available on the next call, even if that call is 1us later...
Do not try to preemptively deal with network losses because they can happen at any time. You have to deal with them when they happen. You cannot deal with them before they happen.
> In my current solution I always check if the RMI
> connection (i.e. the remote objects) is still alive
> before using it. If it is not alive anymore I simply
> lookup the stubs again.
That is almost correct. In my solution I always check if the RMI connection is still alive as I use it. If it's not alive I simply look up the stubs again. If they do not return, then and only then, do I abort. To be fair, your solution will work. It's just that the extra check is of lesser value since the next call can still fail.
# 23
I agree with all that and I don't believe in pings (or isReachables()) in the slightest.
Any RMI client has to be able to deal with a NoSuchObjectException at any time. The solution is to retry after trying to reacquire the stub, by whatever means it was originally acquired, a limited number of times.
Assuming that retrying makes any sense at all, which it mightn't in some application contexts.
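A minimal sketch of that recovery pattern, assuming the stub can be reacquired through some lookup callback (for example a Naming.lookup on the original service name). StubRetry and RemoteCall are hypothetical helper types invented for this illustration, not part of RMI.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.RemoteException;
import java.util.concurrent.Callable;

/** Hypothetical functional type: one remote invocation against a stub of type S. */
interface RemoteCall<S, R> {
    R invoke(S stub) throws RemoteException;
}

/** Hypothetical helper: retries a call a limited number of times, reacquiring a stale stub. */
final class StubRetry<S> {
    private final Callable<S> lookup; // however the stub was originally acquired
    private final int maxAttempts;
    private S stub;                   // cached stub, null until first use or after going stale

    StubRetry(Callable<S> lookup, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.lookup = lookup;
        this.maxAttempts = maxAttempts;
    }

    synchronized <R> R call(RemoteCall<S, R> call) throws Exception {
        NoSuchObjectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (stub == null) {
                stub = lookup.call(); // a failing lookup propagates: bail out
            }
            try {
                return call.invoke(stub);
            } catch (NoSuchObjectException e) {
                last = e;
                stub = null; // stale stub: discard it and reacquire on the next attempt
            }
        }
        throw last; // still stale after the allowed number of attempts
    }
}
```

Only NoSuchObjectException triggers a new lookup here; any other RemoteException is passed straight to the caller, since whether retrying those makes sense depends on the application.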
ejpa at 2007-7-21 17:08:05 >

# 24
Just to make things clear: I don't use the watchdogs to check if the RMI connections (or the stubs) are still alive. I use the watchdog on server-side to clean up lost client connections (if the client application crashed or something similar). On client-side I use a watchdog to inform the user that the server is not available anymore (server crashed, is being restarted, ..). In addition, the GUI of the client is locked so that the user cannot perform any actions.
# 25
So on the server side you are implementing Unreferenced and cleaning up when that is triggered? That is the proper way. I'm not sure if what you are calling the server 'watchdog' is making calls on the remote objects or how you are determining they are no longer in use!? That's the job of the DGC, so I hope you are using its facilities to do this and not attempting to recreate it!?
On the client side it's fine. But again, when you say 'watchdog' to me you are implying that you have written special diagnostic code to determine when a connection is no longer usable. But this would not be proper. You should be simply catching exceptions from valid purposeful method calls. If you are calling methods for no reason other than to see if the connection is there, then what will you do when the connection is no longer there when you actually try to use the connection? You would have to do the same thing all over again.
So it seems a watchdog would be redundant on both the client side and the server side.
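For the server side, a minimal sketch of what implementing Unreferenced on a per-client object could look like. ClientSession and ClientSessionImpl are hypothetical names, and the counter is only a stand-in for whatever per-client cleanup the real application needs.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.Unreferenced;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-client remote interface. */
interface ClientSession extends Remote {
    String ping() throws RemoteException;
}

/** Hypothetical per-client object that cleans up when DGC reports no remaining clients. */
final class ClientSessionImpl implements ClientSession, Unreferenced {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final AtomicInteger cleanups = new AtomicInteger();

    @Override
    public String ping() {
        return "pong";
    }

    /** Called by the RMI runtime once no client holds a live remote reference any more. */
    @Override
    public void unreferenced() {
        // Guard so the cleanup runs at most once, however often this is invoked.
        if (released.compareAndSet(false, true)) {
            cleanups.incrementAndGet(); // stand-in for releasing per-client resources
        }
    }

    boolean isReleased() {
        return released.get();
    }

    int cleanupCount() {
        return cleanups.get();
    }
}
```

The runtime only calls unreferenced() on exported objects, and only after the client's lease has lapsed or been cleaned, so a crashed client is noticed after the lease timeout rather than immediately.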
