RMI Connection Refused, but the socket is bound and listening?

Setting: I have two servers (primary and secondary) and 10-15 clients that connect to them. The two servers communicate with each other so that each knows if/when the other goes down and can take over the "primary" responsibility if necessary.

While the primary is trying to connect to the secondary, I get a connection exception indicating "Connection Refused". The odd thing is that I can initiate a TCP connection directly to the RMI server (or at least its registry) from the "other" server.

My current hypothesis is that it's connecting to the registry, but is unable to establish the secondary TCP connection for its object communications (which runs on a different port, arbitrarily assigned by the OS).

Interestingly, if you look at the 'lsof' output from the second server, it already has a number of connections to the primary server, indicating that it is at some level able to establish communications.

I did try setting -Djava.rmi.server.hostname=<server ip> for each host, but that didn't seem to do anything.

I've tried to provide as much information as I can below. Let me know if there's anything else I can provide.

Thoughts?

Thanks....

Info about server1

TCP Connection Initiation Test (to the registry)

dba@oocs01-ctl:/companyx/dfs/master1 > telnet oocs02-ctl.companyx.com 10112

Trying 192.168.111.150...

Connected to oocs02-ctl.companyx.com (192.168.111.150).

Escape character is '^]'.

^]

telnet> quit

Connection closed.

dba@oocs01-ctl:/companyx/dfs/master1 >

The Stack Trace:

2007-01-24 11:00:52:949:SEVERE:24:Async operation threw ([rmi://oocs02-ctl.companyx.com:10112/if1]) - Could not get connection - EXCEPTION: dfs.exceptions.DFSException:

dfs.master.MasterProxy.runOp(MasterProxy.java:174)

dfs.master.MasterProxy.getRole(MasterProxy.java:191)

dfs.master.MasterImp.checkMasterRoles(MasterImp.java:805)

dfs.master.MasterImp.run(MasterImp.java:1195)

java.lang.Thread.run(Thread.java:595)

Connection refused to host: 192.168.111.150; nested exception is:

java.net.ConnectException: Connection refused - CAUSED BY - EXCEPTION: java.rmi.ConnectException:

sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574)

sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)

sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)

sun.rmi.server.UnicastRef.invoke(UnicastRef.java:94)

java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:179)

java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:132)

$Proxy0.getRole(Unknown Source)

dfs.master.commands.AsyncGetMasterRole.call(AsyncGetMasterRole.java:15)

dfs.master.commands.AsyncGetMasterRole.call(AsyncGetMasterRole.java:8)

java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)

java.util.concurrent.FutureTask.run(FutureTask.java:123)

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)

java.lang.Thread.run(Thread.java:595)

Connection refused - CAUSED BY - EXCEPTION: java.net.ConnectException:

java.net.PlainSocketImpl.socketConnect(Native Method)

java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)

java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)

java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)

java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)

java.net.Socket.connect(Socket.java:519)

java.net.Socket.connect(Socket.java:469)

java.net.Socket.<init>(Socket.java:366)

java.net.Socket.<init>(Socket.java:179)

sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)

sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)

sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:569)

sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)

sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)

sun.rmi.server.UnicastRef.invoke(UnicastRef.java:94)

java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:179)

java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:132)

$Proxy0.getRole(Unknown Source)

dfs.master.commands.AsyncGetMasterRole.call(AsyncGetMasterRole.java:15)

dfs.master.commands.AsyncGetMasterRole.call(AsyncGetMasterRole.java:8)

java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)

java.util.concurrent.FutureTask.run(FutureTask.java:123)

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)

java.lang.Thread.run(Thread.java:595)

On the second server

dba@oocs02-ctl:/companyx/dfs/master2 > netstat -an | grep 10112

tcp   0   0   0.0.0.0:10112            0.0.0.0:*                LISTEN

tcp   0   0   192.168.111.150:10112    192.168.111.149:53208    ESTABLISHED

dba@oocs02-ctl:/companyx/dfs/master2 > /usr/sbin/lsof -p 6585 | grep oocs01

java 6585 dba 7u IPv4 1633986187 TCP oocs02-ctl.companyx.com:60637->oocs01-ctl.companyx.com:25322 (ESTABLISHED)

java 6585 dba 31u IPv4 1634340305 TCP oocs02-ctl.companyx.com:60634->oocs01-ctl.companyx.com:59265 (ESTABLISHED)

java 6585 dba 35u IPv4 1633986509 TCP oocs02-ctl.companyx.com:10112->oocs01-ctl.companyx.com:53208 (ESTABLISHED)

java 6585 dba 39u IPv4 1633986862 TCP oocs02-ctl.companyx.com:60679->oocs01-ctl.companyx.com:44487 (ESTABLISHED)

dba@oocs02-ctl:/companyx/dfs/master2 >

Log output on the second server; it seems to be binding:

2007-01-24 10:30:48:957:INFO:10:Binding to:'rmi://oocs02-ctl.companyx.com:10112/if1'

By [paradoxa] at [2007-11-26 16:07:47]
# 1

I can't see it from the log, but maybe it is trying to connect to a different port (other than 10112, which I understand is your registry port).
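
If the stub really does point at an OS-assigned port, one option is to export the remote object on a fixed port, so that the port can be tested with telnet or opened in a firewall. What follows is only a sketch under assumptions: RoleService, RoleServiceImpl and the ports 41099 and 41100 are hypothetical stand-ins (the real interface behind "if1" is not shown in this thread), and java.rmi.server.hostname is set programmatically just to keep the example self-contained.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class FixedPortExport {

    /** Hypothetical remote interface standing in for the real "if1" service. */
    public interface RoleService extends Remote {
        String getRole() throws RemoteException;
    }

    /** Hypothetical implementation; the real one is not shown in the thread. */
    static class RoleServiceImpl implements RoleService {
        public String getRole() {
            return "SECONDARY";
        }
    }

    /**
     * Starts an in-process registry, exports the service on a fixed object
     * port, binds it as "if1", then calls it through a freshly looked-up stub.
     */
    static String exportBindAndCall(String advertisedHost, int registryPort, int objectPort)
            throws Exception {
        // Host name/IP that RMI embeds in the stubs it hands out.
        System.setProperty("java.rmi.server.hostname", advertisedHost);

        Registry registry = LocateRegistry.createRegistry(registryPort);
        RoleServiceImpl impl = new RoleServiceImpl();
        try {
            // The second argument pins the object port; 0 would mean "any free port".
            RoleService stub = (RoleService) UnicastRemoteObject.exportObject(impl, objectPort);
            registry.rebind("if1", stub);

            // Client side: the registry answers on registryPort, but the call
            // itself goes to the host:port stored inside the stub (objectPort).
            Registry remote = LocateRegistry.getRegistry(advertisedHost, registryPort);
            RoleService lookedUp = (RoleService) remote.lookup("if1");
            return lookedUp.getRole();
        } finally {
            // Unexport so the JVM can exit and the demo can be run again.
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(exportBindAndCall("127.0.0.1", 41099, 41100));
    }
}
```

With the object port fixed, the same telnet test used against the registry port can be repeated against the object port from the other server.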

From your log, it doesn't look like you're in the middle of a registry call, but rather in the middle of an active call on the remote object. It is possible that you have a "dead" object in the registry, one that was created by a VM that is dead now. In that case the registry hands you a host/port combination that is no longer valid, and that is why you get the exception.
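
One way to check this is to look up the stub and see which host/port it actually carries, then try a plain TCP connect to that endpoint. The sketch below is a rough diagnostic, not a supported API: it parses the stub's toString() output, whose "endpoint:[host:port]" form is an implementation detail of Sun's RMI runtime and may differ elsewhere. StubEndpointProbe, endpointOf and accepts are names made up for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.rmi.Remote;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StubEndpointProbe {

    // Matches the "endpoint:[host:port" part of a Sun RMI stub's toString().
    private static final Pattern ENDPOINT =
            Pattern.compile("endpoint:\\[([^\\],]+):(\\d+)");

    /** Returns the host/port embedded in the stub, or null if the format is not recognized. */
    static InetSocketAddress endpointOf(Object stub) {
        Matcher m = ENDPOINT.matcher(String.valueOf(stub));
        if (!m.find()) {
            return null;
        }
        return new InetSocketAddress(m.group(1), Integer.parseInt(m.group(2)));
    }

    /** Tries a plain TCP connect; false means refused, timed out, or unresolvable. */
    static boolean accepts(InetSocketAddress endpoint, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(endpoint, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: StubEndpointProbe <registryHost> <registryPort> <name>");
            return;
        }
        // The remote interface of the bound object must be on the classpath,
        // otherwise lookup() cannot deserialize the stub.
        Registry registry = LocateRegistry.getRegistry(args[0], Integer.parseInt(args[1]));
        Remote stub = registry.lookup(args[2]);

        InetSocketAddress endpoint = endpointOf(stub);
        System.out.println("stub     = " + stub);
        System.out.println("endpoint = " + endpoint);
        if (endpoint != null) {
            System.out.println("accepts  = " + accepts(endpoint, 3000));
        }
    }
}
```

Run on the primary with the arguments oocs02-ctl.companyx.com 10112 if1. If the reported endpoint refuses the connection while the registry port accepts it, the stub in the registry is stale or points at an unreachable address.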

You can use -Dsun.rmi.transport.proxy.logLevel=VERBOSE to see all the ports your application connects to.

Genady

genadya at 2007-7-8 22:30:07