NIO and CORBA

I recently updated a large multi-client CORBA application from VisiBroker to Sun's CORBA ORB.

As you know, Sun's ORB uses NIO for its communication from client to server. A problem arises if one client drops communication: the NIO Selector routine then locks up all the other clients. There must be a timeout feature in the CORBA/NIO logic that allows me to immediately drop the offending client and continue to service the remaining clients. I never encountered these problems when each CORBA object/callback was handled as a separate thread.

This problem is very distracting and should be easy to fix. Any ideas?

I am using Java 1.6+.

Thanks for any replies..

[707 byte] By [wigoea] at [2007-11-27 5:55:20]
# 1

> As you know Sun's ORB uses NIO for its communication from client to server.

> A problem arises if one client should drop communication and the NIO Selector routine locks up all the other clients.

If we're talking about the ORB in the JDK, it selects with a timeout which defaults to 60 seconds and which never seems to be modified. I don't know how any competently written NIO selector routine could 'lock up all the other clients' unless it misuses OP_WRITE, and this one doesn't use OP_WRITE at all, so that's not it. I've never seen this problem. Possibly the problem is in your application.

If you're talking about some other Sun ORB you're asking in the wrong forum.
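
To illustrate the point about selecting with a timeout, here is a minimal sketch of a generic NIO read loop. It is not the ORB's code: the class name SelectorTimeoutSketch, the method selectOnce and the 4096-byte buffer are made up for illustration. It shows that Selector.select(long) returns once the timeout elapses even when nothing is ready, and that a peer that has closed its end can be dropped in the same pass without touching the other registered channels.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Illustrative only: a generic NIO read loop, NOT the JDK ORB's implementation.
class SelectorTimeoutSketch {

    /**
     * Runs one pass of a select loop. Waits at most timeoutMillis (must be > 0,
     * since 0 means "block indefinitely"), then services every readable channel.
     * Returns the number of channels that were closed during this pass.
     */
    static int selectOnce(Selector selector, long timeoutMillis) throws IOException {
        int closed = 0;
        selector.select(timeoutMillis); // returns after the timeout even if nothing is ready
        ByteBuffer buf = ByteBuffer.allocate(4096);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) {
                continue;
            }
            ReadableByteChannel ch = (ReadableByteChannel) key.channel();
            buf.clear();
            try {
                if (ch.read(buf) < 0) { // end of stream: the peer closed its side
                    key.cancel();
                    ch.close();
                    closed++;
                }
                // A real server would hand the bytes in buf to a worker here.
            } catch (IOException e) { // e.g. connection reset: drop only this channel
                key.cancel();
                try {
                    ch.close();
                } catch (IOException ignored) {
                    // nothing more to do for this channel
                }
                closed++;
            }
        }
        return closed;
    }
}
```

Note that a client which silently disappears without closing the connection never becomes readable, so this loop alone will not notice it; that case needs a separate idle timeout, which the sketch does not attempt.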

ejpa at 2007-7-12 16:23:55 > top of Java-index,Core,Core APIs...
# 2

It is entirely probable that there is an error in my code. But the code successfully sends 40,000 messages per hour to ten clients, and then at some point during an eight-hour day one client hangs (it is wireless) and the other nine processes hang as well. In fact, if I wait 2 minutes the system appears to remove the offending client (its timeout apparently exceeded) and the other nine clients carry on. Unfortunately I am in a real-time trading environment and cannot afford this luxury.

I am suspicious of the NIO in Sun's CORBA when I see messages like

http://forum

I also see similar NIO issues reported in forums other than Sun's. I have had success implementing the solution in the thread mentioned above. In fact, I extended the solution to such an extent that I can actually pinpoint which socket/channel failure hangs the other sockets.

wigoea at 2007-7-12 16:23:55
# 3

After a quick look at the JDK 6 CORBA source code (the JDK 6 src.zip file), I don't see where the Selector Thread can be blocked in the ORB's default configuration.

There's an ORB configuration option where OP_ACCEPT events could be processed by the Selector Thread. But, looking at the JDK 6 source code it appears that by default it is configured to handle OP_ACCEPT in a Worker Thread.

The following command line property can be used to explicitly force OP_ACCEPT events to be handled in the Selector Thread:

-Dcom.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent=true

Another property that is worthwhile setting to see what's happening is:

-Dcom.sun.CORBA.ORBDebug="transport"

* I do not recall if the "transport" needs quotes around it or not.

The latter switch will enable transport debugging and show whether OP_ACCEPT events are happening in the Selector Thread or Worker Thread. It would also likely tell you where things are getting hung up.

Also, what operating system are you running on? I have seen quite a few nasty issues on Linux with NIO. From my experience, Solaris tends to work the best.

Could you also post the full java command line you are using, including any ORB properties?

thanks,

charlie ...

huntcha at 2007-7-12 16:23:55
# 4

Sorry, I made an error in the previous post.

The following:

The following command line property can be used to explicitly force OP_ACCEPT events to be handled in the Selector Thread:

-Dcom.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent=true

Should have been:

The following command line property can be used to explicitly force OP_ACCEPT events to be handled in a Worker Thread:

-Dcom.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent=true
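
As a rough sketch of how the two switches mentioned in these posts could be kept in one place, the snippet below collects them in a java.util.Properties object and renders the equivalent -D flags. The property names are copied verbatim from the posts and are not checked here against the ORB source; the class OrbTransportProps and its methods are hypothetical helpers. On a JDK that still ships CORBA, such a Properties object can be handed to org.omg.CORBA.ORB.init(String[], Properties), but whether the ORB honours each setting from there rather than from a system property is an assumption.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper; the property names come from the posts above.
class OrbTransportProps {
    static final String USE_WORKER_THREAD =
            "com.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent";
    static final String ORB_DEBUG = "com.sun.CORBA.ORBDebug";

    /** Builds the settings discussed in the thread. */
    static Properties build() {
        Properties props = new Properties();
        props.put(USE_WORKER_THREAD, "true"); // handle OP_ACCEPT in a Worker Thread
        props.put(ORB_DEBUG, "transport");    // enable transport debugging
        return props;
    }

    /** Renders the same settings as -D command line flags, sorted by name. */
    static String asCommandLineFlags(Properties props) {
        StringBuilder sb = new StringBuilder();
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D").append(name).append('=').append(props.getProperty(name));
        }
        return sb.toString();
    }
}
```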

sorry for the confusion,

charlie ...

huntcha at 2007-7-12 16:23:55
# 5
Thanks I will give it a shot.
wigoea at 2007-7-12 16:23:55
# 6
The servers are on Fedora 6.0
wigoea at 2007-7-12 16:23:55
# 7

My ORB properties (set internally) are the following.

You will note that I installed my own socket factory, etc.

These sockets allow me to monitor the socket activity (ports and bytes read and written) in real time (i.e. socket listeners):

props.put("org.omg.CORBA.ORBInitialPort", "1050");

props.put(com.sun.corba.se.impl.orbutil.ORBConstants.USE_NIO_SELECT_TO_WAIT_PROPERTY,"false");

props.put(com.sun.corba.se.impl.orbutil.ORBConstants.CONNECTION_SOCKET_TYPE_PROPERTY,com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET);

props.put(com.sun.corba.se.impl.orbutil.ORBConstants.ACCEPTOR_SOCKET_TYPE_PROPERTY,com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET);

props.put(com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET_FACTORY_CLASS_PROPERTY, riskmanager.utils.OurSocketFactory.class.getName());

props.put(com.sun.corba.se.impl.orbutil.ORBConstants.IOR_TO_SOCKET_INFO_CLASS_PROPERTY,riskmanager.utils.OurIORTOSocketInfo.class.getName());

props.put("com.sun.CORBA.transport.ORBSocketFactoryClass", riskmanager.utils.OurSocketFactory.class.getName());

props.put("com.sun.CORBA.transport.ORBConnectionSocketType","SOCKET");

props.put("com.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent","true");

wigoea at 2007-7-12 16:23:55
# 8

> my orb properties (internally set) are the following.
> You will note that I installed my own Socket factory and etc.
> These sockets allow me to monitor the socket activity (ports and byte read and write) in real time (ie socket listeners);
>
> props.put("org.omg.CORBA.ORBInitialPort", "1050");
> props.put(com.sun.corba.se.impl.orbutil.ORBConstants.USE_NIO_SELECT_TO_WAIT_PROPERTY,"false");
> props.put(com.sun.corba.se.impl.orbutil.ORBConstants.CONNECTION_SOCKET_TYPE_PROPERTY,com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET);
> props.put(com.sun.corba.se.impl.orbutil.ORBConstants.ACCEPTOR_SOCKET_TYPE_PROPERTY,com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET);
> props.put(com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET_FACTORY_CLASS_PROPERTY, riskmanager.utils.OurSocketFactory.class.getName());
> props.put(com.sun.corba.se.impl.orbutil.ORBConstants.IOR_TO_SOCKET_INFO_CLASS_PROPERTY,riskmanager.utils.OurIORTOSocketInfo.class.getName());
> props.put("com.sun.CORBA.transport.ORBSocketFactoryClass", riskmanager.utils.OurSocketFactory.class.getName());
> props.put("com.sun.CORBA.transport.ORBConnectionSocketType","SOCKET");
> props.put("com.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent","true");

I did not realize you were using your own socket factory. That may make quite a difference here.

You're using your own socket factory, which is ok. You've disabled non-blocking NIO by setting USE_NIO_SELECT_TO_WAIT_PROPERTY to false. Instead you'll be using reader & listener threads.

The setting of ORBAcceptorSocketUseWorkerThreadForEvent to 'true' will have no effect since you've disabled non-blocking NIO.
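
For anyone unfamiliar with the reader-thread model referred to here, the following is a minimal sketch of the general idea, not the ORB's implementation: each client connection gets its own blocking reader task, so a client that stalls only ties up its own thread. The class ReaderThreadSketch and its methods are hypothetical, and plain InputStream objects stand in for the per-connection socket streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Illustrative only: one blocking reader per client, NOT the ORB's code.
class ReaderThreadSketch {

    /** Reads one client's stream until end of stream and returns the byte count. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) { // blocks only the calling thread
            total += n;
        }
        return total;
    }

    /** Submits one reader task per client; results come back in the same order. */
    static List<Future<Long>> startReaders(ExecutorService pool, List<InputStream> clients) {
        List<Future<Long>> results = new ArrayList<>();
        for (InputStream in : clients) {
            Callable<Long> task = () -> drain(in);
            results.add(pool.submit(task));
        }
        return results;
    }
}
```

The pool passed in needs at least as many threads as there are clients (for example Executors.newCachedThreadPool()); with fewer threads a stalled client could still delay the others.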

A couple of things I would suggest you do:

1.) Update your props.put("com.sun.CORBA.transport.ORBConnectionSocketType","SOCKET");

to:

props.put("com.sun.CORBA.transport.ORBConnectionSocketType", com.sun.corba.se.impl.orbutil.ORBConstants.SOCKET);

2.) And add the transport debug property I posted previously. That transport debug flag will tell you an awful lot about what you have setup for a configuration and whether your socket factory is being used along with reader & listener threads.

huntcha at 2007-7-12 16:23:55
# 9

Charlie,

Thanks for your help on this problem; it has been dogging me for months.

Adding the line

-Dcom.sun.CORBA.transport.ORBAcceptorSocketUseWorkerThreadForEvent=true

inside my Properties for the ORB was a huge success. There were very few socket drops from our wireless traders on the exchange floor, and when drops did occur NONE of the other processes were interrupted. All five server processes are running uninterrupted on a very busy market day. This has saved a lot of frustration.

Incidentally, the socket factory overrides merely extend the default SocketFactory. I merely added count variables for sockets created and closed. I also overrode getInputStream and getOutputStream so I could count the bytes going back and forth.

Am I correct in assuming that NIO has problems on Fedora 6.0, given that the programs work perfectly using worker threads?
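
A minimal sketch of the byte-counting idea described above, under the assumption that it is done with stream wrappers: the classes CountingInputStream, CountingOutputStream and CountingSocket are hypothetical names, not the poster's actual code, and how a socket factory would hand out such sockets to the ORB is not shown.

```java
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

// Counts bytes actually read from the wrapped stream (skip() is not counted).
class CountingInputStream extends FilterInputStream {
    private final AtomicLong counter;

    CountingInputStream(InputStream in, AtomicLong counter) {
        super(in);
        this.counter = counter;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            counter.incrementAndGet();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            counter.addAndGet(n);
        }
        return n;
    }
}

// Counts bytes written to the wrapped stream.
class CountingOutputStream extends FilterOutputStream {
    private final AtomicLong counter;

    CountingOutputStream(OutputStream out, AtomicLong counter) {
        super(out);
        this.counter = counter;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        counter.incrementAndGet();
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        out.write(buf, off, len); // bypass the byte-at-a-time default
        counter.addAndGet(len);
    }
}

// A Socket whose streams report traffic through two counters.
class CountingSocket extends Socket {
    final AtomicLong bytesRead = new AtomicLong();
    final AtomicLong bytesWritten = new AtomicLong();

    @Override
    public InputStream getInputStream() throws IOException {
        return new CountingInputStream(super.getInputStream(), bytesRead);
    }

    @Override
    public OutputStream getOutputStream() throws IOException {
        return new CountingOutputStream(super.getOutputStream(), bytesWritten);
    }
}
```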

wigoea at 2007-7-12 16:23:55
# 10
We tend to see more issues with non-blocking NIO on Linux in general than on other platforms. Glad to hear you have stabilized things.
huntcha at 2007-7-12 16:23:55