Periodic locking of client/consumers - SMQ 3.5SP2

We are using SMQ 3.5 SP2 on Solaris (Platform Edition)

The JMS Broker is used extensively and we have been using it with great success - apart from we have one locking issue.

Basically we have:

- a single topic

- many persistent subscribers (running in their own thread - connecting via their own transacted session)

- each subscriber is making use of their own selector

- we are running the broker in a standalone mode (and not in a app server)

We are pretty confident that we are not running into a situation where the same session is being access by 2 different threads (something that the JMS Spec says can cause undefined behaviour or locking).

We have also checked for deadlocks and still no luck. We have increased the heap size and have been logging the memory use and we believe that to be ok too.

The conditions that seem to cause a problem (and cause the consumers to lockup) is when the following scenario shows up in the operational service:

- we are rolling back transactions on at least 1 subscriber... Due to the way the application works, the rolling back operation occurs at fixed times and can repeat . Therefore a number of subscribers (using selectors) could be rolling back to the same topic at approx the same time. We are using the receive and receive(wait) methods in the consumers

- The subscribers whose transactions are being rolled back generally have at least a few hundred messages in the topic (if < 100 or so the problem doesn't seem to occur).

Is there anything in 3.5SP2 that was fixed in 3.6 that could explain this issue.. Either memory related or session/thread related? We are considering the upgrade to see if it fixes the problem.

Also - are there any capabilites in the server to check if 2 different threads are accessing the same session? (even though we believe this is *not* the case).

[1921 byte] By [jbeech] at [2007-11-26 6:20:46]
# 1

Hi,

1. When you say "consumers lockup", do you mean that consumers stop receiving messages? Did MessageConsumer.receive (timetime) returns with null (and you know there are messages in the destination for the consumer)?

2. Unfortunately, there is no feasible way to check if there are two threads attempt to operate on the same session (simultaneously). But in MQ3.6, there are improvements that could trigger S1MQ to throw a JMSException to an application if it detects the concurrent session usage scenario.

chiamingyang at 2007-7-6 14:05:47 > top of Java-index,Application & Integration Servers,Sun Java System Message Queue...
# 2

Yes, the consumers seem to stop receiving messages. The blocking receive(time) call is returning null and there are definitely messages in the topic/destination) - I know this to be the case because the threads make a receive(120000) call and make a log entry when nothing is returned ... I see my log entries every 2 minutes, just no useful data coming in for it to do anything with.. I know there is data available for the consumer in question because it shows in the imqadmin tool and when the client is reset, the data is handled as it should be. Plus I know that I placed data into the destination with the correct properties for the selector on the consumer to handle it.

At present on 3.5SP2 I'm getting no useful messages or exceptions. The broker itself isn't locking up - just the client application.

The service runs for months without any problem.. when a consumer goes into the rollback cycle, it could run for 24 hours or more before locking. Sometimes it can be less, sometimes it never occurs. I feel though that when there is more data running through the broker it occurs more frequently.

Perhaps 3.6 might provide more useful information via the exception you mentioned.

Do you have any other recommendations - apart from an upgrade to 3.6?

jbeech at 2007-7-6 14:05:47 > top of Java-index,Application & Integration Servers,Sun Java System Message Queue...
# 3
You mentioned that 3.6 "could" trigger S1MQ to throw a JMSException to an application if it detects the concurrent session usage scenario.Do you know the exact format of the Exception or message I would get.. and you say "could". Is there no guarantee that it will get thrown?
jbeech at 2007-7-6 14:05:47 > top of Java-index,Application & Integration Servers,Sun Java System Message Queue...
# 4

You may have two threads operate on the same session without getting an exception as long as they do not access the session concurrently.

MQ (3.6) throws a javax.jms.IllegalStateException if it detects concurrent operations on the same session (such as calling rollback and commit simultaneously).

The error code and error message defined in this exception are

error code = C4055

error message = Resource in conflict. Concurrent operations on a session.

An application should not depend on MQ (3.6) to throw an IllegalStateException to detect a concurrent session usage condition.There is no guarantee that it will be thrown. Only those operations that compromise a session's integrity are checked.

chiamingyang at 2007-7-6 14:05:47 > top of Java-index,Application & Integration Servers,Sun Java System Message Queue...
# 5

Since I last posted we have introduced more logging into the system and we were able to see the point at which the consumer.receive method stopped handing back messages. If we commit the transaction everytime, we *never* have a problem, only when we rollback. Surely if we had threading issues the commit would cause the same problem? That is not the case. If we rollback less frequently (we applied a patch to try this), then it looks like the system takes longer to fail - infact it has not failed since. That just means the problem has been hidden because rollbacks are occuring less frequently.

In the log you can see the rollback occuring at 22:23:45 and then again every 5 seconds at 22:23:50, 22:23:55, 22:24:00. The next rollback doesn't occur at 22:24:24 (this is still expected behaviour - we just happen to do a http test at this point that takes an extra 20 seconds to complete). Again at 22:24:29, 22:24:35, 22:24:40 we rollback as expected. Remember this rollback process had been occuring without problem for 5 or 6 hours - so in general its working fine.

The interesting time is then at 22:24:45... this time when we call the synchronous receive(120000) call method, we don't get any message back. Infact the call doesn't return until the timeout occurs 2 minutes later. At this point all of our other consumers run into the same problem. Consumers connected to other topics also seem to have the same problem. At this point we have to reset the client.

We know for sure that there is data available in the topic for this consumer with the given selector (you can see this in the previous rollback sections where we are getting the first message each time).

04/19/2006 22:23:45 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:23:45 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:23:45 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:23:45 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:23:45 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:23:45 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:23:45 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:23:45 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:23:45 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:23:50 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:23:50 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:23:50 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:23:50 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:23:50 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:23:50 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:23:50 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:23:50 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:23:50 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:23:55 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:23:55 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:23:55 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:23:55 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:23:55 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:23:55 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:23:55 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:23:55 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:23:55 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:24:00 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:24:00 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:24:00 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:24:00 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:24:00 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:24:00 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:24:00 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:24:00 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:24:00 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:24:05 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:24:05 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:24:05 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:24:05 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:24:07 GMT T HttpConnectionDelivery to: Http://xxx.xxx.com

04/19/2006 22:24:24 GMT W VARConsumer Ping failed to: Http://xxx.xxx.com Error on line -1: Premature end of file.

04/19/2006 22:24:24 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:24:24 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:24:24 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:24:24 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:24:24 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:24:29 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:24:29 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:24:29 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:24:29 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:24:29 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:24:29 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:24:29 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:24:29 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:24:29 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:24:35 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:24:35 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:24:35 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:24:35 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:24:35 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:24:35 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:24:35 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:24:35 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:24:35 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:24:40 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:24:40 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:24:40 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:24:40 GMT T ExpiringConsumer::handleSinglesonMessages(1) >> begin

04/19/2006 22:24:40 GMT T VARConsumer::onMessagesisFailing still >> throw internal exception

04/19/2006 22:24:40 GMT T VARConsumer::onMessagesOuterException thrown

04/19/2006 22:24:40 GMT T ExpiringConsumer::runRollback begin

04/19/2006 22:24:40 GMT T ExpiringConsumer::runRollback complete

04/19/2006 22:24:40 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> begin

04/19/2006 22:24:45 GMT T ExpiringConsumer::rungotoSleep(pauseError) >> end

04/19/2006 22:24:45 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:26:45 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:26:45 GMT T ExpiringConsumer::handleSinglesNothingToDo() >> begin

04/19/2006 22:26:46 GMT T HttpConnectionDelivery to: Http://xxx.xxx.com

04/19/2006 22:27:03 GMT W VARConsumer Ping failed to: Http://xxx.xxx.com Error on line -1: Premature end of file.

04/19/2006 22:27:03 GMT T ExpiringConsumer::handleSinglesNothingToDo() >> end

04/19/2006 22:27:03 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:29:03 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:29:03 GMT T ExpiringConsumer::handleSinglesNothingToDo() >> begin

04/19/2006 22:29:04 GMT T HttpConnectionDelivery to: Http://xxx.xxx.com

04/19/2006 22:29:21 GMT W VARConsumer Ping failed to: Http://xxx.xxx.com Error on line -1: Premature end of file.

04/19/2006 22:29:21 GMT T ExpiringConsumer::handleSinglesNothingToDo() >> end

04/19/2006 22:29:21 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

04/19/2006 22:31:21 GMT T ExpiringConsumer::handleSinglesconsumer.receive >> end

04/19/2006 22:31:21 GMT T ExpiringConsumer::handleSinglesNothingToDo() >> begin

04/19/2006 22:31:22 GMT T HttpConnectionDelivery to: xxx.xxx.com

04/19/2006 22:31:39 GMT W VARConsumer Ping failed to: Http://xxx.xxx.com Error on line -1: Premature end of file.

04/19/2006 22:31:39 GMT T ExpiringConsumer::handleSinglesNothingToDo() >> end

04/19/2006 22:31:39 GMT T ExpiringConsumer::handleSinglesconsumer.receive(120000) >> begin

jbeech at 2007-7-6 14:05:47 > top of Java-index,Application & Integration Servers,Sun Java System Message Queue...