2 Replies Latest reply on Feb 15, 2013 5:09 PM by kantor

    Possible bug with JBoss Messaging on bisocket transport

    husgaard

      Hi,

       

      I am using JBoss Messaging 1.4.7.GA with the bisocket transport from JBoss Remoting 2.5.3.SP1 connecting to remote clients.

       

      Ever since I started using this combination I have seen some stability issues on large and busy production systems that I have been unable to reproduce in a test environment. I have mostly seen these issues when the systems have been very busy, and I have noticed that these issues become a lot worse when the network is not good (high packet loss and/or parts of the network disconnecting at times).

       

      Today I think I have found the cause of these issues, and it looks like a bug in the interface between JBoss Messaging and JBoss Remoting. I would like some of you JBoss Messaging experts here to comment on my observations before I open a JIRA ticket.

       

      My findings start with two thread dumps taken on a production server experiencing these issues. The thread dumps were taken 8 seconds apart, and both show the same stack trace:

       

      "WorkManager(2)-76" daemon prio=10 tid=0x00002aaae483c000 nid=0x3d40 in Object.wait() [0x000000004770a000]

         java.lang.Thread.State: TIMED_WAITING (on object monitor)

          at java.lang.Object.wait(Native Method)

          at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.createSocket(BisocketClientInvoker.java:528)

          - locked <0x000000070f9cb8f8> (a java.util.HashSet)

          at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.getConnection(MicroSocketClientInvoker.java:1165)

          at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:816)

          at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.transport(BisocketClientInvoker.java:461)

          at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:167)

          at org.jboss.remoting.Client.invoke(Client.java:2034)

          at org.jboss.remoting.Client.invoke(Client.java:877)

          at org.jboss.remoting.Client.invokeOneway(Client.java:926) - not client side, means we may block instead of executing in a new thread

          at org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallback(ServerInvokerCallbackHandler.java:835)

          at org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallbackOneway(ServerInvokerCallbackHandler.java:708) - serverSide=false

          at org.jboss.jms.server.endpoint.ServerSessionEndpoint.performDelivery(ServerSessionEndpoint.java:1467)

          at org.jboss.jms.server.endpoint.ServerSessionEndpoint.handleDelivery(ServerSessionEndpoint.java:1379)

          - locked <0x00000006a8c8dfd0> (a org.jboss.jms.server.endpoint.ServerSessionEndpoint)

          at org.jboss.jms.server.endpoint.ServerConsumerEndpoint.handle(ServerConsumerEndpoint.java:328)

          - locked <0x0000000695caa390> (a java.lang.Object)

          at org.jboss.messaging.core.impl.RoundRobinDistributor.handle(RoundRobinDistributor.java:119)

          at org.jboss.messaging.core.impl.MessagingQueue$DistributorWrapper.handle(MessagingQueue.java:590)

          at org.jboss.messaging.core.impl.ClusterRoundRobinDistributor.handle(ClusterRoundRobinDistributor.java:79)

          at org.jboss.messaging.core.impl.ChannelSupport.deliverInternal(ChannelSupport.java:665)

          at org.jboss.messaging.core.impl.MessagingQueue.deliverInternal(MessagingQueue.java:513)

          at org.jboss.messaging.core.impl.ChannelSupport.handle(ChannelSupport.java:246)

          - locked <0x0000000695caa498> (a java.lang.Object)

          at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.routeInternal(MessagingPostOffice.java:2504)

          at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.route(MessagingPostOffice.java:580)

          at org.jboss.jms.server.endpoint.ServerConnectionEndpoint.sendMessage(ServerConnectionEndpoint.java:779)

      (snip)

       

      This thread is trying to deliver a JMS message and is waiting in BisocketClientInvoker.createSocket() for a remote client to open a connection back to the server. At the same time the thread holds a read lock in MessagingPostOffice. This means that any work in MessagingPostOffice that needs a write lock is blocked.
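
      The effect described above can be reproduced in isolation. The following is a minimal sketch, not the JBoss Messaging code: it assumes a plain java.util.concurrent ReentrantReadWriteLock as a stand-in for the lock in MessagingPostOffice, and a Thread.sleep() as a stand-in for the wait in BisocketClientInvoker.createSocket(). The class and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockBlocksWriter {

    // Returns true if a writer obtains the write lock within writerWaitMs
    // while another thread holds the read lock for readerHoldMs.
    public static boolean writerAcquires(long readerHoldMs, long writerWaitMs)
            throws InterruptedException {
        // Assumption: stand-in for the post office lock, not the real one.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHasLock = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHasLock.countDown();
                // Stand-in for waiting on a client that never connects back.
                Thread.sleep(readerHoldMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();
        readerHasLock.await(); // make sure the read lock is really held

        // Any work needing the write lock has to wait for the reader.
        boolean acquired = lock.writeLock().tryLock(writerWaitMs, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.writeLock().unlock();
        }
        reader.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("writer succeeds while reader is stuck: " + writerAcquires(500, 100));
        System.out.println("writer succeeds once reader is done:   " + writerAcquires(50, 2000));
    }
}
```

      In the sketch the writer gives up after its own timeout; in the server there is no such escape, so the writer simply stays blocked for as long as the delivering thread waits.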

       

      The problem in this case is that the remote client is no longer connected to the network, so it will never connect back to the server. The wait in BisocketClientInvoker.createSocket() will time out after some time. There is a retry loop in BisocketClientInvoker.createSocket() which will retry a few times, but eventually the call will fail and the read lock in MessagingPostOffice is released. If I read the code correctly, the read lock is held for 60 seconds with the default JBoss Remoting settings.
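
      To make the timing concrete, here is a rough sketch of a timed wait inside a retry loop. It is not the JBoss Remoting source; the method names, the attempt count and the per-attempt timeout are hypothetical. The point is only that the caller can be blocked for up to attempts × per-attempt timeout, which is why the total can exceed a single configured timeout.

```java
import java.util.HashSet;
import java.util.Set;

public class TimedWaitRetry {

    // Waits for something to appear in 'sockets', retrying a timed wait.
    // perAttemptMs must be > 0, because Object.wait(0) waits without a timeout.
    // Returns true if a socket showed up, false if every attempt ran out.
    public static boolean waitForSocket(Set<Object> sockets, int maxAttempts, long perAttemptMs)
            throws InterruptedException {
        synchronized (sockets) {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                if (!sockets.isEmpty()) {
                    return true;
                }
                // Releases the monitor on 'sockets' while waiting, but any
                // other lock held by the calling thread stays held.
                sockets.wait(perAttemptMs);
            }
            return !sockets.isEmpty();
        }
    }

    // Upper bound on how long the loop above can block the caller.
    public static long worstCaseBlockMs(int maxAttempts, long perAttemptMs) {
        return maxAttempts * perAttemptMs;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Object> sockets = new HashSet<>(); // nobody ever connects back
        long start = System.currentTimeMillis();
        boolean connected = waitForSocket(sockets, 3, 200); // hypothetical values
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("connected=" + connected + " after about " + elapsed
                + " ms (bound: " + worstCaseBlockMs(3, 200) + " ms)");
    }
}
```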

       

      Having the MessagingPostOffice lock exposed to client communication problems in this way is not good, and probably not what you want.

       

      Looking at the way JBoss Messaging calls JBoss Remoting when delivering a message to a remote client, I see that you use one-way calls where the caller does not care about a response, and the call is executed in another thread. But in JBoss Remoting there are two ways of doing this:

      1. Starting a thread on the caller side to handle the call. This way the new thread will take care of any communication problems, and the calling thread can return immediately so the read lock in MessagingPostOffice is immediately released.
      2. Starting a new thread on the remote side to handle the call. This way the calling thread has to take care of any communication problems, and only when the invocation has been delivered to the remote side will the remote side start a new thread to handle the call.
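
      The difference between the two ways can be sketched with plain java.util.concurrent classes. This is an illustration under assumptions, not the JBoss Remoting API: Transport, onewayCallerSide and onewayRemoteSide are hypothetical names, and the slow send is simulated with Thread.sleep().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OnewayStyles {

    // Hypothetical stand-in for sending an invocation to a remote client.
    interface Transport {
        void send(String message) throws Exception;
    }

    // Way 1: a caller-side worker thread does the send. The calling thread
    // returns at once, so it can release its locks immediately.
    public static void onewayCallerSide(ExecutorService pool, Transport transport, String message) {
        pool.execute(() -> {
            try {
                transport.send(message);
            } catch (Exception e) {
                // Communication problems surface here, on the worker thread.
                System.err.println("delivery failed: " + e.getMessage());
            }
        });
    }

    // Way 2: the calling thread does the send itself. The remote side would
    // only start its own thread once the invocation has arrived.
    public static void onewayRemoteSide(Transport transport, String message) throws Exception {
        transport.send(message);
    }

    // How long the caller is held up with way 1 when the send takes sendDelayMs.
    public static long callerSideReturnMs(long sendDelayMs) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Transport slow = msg -> Thread.sleep(sendDelayMs);
        long start = System.nanoTime();
        onewayCallerSide(pool, slow, "msg");
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdown();
        pool.awaitTermination(sendDelayMs + 1000, TimeUnit.MILLISECONDS);
        return elapsed;
    }

    // How long the caller is held up with way 2 when the send takes sendDelayMs.
    public static long remoteSideReturnMs(long sendDelayMs) throws Exception {
        Transport slow = msg -> Thread.sleep(sendDelayMs);
        long start = System.nanoTime();
        onewayRemoteSide(slow, "msg");
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("way 1 held the caller for ~" + callerSideReturnMs(300) + " ms");
        System.out.println("way 2 held the caller for ~" + remoteSideReturnMs(300) + " ms");
    }
}
```

      The trade-off is that way 1 needs a thread pool on the caller side and the caller never learns about delivery failures directly, which may be part of why either choice has drawbacks.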

       

      Unfortunately it looks like JBoss Messaging is using the second way of doing one-way calls.

       

      So my question is: Isn't this a bug? Shouldn't we use the first way of doing one-way calls instead?

        • 1. Re: Possible bug with JBoss Messaging on bisocket transport
          gaohoward

          I don't think it is a bug. To me, each of the two methods for doing one-way calls has its own benefits and drawbacks. For this specific case it seems that picking the other method could solve the problem, but I'm not sure whether it would cause other issues.

           

          I believe the timeout can be reduced by tuning the remoting configuration, rather than changing the code. As you may know, JBM is now in maintenance mode only; we won't make any changes to it apart from critical bug fixes.

           

          Howard

          • 2. Re: Possible bug with JBoss Messaging on bisocket transport
            kantor

            Hi,

             

            We are facing exactly the same issue in JBoss EAP 5.1 with JBoss Remoting 2.5.3SP1.patch01.

            Thread dumps show many threads in BLOCKED state, all waiting for an object that is locked by a thread in TIMED_WAITING state. So they will remain in that state until that thread times out:

             

            "pmPublisherService-daemon-9" daemon prio=10 tid=0x00007fef4962c800 nid=0x3916 in Object.wait() [0x00007fef4c724000]

               java.lang.Thread.State: TIMED_WAITING (on object monitor)

                 at java.lang.Object.wait(Native Method)

                 - waiting on <0x00000007122bc720> (a java.util.HashSet)

                 at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.createSocket(BisocketClientInvoker.java:528)

                 - locked <0x00000007122bc720> (a java.util.HashSet)

                 at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.getConnection(MicroSocketClientInvoker.java:1165)

                 at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:816)

                 at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.transport(BisocketClientInvoker.java:461)

                 at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:167)

                   <…>

                 - locked <0x000000072c411f78> (a java.lang.Object)

                 at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.routeInternal(MessagingPostOffice.java:2504)

                 at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.route(MessagingPostOffice.java:580)

             

             

             

            "pmPublisherService-daemon-8" daemon prio=10 tid=0x0000000040d07800 nid=0x3915 waiting for monitor entry [0x00007fef4c765000]

               java.lang.Thread.State: BLOCKED (on object monitor)

                 at org.jboss.messaging.core.impl.ChannelSupport.handle(ChannelSupport.java:242)

                 - waiting to lock <0x000000072c411f78> (a java.lang.Object)

                 at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.routeInternal(MessagingPostOffice.java:2504)

             

            But it hangs for a very long time: longer than the timeout configured in remoting-bisocket-service.xml:

             

            <attribute name="timeout" isParam="true">30000</attribute> (remoting-bisocket-service.xml)

             

            Any information you can provide me would be greatly appreciated!

             

            Best regards,

            Vadim