
    Latest deadlock in trunk tests

    gaohoward

      Running tests on my local machine, I found a deadlock reported in the thread dump:

       

      Found one Java-level deadlock:

      =============================

      "Thread-4 (HornetQ-client-global-scheduled-threads-1332022244)":

        waiting to lock monitor 0x000000004143e690 (object 0x00000007d7edd1b0, a java.lang.Object),

        which is held by "Old I/O client worker ([id: 0x4a8a0eef, /127.0.0.1:56193 => localhost/127.0.0.1:5445])"

      "Old I/O client worker ([id: 0x4a8a0eef, /127.0.0.1:56193 => localhost/127.0.0.1:5445])":

        waiting to lock monitor 0x0000000040b780b0 (object 0x00000007d7d642c0, a org.hornetq.core.journal.impl.JournalImpl),

        which is held by "main"

      "main":

        waiting for ownable synchronizer 0x00000007d7d65078, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),

        which is held by "Old I/O client worker ([id: 0x4a8a0eef, /127.0.0.1:56193 => localhost/127.0.0.1:5445])"

       

      Java stack information for the threads listed above:

      ===================================================

      "Thread-4 (HornetQ-client-global-scheduled-threads-1332022244)":

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.flush(RemotingConnectionImpl.java:463)

          - waiting to lock <0x00000007d7edd1b0> (a java.lang.Object)

          at org.hornetq.core.client.impl.ClientSessionFactoryImpl$PingRunnable.send(ClientSessionFactoryImpl.java:1680)

          at org.hornetq.core.client.impl.ClientSessionFactoryImpl$PingRunnable.run(ClientSessionFactoryImpl.java:1664)

          - locked <0x00000007d7edd308> (a org.hornetq.core.client.impl.ClientSessionFactoryImpl$PingRunnable)

          at org.hornetq.core.client.impl.ClientSessionFactoryImpl$ActualScheduledPinger.run(ClientSessionFactoryImpl.java:1613)

          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)

          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)

          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)

          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:662)

      "Old I/O client worker ([id: 0x4a8a0eef, /127.0.0.1:56193 => localhost/127.0.0.1:5445])":

          at org.hornetq.core.journal.impl.JournalImpl.loadInternalOnly(JournalImpl.java:1298)

          - waiting to lock <0x00000007d7d642c0> (a org.hornetq.core.journal.impl.JournalImpl)

          at org.hornetq.core.replication.ReplicationEndpoint.finishSynchronization(ReplicationEndpoint.java:433)

          at org.hornetq.core.replication.ReplicationEndpoint.handleStartReplicationSynchronization(ReplicationEndpoint.java:527)

          at org.hornetq.core.replication.ReplicationEndpoint.handlePacket(ReplicationEndpoint.java:210)

          at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:549)

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:566)

          - locked <0x00000007d7edd1b0> (a java.lang.Object)

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:527)

          at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1578)

          at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73)

          at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)

          at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)

          at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287)

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)

          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)

          at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)

          at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)

          at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:90)

          at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

          at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)

          at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:662)

      "main":

          at sun.misc.Unsafe.park(Native Method)

          - parking to wait for  <0x00000007d7d65078> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)

          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)

          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)

          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)

          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)

          at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)

          at org.hornetq.core.journal.impl.JournalImpl.stop(JournalImpl.java:2392)

          - locked <0x00000007d7d642c0> (a org.hornetq.core.journal.impl.JournalImpl)

          at org.hornetq.core.persistence.impl.journal.JournalStorageManager.stop(JournalStorageManager.java:2193)

          - locked <0x00000007d7d62120> (a org.hornetq.core.persistence.impl.journal.JournalStorageManager)

          at org.hornetq.core.persistence.impl.journal.JournalStorageManager.stop(JournalStorageManager.java:2172)

          at org.hornetq.core.replication.ReplicationEndpoint.stop(ReplicationEndpoint.java:338)

          - locked <0x00000007d79f4778> (a org.hornetq.core.replication.ReplicationEndpoint)

          at org.hornetq.core.server.impl.HornetQServerImpl.stop(HornetQServerImpl.java:582)

          - locked <0x00000007d752aab8> (a java.lang.Object)

          - locked <0x00000007d752a320> (a org.hornetq.core.server.impl.HornetQServerImpl)

          at org.hornetq.core.server.impl.HornetQServerImpl.stop(HornetQServerImpl.java:489)

          at org.hornetq.core.server.impl.HornetQServerImpl.stop(HornetQServerImpl.java:450)

          at org.hornetq.tests.util.UnitTestCase.stopComponent(UnitTestCase.java:1528)

          at org.hornetq.tests.util.UnitTestCase.tearDown(UnitTestCase.java:975)

          - locked <0x0000000787655500> (a java.util.ArrayList)

          at org.hornetq.tests.util.ServiceTestBase.tearDown(ServiceTestBase.java:86)

          at org.hornetq.tests.integration.cluster.distribution.ClusterTestBase.tearDown(ClusterTestBase.java:182)

          at junit.framework.TestCase.runBare(TestCase.java:136)

          at junit.framework.TestResult$1.protect(TestResult.java:106)

          at junit.framework.TestResult.runProtected(TestResult.java:124)

          at junit.framework.TestResult.run(TestResult.java:109)

          at junit.framework.TestCase.run(TestCase.java:120)

          at junit.framework.TestSuite.runTest(TestSuite.java:230)

          at junit.framework.TestSuite.run(TestSuite.java:225)

          at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)

          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

          at java.lang.reflect.Method.invoke(Method.java:597)

          at org.apache.maven.surefire.junit.JUnitTestSet.execute(JUnitTestSet.java:95)

          at org.apache.maven.surefire.junit.JUnit3Provider.executeTestSet(JUnit3Provider.java:121)

          at org.apache.maven.surefire.junit.JUnit3Provider.invoke(JUnit3Provider.java:98)

          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

          at java.lang.reflect.Method.invoke(Method.java:597)

          at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)

          at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)

          at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)

          at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)

          at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)

       

      Found 1 deadlock.

        • 1. Re: Latest deadlock in trunk tests
          gaohoward

          My understanding of the problem:

           

          ReplicationEndpoint.stop() and ReplicationEndpoint.handlePacket(packet) should in some way be mutually exclusive. If the endpoint is being closed, it should process no more packets. If the endpoint is in the middle of processing a packet, the stop() method should wait for it to complete.
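
          Roughly, the invariant I have in mind looks like the sketch below. This is only a hypothetical illustration (the class, field, and method names are mine, not the actual ReplicationEndpoint code): both methods share one monitor, stop() flips a flag under it, and handlePacket() checks the flag before doing any work.

             // Hypothetical sketch of the mutual-exclusion requirement, not the real HornetQ code.
             public class EndpointStopGuard
             {
                private final Object guard = new Object();

                private boolean stopped = false;

                // stop() waits for any in-flight packet (it needs the same monitor)
                // and guarantees that no later packet will be processed.
                public void stop()
                {
                   synchronized (guard)
                   {
                      stopped = true;
                      // safe to release resources here: no packet is being handled concurrently
                   }
                }

                // handlePacket() becomes a no-op once the endpoint has been stopped.
                public void handlePacket(final Object packet)
                {
                   synchronized (guard)
                   {
                      if (stopped)
                      {
                         return;
                      }
                      // process the packet here
                   }
                }
             }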

          • 2. Re: Latest deadlock in trunk tests
            gaohoward

            I added a 'stopLock' object to synchronize the two methods. However, the stop() method is synchronized on the whole endpoint object. We need to investigate how to remove that and use only the stopLock.
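
            Concretely, the interim shape I am describing is roughly the following. This is a hedged sketch only (the bodies are placeholders, not my actual change): stop() still holds the endpoint monitor via its synchronized modifier and additionally takes the stopLock, while handlePacket() takes only the stopLock. The double monitor in stop() is the part that needs to go.

               // Hypothetical sketch of the interim 'stopLock' arrangement; bodies are placeholders.
               public class ReplicationEndpointSketch
               {
                  private final Object stopLock = new Object();

                  private boolean started = true;

                  public synchronized void stop()      // still synchronized on the whole endpoint
                  {
                     synchronized (stopLock)           // plus the new stopLock
                     {
                        started = false;
                        // shut the endpoint down
                     }
                  }

                  public void handlePacket(final Object packet)
                  {
                     synchronized (stopLock)           // only the stopLock
                     {
                        if (!started)
                        {
                           return;
                        }
                        // dispatch the packet
                     }
                  }
               }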

            • 3. Re: Latest deadlock in trunk tests
              borges

              Hi Howard,

               

              I removed the stopLock and moved the finishSynchronization(int) call (which has to lock the journal) into the synchronized block. I don't know why I left it out; perhaps in the beginning there was no need to lock the journal for it.
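
              For anyone reading along, the shape of the change is roughly as below. This is only my sketch of the idea, not the actual r12243 diff, and the argument passed to finishSynchronization(int) is a placeholder. The point is that the journal-locking call now runs inside the block synchronized on the endpoint, the same monitor that stop() holds, so packet handling and shutdown can no longer interleave.

                 // Hedged sketch of the shape of the fix; not the actual r12243 code.
                 public class ReplicationEndpointFixSketch
                 {
                    private boolean started = true;

                    public void handlePacket(final Object packet)
                    {
                       synchronized (this)             // same monitor that the synchronized stop() takes
                       {
                          if (!started)
                          {
                             return;                   // endpoint already stopped, drop the packet
                          }

                          // ... other packet handling ...

                          finishSynchronization(0);    // journal-locking step now inside the block
                       }
                    }

                    public synchronized void stop() throws Exception
                    {
                       started = false;
                       // journal shutdown here cannot run concurrently with finishSynchronization() above
                    }

                    // Placeholder for the real method that has to lock the journal.
                    private void finishSynchronization(final int journalId)
                    {
                       // acquire the journal lock and finish the backup synchronization
                    }
                 }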

               

              That should avoid the deadlock you got.

               

              svn r12243.

              • 4. Re: Latest deadlock in trunk tests
                gaohoward

                ok cool.