1 2 3 Previous Next 30 Replies Latest reply on Aug 11, 2010 11:53 AM by manik

    Lock is not released on remote node after transaction commit

    pengyan

      We have a two node cluster in replication mode, and following code are executed on the first node(NodeA).

       

      TX.Begin

      cache.getAdvancedCache.lock(TestKey);

      Int value=cache.get(TestKey);

      cache.put(TestKey,value+1);

      Tx.commit.

       

      Normally the change will be replicated to the other node(NodeB) in the cluster, but during a continuous load testing, occasionally(more than onces in 100,000 times), there will be a TimeoutException. The stack trace looks like following.

       

      More tests shows this just happen when TestKey is not a String and primitive type, and lazyDeserialization is enabled. If I disable lazyDeserialization the issue is gone. Further investigation on the log files shows that during this transaction, on the remote node(NodeB), there will actaully be two locks acquired.  One is for the instance of TestKey, and the other one is a MarshalledValue instance. And when this error happens, it only release the lock of the MarshalledValue, and the lock for instance of TestKey will never be released.

       

      We are using 4.0.0final, several log files are attached. Unfornately I'm not able to create a test case for now.

      line 161024 shows there's only one entry in context, and line 161012 shows the transaction has successfully locked TestKey@14c, and line 161018 shows it has locked the Marshalled value, so there must be two entries in the context.  The question is why it get lost. The logging entries before line 161005 shows the normal case.

       

       

       

      And The stack trace.

       

      org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [2 seconds] on key [test.TestKey@14c] for requestor [DeadlockDetectingGlobalTransaction{replicatingTo=[], replicationThread=null, coinToss=3121282226971076589} GlobalTransaction:<changgeng-srm-6298>:8818]! Lock held by [DeadlockDetectingGlobalTransaction{replicatingTo=[], replicationThread=null, coinToss=-5363204352586943720} GlobalTransaction:<changgeng-srm-6298>:7991]
          at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:213)
          at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:148)
          at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:106)
          at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:146)
          at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:82)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
          at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
          at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:147)
          at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:82)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
          at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
          at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:147)
          at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:82)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
          at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
          at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:147)
          at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:82)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
          at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:171)
          at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:115)
          at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:82)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
          at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
          at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
          at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:147)
          at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:82)
          at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
          at org.infinispan.commands.control.LockControlCommand.perform(LockControlCommand.java:99)
          at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:65)
          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:178)
          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:150)
          at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:578)
          at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)

        • 1. Re: Lock is not released on remote node after transaction commit
          pengyan

          The attached test case could reproduce this issue. Seems it's related with multiple thread, so it may only be reproduced when there are multiple cores/CPUs.

           

          I also try to test with 4.1.0.CR2, and there are also issues, but the exception stack trace is a bit different.

          • 2. Re: Lock is not released on remote node after transaction commit
            pengyan

            another test case shows that even disable lazydeserialiaztion the issue is still there if multiple keys are locked in the same transaction.

            • 3. Re: Lock is not released on remote node after transaction commit
              manik

              Can I confirm your current setup?  4.1.0.CR2, no lazy deserialization, and you are using deadlock detection?  Have you tried without deadlock detection?

              • 4. Re: Lock is not released on remote node after transaction commit
                pengyan

                With lazy deserialization being disabled, I just tested for 4.0.0Final. Surely you can change the version number in the pom.xml and rerun the testcase.

                 

                I haven't tried without deadlock detection.

                • 5. Re: Lock is not released on remote node after transaction commit
                  manik

                  If you could try with and without deadlock detection on both 4.0.0.Final and 4.1.0.CR2 that would be great.  It would help me pinpoint where to look.

                  • 6. Re: Lock is not released on remote node after transaction commit
                    pengyan

                    neither combination works.

                    • 7. Re: Lock is not released on remote node after transaction commit
                      manik

                      Ok, I can see timeout exceptions running your test with a few tweaks (mainly to improve log readability):

                      • Disabled deadlock detection
                      • Changed bind address to 127.0.0.1

                       

                      While I do see the timeout exceptions, I'm not entirely sure if locks are held unnecessarily.  It seems as though threads are properly releasing locks at the end of the transaction.  What seems to be the issue is that you have 50 threads competing for the same lock with a 2s lock acquisition timeout.  When I ran the test, just before the failure, I saw a pause of almost 9 seconds (maybe GC?) when the tx that held the lock was waiting for a remote ack.  This thread eventually got its ack and continued fine, but other txs blocking and waiting for this lock timed out and died, causing the test to fail.

                       

                      Is this test realistic (50 threads competing for a single lock)?  If so, you may want to increase your lock acquisition timeout to take things like GC pauses and network delays into account.

                       

                      I've attached a patch of the changes I made to your test, I've also attached a log snippet.  In the log snippet, note that I have filtered out everything except lines that pertain to GlobalTransaction 44846.  Note that the transaction starts (by worker thread #30) around 14:26:30, but it is a whole 9 seconds later that it attempts to commit this transaction (at 14:26:39).  The full logs show that it is only around this time (14:26:39) that the remote node acknowledges the previous put.  So for whatever reason (network slowdown, maybe GC) has caused the remote node to run a bit slower, and in the meanwhile waiting threads timeout.  My guess would be on GC - have you attempted to capture such information using jconsole, etc?

                      • 8. Re: Lock is not released on remote node after transaction commit
                        pengyan

                        One thing I should metion ealier is that when I tried to reproduced the problem, the problem is gone when I enable trace level logging for org.infinispan logger.

                         

                        Also how do you explain that in the first log file I uploaded, on line  161024 it shows there's only one entry in context?

                        • 9. Re: Lock is not released on remote node after transaction commit
                          manik

                          If this pertains to the same test, then you are only working with a single entry in each transaction, correct?

                          • 10. Re: Lock is not released on remote node after transaction commit
                            pengyan

                            There's only one key in that test, but the lazyDeserialization is turn on, so normally there should be two entries in the context. And things goes wrong we can find the log files shows in one transaction there's only one entry during unlock process. Two keys are locked but only one is unlocked.

                            • 11. Re: Lock is not released on remote node after transaction commit
                              pengyan

                              I'm attaching another test case with 20 threads and each thread are trying to lock different keys. This test case run with lazyDeseralization enabled. The log file is also in the archive file. Line 130179 in test.log shows in  GlobalTransaction:<localhost-5100>:3093, there is only one entries in context, and it should be 2. So the lock for TestKey{key=key0} is not released in the remote node.

                              • 12. Re: Lock is not released on remote node after transaction commit
                                manik

                                Ok, I have recreated this in isolation.  I'm gonna add a unit test to trunk as well, rest assured it will be addressed prior to 4.1.0.Final.

                                 

                                If you want to watch/vote for the JIRA, it is ISPN-570.

                                • 13. Re: Lock is not released on remote node after transaction commit
                                  manik

                                  Ok I have a fix in place.  Publishing a new 4.1.0-SNAPSHOT right now so you can test against it.  Pls let me know how it goes, I'm holding up 4.1.0.Final for your feedback on this issue.

                                  • 14. Re: Lock is not released on remote node after transaction commit
                                    pengyan

                                    Thanks for the efforts. Now there are still two types of exception that occurs occasionally after I disable all the loggers.

                                    I attached the stack trace for them. Also the output for the test running are below for you to confirm that I'm the right version.

                                     

                                     

                                    hunk@localhost /tmp/singlekey-lazydeseralization$mvn  clean test
                                    exec /work/jdk1.6.0_16/bin/java -Xms256m -Xmx1024m -XX:MaxPermSize=200M -classpath /work/apache-maven-2.2.1/boot/classworlds-1.1.jar -Dclassworlds.conf=/work/apache-maven-2.2.1/bin/m2.conf -Dmaven.home=/work/apache-maven-2.2.1 org.codehaus.classworlds.Launcher "clean" "test"
                                    [INFO] Scanning for projects...
                                    [INFO] ------------------------------------------------------------------------
                                    [INFO] Building replace test case
                                    [INFO]    task-segment: [clean, test]
                                    [INFO] ------------------------------------------------------------------------
                                    [INFO] [clean:clean {execution: default-clean}]
                                    [INFO] Deleting directory /tmp/singlekey-lazydeseralization/target
                                    [INFO] [resources:resources {execution: default-resources}]
                                    [WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
                                    [INFO] Copying 0 resource
                                    [INFO] snapshot org.infinispan:infinispan-core:4.1.0-SNAPSHOT: checking for updates from public
                                    [INFO] snapshot org.infinispan:infinispan-core:4.1.0-SNAPSHOT: checking for updates from jboss-public-repository-group
                                    [INFO] snapshot org.infinispan:infinispan-core:4.1.0-SNAPSHOT: checking for updates from PublicMaven1Repository
                                    [INFO] snapshot org.infinispan:infinispan-core:4.1.0-SNAPSHOT: checking for updates from public-snapshots
                                    Downloading: https://repository.jboss.org/nexus/content/groups/public//org/infinispan/infinispan-core/4.1.0-SNAPSHOT/infinispan-core-4.1.0-20100804.172417-3.pom
                                    3K downloaded  (infinispan-core-4.1.0-20100804.172417-3.pom)
                                    [INFO] snapshot org.infinispan:infinispan-parent:4.1.0-SNAPSHOT: checking for updates from public
                                    [INFO] snapshot org.infinispan:infinispan-parent:4.1.0-SNAPSHOT: checking for updates from jboss-public-repository-group
                                    [INFO] snapshot org.infinispan:infinispan-parent:4.1.0-SNAPSHOT: checking for updates from PublicMaven1Repository
                                    [INFO] snapshot org.infinispan:infinispan-parent:4.1.0-SNAPSHOT: checking for updates from public-snapshots
                                    Downloading: https://repository.jboss.org/nexus/content/groups/public//org/infinispan/infinispan-parent/4.1.0-SNAPSHOT/infinispan-parent-4.1.0-20100804.172417-3.pom
                                    17K downloaded  (infinispan-parent-4.1.0-20100804.172417-3.pom)
                                    Downloading: https://repository.jboss.org/nexus/content/groups/public//org/infinispan/infinispan-core/4.1.0-SNAPSHOT/infinispan-core-4.1.0-20100804.172417-3.jar
                                    989K downloaded  (infinispan-core-4.1.0-20100804.172417-3.jar)
                                    [INFO] [compiler:compile {execution: default-compile}]
                                    [INFO] Nothing to compile - all classes are up to date
                                    [INFO] [resources:testResources {execution: default-testResources}]
                                    [WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
                                    [INFO] Copying 3 resources
                                    [INFO] [compiler:testCompile {execution: default-testCompile}]
                                    [INFO] Compiling 2 source files to /tmp/singlekey-lazydeseralization/target/test-classes
                                    [INFO] [surefire:test {execution: default-test}]
                                    [INFO] Surefire report directory: /tmp/singlekey-lazydeseralization/target/surefire-reports

                                     

                                     

                                     

                                    hunk@localhost ~/.m2/repository/org/infinispan/infinispan-core/4.1.0-SNAPSHOT$ls -l
                                    total 2028
                                    -rw-rw-r-- 1 hunk hunk 1013453 2010-08-05 09:21 infinispan-core-4.1.0-20100804.172417-3.jar
                                    -rw-rw-r-- 1 hunk hunk      40 2010-08-05 09:21 infinispan-core-4.1.0-20100804.172417-3.jar.sha1
                                    -rw-rw-r-- 1 hunk hunk    3569 2010-08-05 09:21 infinispan-core-4.1.0-20100804.172417-3.pom
                                    -rw-rw-r-- 1 hunk hunk      40 2010-08-05 09:21 infinispan-core-4.1.0-20100804.172417-3.pom.sha1
                                    -rw-rw-r-- 1 hunk hunk 1013453 2010-08-05 09:21 infinispan-core-4.1.0-SNAPSHOT.jar
                                    -rw-rw-r-- 1 hunk hunk    3569 2010-08-05 09:21 infinispan-core-4.1.0-SNAPSHOT.pom
                                    -rw-rw-r-- 1 hunk hunk     365 2010-08-05 09:21 maven-metadata-jboss-public-repository-group.xml
                                    -rw-rw-r-- 1 hunk hunk      41 2010-08-05 09:21 maven-metadata-jboss-public-repository-group.xml.sha1
                                    -rw-rw-r-- 1 hunk hunk     177 2010-08-05 09:33 maven-metadata-PublicMaven1Repository.xml
                                    -rw-rw-r-- 1 hunk hunk     177 2010-08-05 09:33 maven-metadata-public-snapshots.xml
                                    -rw-rw-r-- 1 hunk hunk     177 2010-08-05 09:33 maven-metadata-public.xml
                                    hunk@localhost ~/.m2/repository/org/infinispan/infinispan-core/4.1.0-SNAPSHOT$

                                    1 2 3 Previous Next