10 Replies Latest reply: Jul 2, 2012 6:13 PM by Eric Neilsen RSS

StateTransferInProgressException: Timed out waiting for the state transfer lock under load

dex chen Novice

I was doing some performance loading test aginst the ISPN cache and replication with a 2-node cluster (Replication mode) using ISPN 5.1.3.Final.

 

The cache is only in memory (i.e., not cach store).

 

Here is the outline of my tests:

 

I have a set of load drivers to put cache items, and doing some random retrieving of cache items.

 

When the number of cache items hit some big numbers (80K), the following exceptions are thrown on one of the nodes:

2012-04-24/13:16:09.312/MDT [ajp-bio-8009-exec-29] ERROR org.infinispan.interceptors.InvocationContextInterceptor[141] - ISPN000136: Execution error

org.infinispan.statetransfer.StateTransferInProgressException: Timed out waiting for the state transfer lock, state transfer in progress for view 58

     at org.infinispan.interceptors.StateTransferLockInterceptor.signalStateTransferInProgress(StateTransferLockInterceptor.java:199)

     at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:176)

     at org.infinispan.interceptors.StateTransferLockInterceptor.visitPutKeyValueCommand(StateTransferLockInterceptor.java:151)

     at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)

     at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

     at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:124)

     at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)

     at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

     at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

     at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:61)

     at org.infinispan.interceptors.IsMarshallableInterceptor.visitPutKeyValueCommand(IsMarshallableInterceptor.java:108)

     at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)

     at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

     at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:130)

     at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:89)

     at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:61)

     at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)

     at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

     at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:941)

     at org.infinispan.CacheImpl.put(CacheImpl.java:654)

     at org.infinispan.CacheImpl.put(CacheImpl.java:646)

 

When this happens, I can not add any new cache items.

 

Are there any parameters I should tun to work around of this? 

 

Here is my config:

<default>

     <locking

         isolationLevel="READ_COMMITTED"

         lockAcquisitionTimeout="1500"

         writeSkewCheck="false"

         concurrencyLevel="5000"

         useLockStriping="false"

      />

    

      <transaction

            transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"

           

            syncRollbackPhase="false"

            syncCommitPhase="false"

            useEagerLocking="false"

            eagerLockSingleNode="false"

            cacheStopTimeout="30000" />

    

      <deadlockDetection enabled="true" spinDuration="1000"/>

      <jmxStatistics enabled="true"/>

     

    </default>

    

   <namedCache name="session">

      <clustering mode="replication">

         <stateRetrieval

            timeout="240000"

            fetchInMemoryState="true"

            alwaysProvideInMemoryState="false"

         />

         <async useReplQueue="true" replQueueInterval="5000" replQueueMaxElements="500" asyncMarshalling="false" />

 

      </clustering>

      <transaction transactionMode="NON_TRANSACTIONAL"/>

      <eviction

         maxEntries="500000"

         strategy="LRU"

      />

      <!--  time units below are millseconds -->

      <expiration

         wakeUpInterval="-1"

         lifespan="-1"

         maxIdle="-1"

      />

  • 1. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    Mircea Markus Master

    Seems like you have some nodes joing/stopping whilst you're test run. During this time (i.e. state transfer) ISPN might abort certain transactions in order to guarantee consistency.

  • 2. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    dex chen Novice

    No, there is no node joining/stopping (unless the ISPN/Jgroups detects missing heartbeats from other node under heavy load, although I did not see that).

     

    I should see view transfer if there is node leaving/joing.  In this case, I did not see view transfer.

     

     

    The issue is not the repeating Timeout exceptions, but the system does not recover itselft, unless I restart another node.

     

    If this indicates the contention between state transfer and put, how do I tun the configuration to reduce the contention? 

  • 3. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    dex chen Novice

    I saw the ISPN-1838 is marked "Can not reproduce", which seems similar to issues I am experiencing.

     

    Is there any Guideline to tuning the jgroup configuration (jgroup-tcp,xml or  jgroup-udp.xml) for different conditions (cluster size, network, load ...)?  I saw couple other posts related performance, and it seems that everyone has some different settings for JGROUP.

  • 4. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    Galder Zamarreño Master

    Dex, I think that JIRA was particular to the environment where it was being run.

     

    I'd suggest that you try with the latest final release, which is 5.1.4.FINAL and you try the JGroups configuration files that we ship there, which are:

    https://github.com/infinispan/infinispan/blob/5.1.4.FINAL/core/src/main/resources/jgroups-udp.xml

    https://github.com/infinispan/infinispan/blob/5.1.4.FINAL/core/src/main/resources/jgroups-tcp.xml

     

    If the issue is still present, please repeat the test and enable TRACE on org.infinispan package and attach a zip file with logs from all nodes.

  • 6. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    dex chen Novice

    Galder:  In one of the thread/JIRA issues, I saw UNICAST is suggested to use in the place of UNICAST2. Which one is the recommend one for the 5.1.4.Final?

     

    In addition, the jgroup-tcp.xml seems still refers to JGroups-2.8.xsd. Should it refer to JGroups-3.0.xsd?

  • 7. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    Galder Zamarreño Master

    Dex, UNICAST2, that's what the configs posted have

     

    Yeah, should refer to 3.0

  • 8. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    dex chen Novice

    I saw the same issue with 5.1.4.Final  under high load.

     

    In our performance testing, we have 2 node cluster in replcation mode, and have load drivers to generate load and use a load balancer to distribute the load across the 2 cluster nodes randomly.

    The worst is that the system does not recover when got to this state.  Note: This seems to related to the high load.

     

    Here is the stack trace:

    ERROR org.infinispan.interceptors.InvocationContextInterceptor[146] - ISPN000136: Execution error

    org.infinispan.statetransfer.StateTransferInProgressException: Timed out waiting for the state transfer lock, state transfer in progress for view 20

             at org.infinispan.interceptors.StateTransferLockInterceptor.signalStateTransferInProgress(StateTransferLockInterceptor.java:201)

             at org.infinispan.interceptors.StateTransferLockInterceptor.visitPrepareCommand(StateTransferLockInterceptor.java:81)

             at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)

             at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

             at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

             at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)

             at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)

             at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

             at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:131)

             at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:90)

             at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)

             at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)

             at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

             at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:174)

             at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:121)

             at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelOnePhaseCommit(XAResourceRecord.java:663)

             at com.arjuna.ats.arjuna.coordinator.BasicAction.onePhaseCommit(BasicAction.java:2278)

             at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1463)

             at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:99)

             at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:159)

             at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1159)

             at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:119)

             at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1016)

             at org.infinispan.CacheImpl.put(CacheImpl.java:702)

             at org.infinispan.CacheImpl.put(CacheImpl.java:694)

  • 9. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    Galder Zamarreño Master

    I'd suggest repeating the test and enabling TRACE logs on both nodes for org.infinispan category, and when the failures start to be seen, generate some thread dumps of each node, in case some threads have deadlocked.

  • 10. Re: StateTransferInProgressException: Timed out waiting for the state transfer lock under load
    Eric Neilsen Newbie

    We just say the same thing in our production server. It took the entire JBoss/InfiniSpan cluster down.

     

    15:44:54,584 ERROR [org.infinispan.transaction.TransactionCoordinator] (http--10.41.110.31-8080-54) ISPN000097: Error while processing 1PC PrepareCommand: org.infinispan.statetransfer.StateTransferInProgressException: Timed out waiting for the state transfer lock, state transfer in progress for view 56

              at org.infinispan.interceptors.StateTransferLockInterceptor.signalStateTransferInProgress(StateTransferLockInterceptor.java:199)

              at org.infinispan.interceptors.StateTransferLockInterceptor.visitPrepareCommand(StateTransferLockInterceptor.java:80)

              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

              at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:113)

              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:130)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:89)

              at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:113)

              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

              at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:113)

              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.BatchingInterceptor.handleDefault(BatchingInterceptor.java:86)

              at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:113)

              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

              at org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager$ClassLoaderAwareCommandInterceptor.handleDefault(DefaultEmbeddedCacheManager.java:410)

              at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:113)

              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:131)

              at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

              at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:174)

              at org.infinispan.transaction.synchronization.SynchronizationAdapter.afterCompletion(SynchronizationAdapter.java:81)

              at org.infinispan.transaction.tm.DummyTransaction.notifyAfterCompletion(DummyTransaction.java:272)

              at org.infinispan.transaction.tm.DummyTransaction.runCommitTx(DummyTransaction.java:321)

              at org.infinispan.transaction.tm.DummyTransaction.commit(DummyTransaction.java:90)

              at org.infinispan.transaction.tm.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:100)

              at org.infinispan.batch.BatchContainer.resolveTransaction(BatchContainer.java:123)

              at org.infinispan.batch.BatchContainer.endBatch(BatchContainer.java:105)

              at org.infinispan.batch.BatchContainer.endBatch(BatchContainer.java:86)

              at org.infinispan.CacheImpl.endBatch(CacheImpl.java:580)

              at org.infinispan.AbstractDelegatingCache.endBatch(AbstractDelegatingCache.java:75)

              at org.infinispan.AbstractDelegatingCache.endBatch(AbstractDelegatingCache.java:75)

              at org.infinispan.AbstractDelegatingCache.endBatch(AbstractDelegatingCache.java:75)

              at org.jboss.as.clustering.infinispan.invoker.BatchOperation.invoke(BatchOperation.java:26)

              at org.jboss.as.clustering.infinispan.invoker.RetryingCacheInvoker.invoke(RetryingCacheInvoker.java:60)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager$ForceSynchronousCacheInvoker.invoke(DistributedCacheManager.java:518)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.invoke(DistributedCacheManager.java:486)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.batch(DistributedCacheManager.java:482)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.removeSession(DistributedCacheManager.java:275)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.removeSessionLocal(DistributedCacheManager.java:264)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.getData(DistributedCacheManager.java:237)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.getSessionData(DistributedCacheManager.java:190)

              at org.jboss.as.web.session.DistributableSessionManager.loadSession(DistributableSessionManager.java:1390)

              at org.jboss.as.web.session.DistributableSessionManager.findSession(DistributableSessionManager.java:686)

              at org.jboss.as.web.session.DistributableSessionManager.findSession(DistributableSessionManager.java:85)

              at org.jboss.as.web.session.ClusteredSessionValve.handleRequest(ClusteredSessionValve.java:118)

              at org.jboss.as.web.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:91)

              at org.jboss.as.web.session.JvmRouteValve.invoke(JvmRouteValve.java:88)

              at org.jboss.as.web.session.LockingValve.invoke(LockingValve.java:56)

              at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)

              at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)

              at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

              at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)

              at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:897)

              at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:626)

              at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:2039)

              at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_30]

     

    15:44:54,584 ERROR [org.infinispan.transaction.tm.DummyTransaction] (http--10.41.110.31-8080-54) ISPN000111: afterCompletion() failed for SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, transaction=DummyTransaction{xid=DummyXid{id=188271}, status=3}, lockedKeys=null, backupKeyLocks=null, viewId=23} org.infinispan.transaction.synchronization.SyncLocalTransaction@2df61} org.infinispan.transaction.synchronization.SynchronizationAdapter@2df80: org.infinispan.CacheException: Could not commit.

              at org.infinispan.transaction.synchronization.SynchronizationAdapter.afterCompletion(SynchronizationAdapter.java:83)

              at org.infinispan.transaction.tm.DummyTransaction.notifyAfterCompletion(DummyTransaction.java:272)

              at org.infinispan.transaction.tm.DummyTransaction.runCommitTx(DummyTransaction.java:321)

              at org.infinispan.transaction.tm.DummyTransaction.commit(DummyTransaction.java:90)

              at org.infinispan.transaction.tm.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:100)

              at org.infinispan.batch.BatchContainer.resolveTransaction(BatchContainer.java:123)

              at org.infinispan.batch.BatchContainer.endBatch(BatchContainer.java:105)

              at org.infinispan.batch.BatchContainer.endBatch(BatchContainer.java:86)

              at org.infinispan.CacheImpl.endBatch(CacheImpl.java:580)

              at org.infinispan.AbstractDelegatingCache.endBatch(AbstractDelegatingCache.java:75)

              at org.infinispan.AbstractDelegatingCache.endBatch(AbstractDelegatingCache.java:75)

              at org.infinispan.AbstractDelegatingCache.endBatch(AbstractDelegatingCache.java:75)

              at org.jboss.as.clustering.infinispan.invoker.BatchOperation.invoke(BatchOperation.java:26)

              at org.jboss.as.clustering.infinispan.invoker.RetryingCacheInvoker.invoke(RetryingCacheInvoker.java:60)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager$ForceSynchronousCacheInvoker.invoke(DistributedCacheManager.java:518)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.invoke(DistributedCacheManager.java:486)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.batch(DistributedCacheManager.java:482)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.removeSession(DistributedCacheManager.java:275)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.removeSessionLocal(DistributedCacheManager.java:264)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.getData(DistributedCacheManager.java:237)

              at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.getSessionData(DistributedCacheManager.java:190)

              at org.jboss.as.web.session.DistributableSessionManager.loadSession(DistributableSessionManager.java:1390)

              at org.jboss.as.web.session.DistributableSessionManager.findSession(DistributableSessionManager.java:686)

              at org.jboss.as.web.session.DistributableSessionManager.findSession(DistributableSessionManager.java:85)

              at org.jboss.as.web.session.ClusteredSessionValve.handleRequest(ClusteredSessionValve.java:118)

              at org.jboss.as.web.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:91)

              at org.jboss.as.web.session.JvmRouteValve.invoke(JvmRouteValve.java:88)

              at org.jboss.as.web.session.LockingValve.invoke(LockingValve.java:56)

              at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)

              at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)

              at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

              at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)

              at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:897)

              at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:626)

              at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:2039)

              at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_30]

    Caused by: javax.transaction.xa.XAException

              at org.infinispan.transaction.TransactionCoordinator.handleCommitFailure(TransactionCoordinator.java:219)

              at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:176)

              at org.infinispan.transaction.synchronization.SynchronizationAdapter.afterCompletion(SynchronizationAdapter.java:81)

              ... 36 more