0 Replies Latest reply on Oct 21, 2015 4:58 AM by ramiseesniblic

    Infinispan Replication Timeout

    ramiseesniblic

      Hi,

       

       

      I am having trouble with an error relating to a replication timeout with infinispan. I am running 3 instances of JBoss EAP-6.3 in standalone mode and have a number of replicated caches configured - an example of one such cache is shown below:

       

       

        <cache-container name="ClusteredContainer" jndi-name="java:jboss/infinispan/ClusteredContainer" start="EAGER">               

        <transport stack="tcp" cluster="jbosstca" lock-timeout="20000"/>

        <replicated-cache name="MyExampleCache" mode="SYNC">

        <locking isolation="READ_COMMITTED" acquire-timeout="20000" striping="false"/>

        <transaction mode="NON_DURABLE_XA" locking="PESSIMISTIC" stop-timeout="20000"/>

        </replicated-cache>

        ...

        </cache-container>

       

       

      My jgroups configuration is as follows:

       

       

        <subsystem xmlns="urn:jboss:domain:jgroups:1.1" default-stack="tcp">

           <stack name="tcp">

           <transport type="TCP" socket-binding="jgroups-tcp">

        <property name="oob_thread_pool.enabled">true</property>

        <property name="oob_thread_pool.min_threads">1</property>

        <property name="oob_thread_pool.max_threads">300</property>

        </transport>

        <protocol type="TCPPING">

        <property name="initial_hosts">tca-01[7600],tca-02[7600],tca-03[7600]</property>

        <property name="timeout">3000</property>

        </protocol>

               <protocol type="MERGE2">

        <property name="min_interval">10000</property>

        <property name="max_interval">20000</property>

        </protocol>

               <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>

               <protocol type="FD"/>

               <protocol type="VERIFY_SUSPECT"/>

               <protocol type="pbcast.NAKACK"/>

               <protocol type="UNICAST2"/>

               <protocol type="pbcast.STABLE"/>

               <protocol type="pbcast.GMS"/>

           </stack>

        </subsystem>

       

       

      I began having issues relating to replication timeouts some weeks ago that looked very similar to the issue described here: https://issues.jboss.org/browse/AS7-4277 I followed the suggested fix from this issue and changed my oob_thread_pool.max_threads property to be 300 and this appeared to resolve the issue. However, the issue now appears to have resurfaced. I now see the following error in the logs of my first instance whenever my third JBoss instance (rep1-tca-03) attempts to make a change to the value held under the key: M301_11_0851 in this cache.

       

       

      2015-10-21 09:16:58,871 ERROR [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-2580,shared=tcp) Exception executing command: org.infinispan.util.concurrent.TimeoutException: Could not acquire lock on M301_11_0851 on behalf of transaction GlobalTransaction:<rep1-tca-03/jbosstca>:1502217392:remote. Lock is being held by GlobalTransaction:<rep1-tca-03/jbosstca>:122425628:remote

              at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.newTimeoutException(AbstractTxLockingInterceptor.java:223)

              at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.waitForTransactionsToComplete(AbstractTxLockingInterceptor.java:216)

              at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockKeyAndCheckOwnership(AbstractTxLockingInterceptor.java:180)

              at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAndRegisterBackupLock(AbstractTxLockingInterceptor.java:129)

              at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:253)

              at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)

              at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)

              at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)

              at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:128)

              at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:186)

              at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)

              at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)

              at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitLockControlCommand(TransactionSynchronizerInterceptor.java:94)

              at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)

              at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:284)

              at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:209)

              at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:151)

              at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:128)

              at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:97)

              at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)

              at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)

              at org.infinispan.commands.control.LockControlCommand.perform(LockControlCommand.java:150)

              at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:100)

              at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:121)

              at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:85)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:247)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)

              at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:600) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.JChannel.up(JChannel.java:707) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1025) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:453) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:751) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:609) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.FD.up(FD.java:253) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.MERGE2.up(MERGE2.java:205) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.Discovery.up(Discovery.java:359) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2607) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.TP.passMessageUp(TP.java:1260) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1822) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1795) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]

              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_65]

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_65]

              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_65]

       

       

      Interestingly - this error has been occurring for several days now and the error message always refers to the same GlobalTransaction holding the lock. i.e. every error states: "Lock is being held by GlobalTransaction:<rep1-tca-03/jbosstca>:122425628:remote". This does not seem right to me. Surely a lock should not be held for this length of time?

       

       

      If I restart the system then the error reappears, with the lock held on a key on this cache (or an identically configured one) and persists until rebooted again.

       

       

      Is there something wrong with my configuration? Do I need to increase the oob_thread_pool.max_threads to be greater than 300? This already seems like a large number to me...

       

       

      Any help would be much appreciated.

       

       

      Thanks