Infinispan Replication Timeout
ramiseesniblic · Oct 21, 2015 4:58 AM

Hi,
I am running into replication timeout errors with Infinispan. I am running 3 instances of JBoss EAP 6.3 in standalone mode with a number of replicated caches configured; an example of one such cache is shown below:
<cache-container name="ClusteredContainer" jndi-name="java:jboss/infinispan/ClusteredContainer" start="EAGER">
    <transport stack="tcp" cluster="jbosstca" lock-timeout="20000"/>
    <replicated-cache name="MyExampleCache" mode="SYNC">
        <locking isolation="READ_COMMITTED" acquire-timeout="20000" striping="false"/>
        <transaction mode="NON_DURABLE_XA" locking="PESSIMISTIC" stop-timeout="20000"/>
    </replicated-cache>
    ...
</cache-container>
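For context, my understanding of the acquire-timeout semantics with pessimistic locking is roughly that of a per-key lock acquired with a timeout: a competing transaction waits up to the configured acquire-timeout and then gives up with a TimeoutException. A minimal standalone sketch of that behavior (using a plain ReentrantLock as a stand-in for Infinispan's per-key lock, and a short timeout instead of my 20000 ms):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class AcquireTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the per-key lock that a pessimistic transaction takes on a key.
        ReentrantLock keyLock = new ReentrantLock();
        keyLock.lock(); // first "transaction" holds the lock and never releases it

        Thread other = new Thread(() -> {
            try {
                // Second "transaction" waits up to the acquire-timeout, then gives up;
                // Infinispan surfaces this give-up as a TimeoutException.
                boolean acquired = keyLock.tryLock(200, TimeUnit.MILLISECONDS);
                System.out.println("acquired=" + acquired);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        other.start();
        other.join();
    }
}
```

If that mental model is right, then a lock that is never released would make every competing update fail this way indefinitely, which matches what I am seeing.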
My jgroups configuration is as follows:
<subsystem xmlns="urn:jboss:domain:jgroups:1.1" default-stack="tcp">
    <stack name="tcp">
        <transport type="TCP" socket-binding="jgroups-tcp">
            <property name="oob_thread_pool.enabled">true</property>
            <property name="oob_thread_pool.min_threads">1</property>
            <property name="oob_thread_pool.max_threads">300</property>
        </transport>
        <protocol type="TCPPING">
            <property name="initial_hosts">tca-01[7600],tca-02[7600],tca-03[7600]</property>
            <property name="timeout">3000</property>
        </protocol>
        <protocol type="MERGE2">
            <property name="min_interval">10000</property>
            <property name="max_interval">20000</property>
        </protocol>
        <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
        <protocol type="FD"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK"/>
        <protocol type="UNICAST2"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
    </stack>
</subsystem>
I began having issues with replication timeouts some weeks ago that looked very similar to the issue described here: https://issues.jboss.org/browse/AS7-4277. I followed the suggested fix from that issue and increased my oob_thread_pool.max_threads property to 300, which appeared to resolve the problem. However, the issue has now resurfaced. I see the following error in the logs of my first instance whenever my third instance (rep1-tca-03) attempts to change the value held under the key M301_11_0851 in this cache:
2015-10-21 09:16:58,871 ERROR [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-2580,shared=tcp) Exception executing command: org.infinispan.util.concurrent.TimeoutException: Could not acquire lock on M301_11_0851 on behalf of transaction GlobalTransaction:<rep1-tca-03/jbosstca>:1502217392:remote. Lock is being held by GlobalTransaction:<rep1-tca-03/jbosstca>:122425628:remote
at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.newTimeoutException(AbstractTxLockingInterceptor.java:223)
at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.waitForTransactionsToComplete(AbstractTxLockingInterceptor.java:216)
at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockKeyAndCheckOwnership(AbstractTxLockingInterceptor.java:180)
at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAndRegisterBackupLock(AbstractTxLockingInterceptor.java:129)
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:253)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:128)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:186)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitLockControlCommand(TransactionSynchronizerInterceptor.java:94)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:284)
at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:209)
at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:151)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:128)
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:97)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)
at org.infinispan.commands.control.LockControlCommand.perform(LockControlCommand.java:150)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:100)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:121)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:85)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:247)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:600) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.JChannel.up(JChannel.java:707) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1025) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:453) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:751) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:609) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.FD.up(FD.java:253) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.MERGE2.up(MERGE2.java:205) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.Discovery.up(Discovery.java:359) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2607) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.TP.passMessageUp(TP.java:1260) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1822) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1795) [jgroups-3.2.13.Final-redhat-1.jar:3.2.13.Final-redhat-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_65]
Interestingly, this error has been occurring for several days now, and the error message always refers to the same GlobalTransaction holding the lock, i.e. every error states: "Lock is being held by GlobalTransaction:<rep1-tca-03/jbosstca>:122425628:remote". This does not seem right to me; surely a lock should not be held for this length of time?
If I restart the system, the error reappears with a lock held on a key in this cache (or an identically configured one), and it persists until the system is rebooted again.
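I have also been trying to watch lock counts over JMX to confirm that locks really are being held for this long. Below is a rough probe I run in-process; the "jboss.infinispan" MBean domain and the "NumberOfLocksHeld" attribute are what I see in JConsole against EAP 6.3 with cache statistics enabled, so treat those names as assumptions that may differ in other versions (outside the server it simply finds no matching MBeans):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LockManagerProbe {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Pattern for Infinispan's per-cache LockManager component. The domain
        // and attribute name below are taken from JConsole on EAP 6.3 and are
        // assumptions; check your own MBean tree if this matches nothing.
        ObjectName pattern = new ObjectName("jboss.infinispan:component=LockManager,*");
        for (ObjectName name : server.queryNames(pattern, null)) {
            Object held = server.getAttribute(name, "NumberOfLocksHeld");
            System.out.println(name + " -> NumberOfLocksHeld=" + held);
        }
    }
}
```

On the affected node, the count for the problem cache stays above zero even when the system is idle, which seems consistent with a stuck lock.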
Is there something wrong with my configuration? Do I need to increase oob_thread_pool.max_threads beyond 300? That already seems like a large number to me...
Any help would be much appreciated.
Thanks