0 Replies Latest reply: Dec 20, 2007 11:35 AM by Fredrik Johansson RSS

Timeout when starting 2nd cache

Fredrik Johansson Newbie

Hi.
We are currently testing a scenario of starting up an additional cache to a cluster where replication is running. We have been using 2.0.0 GA with Jgroups 2.5.0, but have encountered problems with this (mainly 'cache not in started state'). So we decided to try 2.1.0 CR2.

It seems that by using 2.1.0 CR2 and jgroups 2.6.1 (not the bundled version), we get the most stable node up and down. However, sometimes when starting up a second cache we get the lock timeout described in the stacktraces below.

----------------- Server 1 (Producer) -----------------

org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/_BUDDY_BACKUP_/192.168.1.135_50641//31, caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock=read owners=[Thread[Incoming,192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0)
 at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:528)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:680)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock(PessimisticLockInterceptor.java:481)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:377)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:193)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:298)
 at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:130)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:97)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
 at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:4004)
 at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2861)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
 at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
 at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:624)
 at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
 at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
 at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:746)
 at org.jgroups.JChannel.up(JChannel.java:1151)
 at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:362)
 at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:427)
 at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:242)
 at org.jgroups.protocols.FRAG2.up(FRAG2.java:205)
 at org.jgroups.protocols.pbcast.GMS.up(GMS.java:767)
 at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
 at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:579)
 at org.jgroups.protocols.UNICAST.up(UNICAST.java:250)
 at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:714)
 at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
 at org.jgroups.protocols.FD.up(FD.java:322)
 at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
 at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
 at org.jgroups.protocols.Discovery.up(Discovery.java:246)
 at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
 at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1484)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
 at java.lang.Thread.run(Thread.java:619)
Caused by: org.jboss.cache.lock.TimeoutException: write lock for /_BUDDY_BACKUP_/192.168.1.135_50641//31 could not be acquired after 10000 ms. Locks: Read lock owners: [Thread[Incoming,192.168.1.112:32903,10,Thread Pools]]
Write lock owner: null
 (caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock info: read owners=[Thread[Incoming,192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
 at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
 at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
 at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
 ... 47 more



 ----------------- Server 2 (Replica) -----------------

 WARN CacheImpl - replication failure with method_call MethodName: _put; MethodIdInteger: 3; Args: (null, /_BUDDY_BACKUP_/192.168.1.135_50641//31, 8, , true) exception
org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/_BUDDY_BACKUP_/192.168.1.135_50641//31, caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock=read owners=[Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0)
 at org.jboss.cache.lock.IdentityLock.acquire (IdentityLock.java:528)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:680)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock (PessimisticLockInterceptor.java:481)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:377)
 at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java :193)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
 at org.jboss.cache.interceptors.Interceptor.invoke (Interceptor.java:77)
 at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod (TxInterceptor.java:298)
 at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:130)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke (CacheMgmtInterceptor.java:97)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
 at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
 at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:4004)
 at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2861)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.jgroups.blocks.MethodCall.invoke (MethodCall.java:330)
 at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
 at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java :624)
 at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
 at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
 at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up (MessageDispatcher.java:746)
 at org.jgroups.JChannel.up(JChannel.java:1151)
 at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:362)
 at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:427)
 at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:242)
 at org.jgroups.protocols.FRAG2.up(FRAG2.java:205)
 at org.jgroups.protocols.pbcast.GMS.up(GMS.java:767)
 at org.jgroups.protocols.pbcast.STABLE.up (STABLE.java:234)
 at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:579)
 at org.jgroups.protocols.UNICAST.up(UNICAST.java:250)
 at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:714)
 at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
 at org.jgroups.protocols.FD.up(FD.java:322)
 at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
 at org.jgroups.protocols.MERGE2.up (MERGE2.java:145)
 at org.jgroups.protocols.Discovery.up(Discovery.java:246)
 at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
 at org.jgroups.protocols.TP$IncomingPacket.run(TP.java :1484)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
 at java.lang.Thread.run (Thread.java:619)
Caused by: org.jboss.cache.lock.TimeoutException: write lock for /_BUDDY_BACKUP_/192.168.1.135_50641//31 could not be acquired after 10000 ms. Locks: Read lock owners: [Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]]
Write lock owner: null
 (caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock info: read owners=[Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
 at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
 at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
 at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
 ... 47 more


If we inspect the threads involved we see that thread A is waiting on thread B which is waiting on thread A. So we have (I think) something similar to a distributed dead lock scenario which is resolved by a lock timeout (after configured timeout).