0 Replies Latest reply on Dec 20, 2007 11:35 AM by fredrikj

    Timeout when starting 2nd cache

    fredrikj

      Hi.
      We are currently testing a scenario of starting up an additional cache to a cluster where replication is running. We have been using 2.0.0 GA with Jgroups 2.5.0, but have encountered problems with this (mainly 'cache not in started state'). So we decided to try 2.1.0 CR2.

      It seems that by using 2.1.0 CR2 and jgroups 2.6.1 (not the bundled version), we get the most stable node up and down. However, sometimes when starting up a second cache we get the lock timeout described in the stacktraces below.

      ----------------- Server 1 (Producer) -----------------
      
      org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/_BUDDY_BACKUP_/192.168.1.135_50641//31, caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock=read owners=[Thread[Incoming,192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0)
       at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:528)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:680)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock(PessimisticLockInterceptor.java:481)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:377)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:193)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:298)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:130)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:97)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
       at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:4004)
       at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2861)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
       at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
       at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:624)
       at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
       at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
       at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:746)
       at org.jgroups.JChannel.up(JChannel.java:1151)
       at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:362)
       at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:427)
       at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:242)
       at org.jgroups.protocols.FRAG2.up(FRAG2.java:205)
       at org.jgroups.protocols.pbcast.GMS.up(GMS.java:767)
       at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
       at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:579)
       at org.jgroups.protocols.UNICAST.up(UNICAST.java:250)
       at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:714)
       at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
       at org.jgroups.protocols.FD.up(FD.java:322)
       at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
       at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
       at org.jgroups.protocols.Discovery.up(Discovery.java:246)
       at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
       at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1484)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
       at java.lang.Thread.run(Thread.java:619)
      Caused by: org.jboss.cache.lock.TimeoutException: write lock for /_BUDDY_BACKUP_/192.168.1.135_50641//31 could not be acquired after 10000 ms. Locks: Read lock owners: [Thread[Incoming,192.168.1.112:32903,10,Thread Pools]]
      Write lock owner: null
       (caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock info: read owners=[Thread[Incoming,192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
       at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
       ... 47 more
      



       ----------------- Server 2 (Replica) -----------------
      
       WARN CacheImpl - replication failure with method_call MethodName: _put; MethodIdInteger: 3; Args: (null, /_BUDDY_BACKUP_/192.168.1.135_50641//31, 8, , true) exception
      org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/_BUDDY_BACKUP_/192.168.1.135_50641//31, caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock=read owners=[Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0)
       at org.jboss.cache.lock.IdentityLock.acquire (IdentityLock.java:528)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:680)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock (PessimisticLockInterceptor.java:481)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:377)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java :193)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
       at org.jboss.cache.interceptors.Interceptor.invoke (Interceptor.java:77)
       at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod (TxInterceptor.java:298)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:130)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke (CacheMgmtInterceptor.java:97)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
       at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
       at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:4004)
       at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2861)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.jgroups.blocks.MethodCall.invoke (MethodCall.java:330)
       at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
       at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java :624)
       at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
       at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
       at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up (MessageDispatcher.java:746)
       at org.jgroups.JChannel.up(JChannel.java:1151)
       at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:362)
       at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:427)
       at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:242)
       at org.jgroups.protocols.FRAG2.up(FRAG2.java:205)
       at org.jgroups.protocols.pbcast.GMS.up(GMS.java:767)
       at org.jgroups.protocols.pbcast.STABLE.up (STABLE.java:234)
       at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:579)
       at org.jgroups.protocols.UNICAST.up(UNICAST.java:250)
       at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:714)
       at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
       at org.jgroups.protocols.FD.up(FD.java:322)
       at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
       at org.jgroups.protocols.MERGE2.up (MERGE2.java:145)
       at org.jgroups.protocols.Discovery.up(Discovery.java:246)
       at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
       at org.jgroups.protocols.TP$IncomingPacket.run(TP.java :1484)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
       at java.lang.Thread.run (Thread.java:619)
      Caused by: org.jboss.cache.lock.TimeoutException: write lock for /_BUDDY_BACKUP_/192.168.1.135_50641//31 could not be acquired after 10000 ms. Locks: Read lock owners: [Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]]
      Write lock owner: null
       (caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock info: read owners=[Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
       at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
       ... 47 more
      


      If we inspect the threads involved we see that thread A is waiting on thread B which is waiting on thread A. So we have (I think) something similar to a distributed dead lock scenario which is resolved by a lock timeout (after configured timeout).