14 Replies Latest reply on Jul 7, 2015 12:22 PM by william.burns

    7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."

    richmayfield

      I just changed from 6.0.0.Final to 7.0.0.Final and am now seeing the following error pretty consistently:

       

      org.infinispan.util.concurrent.TimeoutException: Did not retrieve entries in allotted timeout: 240000 units: MILLISECONDS

              at org.infinispan.iteration.impl.LocalEntryRetriever$Itr.hasNext(LocalEntryRetriever.java:440)

              at org.infinispan.iteration.impl.RemovableEntryIterator.getNextFromIterator(RemovableEntryIterator.java:33)

              at org.infinispan.iteration.impl.RemovableEntryIterator.<init>(RemovableEntryIterator.java:28)

              at org.infinispan.iteration.impl.TrackingEntryIterable.iterator(TrackingEntryIterable.java:72)

              at org.infinispan.cache.impl.CacheImpl.isEmpty(CacheImpl.java:377)

              at org.infinispan.cache.impl.CacheImpl.isEmpty(CacheImpl.java:371)

              ...


      The cache manager is set up to be clustered - with a pretty generic jgroups-tcp.xml - and the caches are set up with:


      new ConfigurationBuilder().clustering().cacheMode(CacheMode.INVALIDATION_ASYNC).build()
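
      For context, here is a minimal sketch of how we wire that up (the cache name and variable names are illustrative, not our actual code; the only non-default pieces are the JGroups file and the cache mode):

      import org.infinispan.Cache;
      import org.infinispan.configuration.cache.CacheMode;
      import org.infinispan.configuration.cache.Configuration;
      import org.infinispan.configuration.cache.ConfigurationBuilder;
      import org.infinispan.configuration.global.GlobalConfiguration;
      import org.infinispan.configuration.global.GlobalConfigurationBuilder;
      import org.infinispan.manager.DefaultCacheManager;
      import org.infinispan.manager.EmbeddedCacheManager;

      // Clustered cache manager pointing at our (pretty generic) jgroups-tcp.xml
      GlobalConfiguration global = new GlobalConfigurationBuilder()
              .transport().defaultTransport()
              .addProperty("configurationFile", "jgroups-tcp.xml")
              .build();
      EmbeddedCacheManager manager = new DefaultCacheManager(global);

      // Caches use async invalidation; nothing else is non-default
      Configuration config = new ConfigurationBuilder()
              .clustering().cacheMode(CacheMode.INVALIDATION_ASYNC)
              .build();
      manager.defineConfiguration("someCache", config);   // "someCache" is a placeholder
      Cache<String, Object> cache = manager.getCache("someCache");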


      We did not see this error in 6.0.0.Final. Looking at the code, it appears this is one area where CacheImpl changed since 6.0.0.Final.


      Any ideas what might be going on here or what I need to change to get this working under 7.0.0.Final?


      Thanks much,

      rich

        • 1. Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
          william.burns

          I would not expect to see this with an invalidation cache, let alone when calling isEmpty().  Do you happen to have a stack trace from when this occurs?  Is this cache empty?

           

          Also, do you happen to have access to the logger output?  If an exception occurred that could cause this, I would expect to see "There was an exception while processing retrieval of entry values" logged at the WARN level, along with the exception that caused it.

          • 2. Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
            richmayfield

            The "..." in the stack I cited in the original post is our code calling Cache.isEmpty(); I wasn't sure our package/class names would be terribly relevant, so I left them out.

             

            org.infinispan.iteration.impl.LocalEntryRetriever  - ISPN000300: There was an exception while processing retrieval of entry values

            org.infinispan.commons.CacheException: Entry Iterator was interrupted!

                    at org.infinispan.iteration.impl.LocalEntryRetriever$1.run(LocalEntryRetriever.java:297)

                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

                    at java.util.concurrent.FutureTask.run(FutureTask.java:262)

                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

                    at java.lang.Thread.run(Thread.java:744)


            Interestingly, this seems to correlate with the other node shutting down (we only have 2 nodes in our cluster).

            • 3. Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
              richmayfield

              Scratch the last statement.

               

              When the other node in the cluster shuts down, I do see the following stack:

               

              org.infinispan.commons.CacheException: Entry Iterator was interrupted!

                      at org.infinispan.iteration.impl.LocalEntryRetriever$1.run(LocalEntryRetriever.java:297)

                      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

                      at java.util.concurrent.FutureTask.run(FutureTask.java:262)

                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

                      at java.lang.Thread.run(Thread.java:744)

               

              However, I also see the "Did not retrieve entries in allotted timeout" error on many other occasions when the other node is up and running, and in those cases I do not see the above-mentioned exception.

              • 4. Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                william.burns

                Sorry, let me clarify: when I said stack trace, I meant the thread stacks from the rest of the process.  I am not interested in the remainder of the stack from that thread (what you posted is sufficient for that thread).  I was asking because the iterator uses a thread from the async transport thread pool, and I wanted to make sure there was a thread available from that pool.

                 

                Do you happen to use any async operations on the cache, such as putAsync, getAsync, etc.?  It seems the async threads are somehow getting sent an interrupt, which in turn kills the iteration process.  If so, do you ever invoke NotifyingFuture.cancel(true) on the returned future?
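
                To illustrate what I mean (hypothetical snippet, not from your code): cancelling an async operation with mayInterruptIfRunning=true interrupts the pool thread running it, and since the entry iterator runs on that same pool, an in-flight iteration can be killed.

                import org.infinispan.commons.util.concurrent.NotifyingFuture;

                NotifyingFuture<Object> future = cache.putAsync("key", "value");
                // cancel(true) interrupts the worker thread executing the task; that
                // thread comes from the same async pool the entry iterator runs on
                future.cancel(true);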

                 

                I actually logged [1], which would help diagnose the exception on a per-iterator basis, but it wouldn't fix the issue you are seeing, because it seems something outside the scope of the iterator is interrupting it.

                 

                [1] [ISPN-4951] Entry Retriever iterator should rethrow caused exception - JBoss Issue Tracker

                • 5. Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                  richmayfield

                  I'll have to get back to you on the thread stacks - we've reverted the environment where we were seeing the problems back to 6.0.0. I'll see if I can set something up for a thread capture.

                   

                  We do not invoke any of the async calls. The cacheMode of the caches is INVALIDATION_ASYNC - that's the only piece of code indicating asynchronous behavior.

                   

                  Thanks much for your help. Really appreciate it.

                  • 6. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                    richmayfield

                    Got lucky. We hadn't updated yet, so I was able to get thread information.

                     

                    Of the threads with "org.infinispan" in their stacks...

                     

                    Our thread calling into isEmpty():

                     

                    "http-HOSTNAME-15273-156" daemon prio=10 tid=0x00002b498c322800 nid=0x5459 waiting on condition [0x00002b49a50e6000]

                      java.lang.Thread.State: TIMED_WAITING (parking)

                            at sun.misc.Unsafe.park(Native Method)

                            - parking to wait for  <0x000000076ec44e20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

                            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)

                            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176)

                            at org.infinispan.iteration.impl.LocalEntryRetriever$Itr.hasNext(LocalEntryRetriever.java:435)

                            at org.infinispan.iteration.impl.RemovableEntryIterator.getNextFromIterator(RemovableEntryIterator.java:33)

                            at org.infinispan.iteration.impl.RemovableEntryIterator.<init>(RemovableEntryIterator.java:28)

                            at org.infinispan.iteration.impl.TrackingEntryIterable.iterator(TrackingEntryIterable.java:72)

                            at org.infinispan.cache.impl.CacheImpl.isEmpty(CacheImpl.java:377)

                            at org.infinispan.cache.impl.CacheImpl.isEmpty(CacheImpl.java:371)

                          …

                     

                    25 threads that look like the following (thread suffixes "t1" through "t25"):

                     

                    "transport-thread--p2-t25" daemon prio=10 tid=0x00002b498e540800 nid=0x4382 waiting on condition [0x00002b4993563000]

                      java.lang.Thread.State: WAITING (parking)

                            at sun.misc.Unsafe.park(Native Method)

                            - parking to wait for  <0x0000000760b7dce0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

                            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)

                            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)

                            at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:324)

                            at org.infinispan.iteration.impl.LocalEntryRetriever$Itr.addEntries(LocalEntryRetriever.java:524)

                            at org.infinispan.iteration.impl.LocalEntryRetriever$ItrQueuerHandler.handleBatch(LocalEntryRetriever.java:405)

                            at org.infinispan.iteration.impl.LocalEntryRetriever$MapAction.apply(LocalEntryRetriever.java:383)

                            at org.infinispan.iteration.impl.LocalEntryRetriever$1.run(LocalEntryRetriever.java:294)

                            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

                            at java.util.concurrent.FutureTask.run(FutureTask.java:262)

                            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

                            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

                            at java.lang.Thread.run(Thread.java:744)

                     

                    And finally:

                     

                    "Incoming-1,Hostname-18836" prio=10 tid=0x0000000003a4d800 nid=0x433b waiting on condition [0x00002b4987c42000]

                      java.lang.Thread.State: TIMED_WAITING (parking)

                            at sun.misc.Unsafe.park(Native Method)

                            - parking to wait for  <0x000000076630c200> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

                            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)

                            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)

                            at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:90)

                            at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:206)

                            at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:86)

                            at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:267)

                            at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:211)

                            at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:460)

                            at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)

                            at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:250)

                            at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:677)

                            at org.jgroups.JChannel.up(JChannel.java:755)

                            at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1033)

                            at org.jgroups.protocols.RSVP.up(RSVP.java:237)

                            at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)

                            at org.jgroups.protocols.FlowControl.up(FlowControl.java:447)

                            at org.jgroups.protocols.FlowControl.up(FlowControl.java:447)

                            at org.jgroups.stack.Protocol.up(Protocol.java:420)

                            at org.jgroups.stack.Protocol.up(Protocol.java:420)

                            at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)

                            at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1087)

                            at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:886)

                            at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:790)

                            at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)

                            at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:619)

                            at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)

                            at org.jgroups.protocols.FD.up(FD.java:255)

                            at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:297)

                            at org.jgroups.protocols.MERGE2.up(MERGE2.java:237)

                            at org.jgroups.protocols.Discovery.up(Discovery.java:277)

                            at org.jgroups.protocols.TP.passMessageUp(TP.java:1572)

                            at org.jgroups.protocols.TP$MyHandler.run(TP.java:1791)

                            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

                            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

                            at java.lang.Thread.run(Thread.java:744)

                    • 7. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                      william.burns

                      Ah yes, this is what we are looking for. There is a race condition that can cause the dependent thread to hang.  I logged [ISPN-4954] Entry Iterator can block transport thread if closed while trying to put wait into queue - JBoss Issue Tracker to fix this.

                       

                      For now, I would say to just replace the call to isEmpty() with size() == 0; that should work around the issue until you can get ISPN 7.1.0.Alpha1.
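
                      That is, a sketch of the workaround:

                      // Workaround on 7.0.0 (cache is your org.infinispan.Cache instance):
                      boolean empty = cache.size() == 0;   // in place of cache.isEmpty()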

                       

                      *edit* The fix for this is available in 7.0.1

                      • 8. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                        jeetu

                        Hi William,

                         

                        I am also facing the same issue with two nodes. My cache is DIST_SYNC, but I load up the cache using multiple cache.putAsync() calls, and I also call cache.isEmpty() in my code.

                         

                        I can't upgrade to 7.0.1 yet. Will the workaround of using size() == 0 work for my case too?

                        • 9. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                          william.burns

                          Yes, size() == 0 should remedy the issue on 7.0.0.  It will not be as performant as calling isEmpty() in 7.1.0, though, which is what I would definitely recommend moving to whenever possible.

                          • 10. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                            jeetu

                            Thanks for responding. We will upgrade in our next release.

                             

                            Regards

                            Jithendra

                            • 11. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                              richmayfield

                              For what it's worth, calling size() as an alternative to isEmpty() still results in the error. I will get thread stacks next time. I will investigate 7.1 or later, but we have to live with 7.0.3.Final for a while. Any suggestions would be greatly appreciated.

                               

                              org.infinispan.util.concurrent.TimeoutException: Did not retrieve entries in allotted timeout: 600000 units: MILLISECONDS

                                      at org.infinispan.iteration.impl.LocalEntryRetriever$Itr.hasNext(LocalEntryRetriever.java:456)

                                      at org.infinispan.iteration.impl.RemovableEntryIterator.getNextFromIterator(RemovableEntryIterator.java:33)

                                      at org.infinispan.iteration.impl.RemovableEntryIterator.<init>(RemovableEntryIterator.java:28)

                                      at org.infinispan.iteration.impl.TrackingEntryIterable.iterator(TrackingEntryIterable.java:72)

                                      at org.infinispan.iteration.impl.TrackingEntryIterable.iterator(TrackingEntryIterable.java:23)

                                      at org.infinispan.commands.read.SizeCommand.perform(SizeCommand.java:47)

                                      at org.infinispan.commands.read.SizeCommand.perform(SizeCommand.java:25)

                                      at org.infinispan.interceptors.CallInterceptor.handleDefault(CallInterceptor.java:99)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

                                      at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

                                      at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

                                      at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

                                      at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

                                      at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

                                      at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)

                                      at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)

                                      at org.infinispan.commands.AbstractVisitor.visitSizeCommand(AbstractVisitor.java:72)

                                      at org.infinispan.commands.read.SizeCommand.acceptVisitor(SizeCommand.java:35)

                                      at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)

                                      at org.infinispan.cache.impl.CacheImpl.size(CacheImpl.java:367)

                                      at org.infinispan.cache.impl.CacheImpl.size(CacheImpl.java:362)

                              • 12. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                                richmayfield

                                That issue appears to have been resolved in 7.0.1.Final, yet I'm seeing the error when calling size() in 7.0.3.Final.

                                • 13. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                                  richmayfield

                                  Wait... I just ran across the thread "Infinispan 7.0.x, still 'Did not retrieve entries in allotted timeout...' errors". My code may be susceptible to that other issue (not closing an iterator returned from entrySet()).
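
                                  If so, I'll need to close the iterator explicitly. A rough sketch of what I have in mind (assuming the entrySet() iterator implements CloseableIterator in 7.0.x, as that thread suggests; entry processing is a placeholder):

                                  import java.util.Iterator;
                                  import java.util.Map;
                                  import org.infinispan.commons.util.CloseableIterator;

                                  Iterator<Map.Entry<String, Object>> it = cache.entrySet().iterator();
                                  try {
                                      while (it.hasNext()) {
                                          Map.Entry<String, Object> entry = it.next();
                                          // ... process entry ...
                                      }
                                  } finally {
                                      // Failing to close the iterator can leave the entry retriever
                                      // running, which may surface later as these retrieval timeouts
                                      if (it instanceof CloseableIterator) {
                                          ((CloseableIterator<?>) it).close();
                                      }
                                  }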

                                  • 14. Re: Re: 7.0.0 migration issue? "Did not retrieve entries in allotted timeout..."
                                    william.burns

                                    This seems like a different issue. I would need the thread stacks from the process to tell what you are seeing.  Did you notice any other timeouts on other threads besides this one?

                                     

                                    -edit- Ah yes, that would be a match then.