13 Replies Latest reply on Jul 11, 2011 8:18 AM by mircea.markus

    question on deadlock detection

    yelin666

      I tried the deaklock detection, and it works effectively. However, I got the following questions:

      1. First, in a distributed environment like Infinispan, deadlock does not only happen in the case described in the article http://infinispan.blogspot.com/2009/07/increase-transactional-throughput-with.html (that's the traditional one when the lock is centralized). Also, when two transactions only got one conflicting key, it still could cause deadlock in a distributed environment when the the two updates are concurrent and the times are close enough. Currently, when deadlock is detected, one transaction proceeds while the other is rolled back. So when a number of keys are being updated and only 1 key gets conflict, the whole transaction is rolled back and lose the whole update. I am wondering, potentiall if it's possible at all, to let one transaction proceed first, and the other one follow sequentially and override the conflicting key? I understand it got enginnering complexity involved, but just want to check if it's possible.
      2. The current implementation uses the coin toss to pick which one to roll back according to the above article, is it possible to pick based on the time the update being invoked? If so and 1 is possible as well, can we let the later one override the earlier one?
      3. For deadlock detection, the default spinDuration is 100 ms. What's the minimum spingDuration I can set that would work effectively?

       

      Thanks in advance.

       

      Lin

        • 1. Re: question on deadlock detection
          an1310

          Hi Lin,

           

          It sounds like you might want to take a look at the "eagerLockSingleNode" and distributed executor framework.  The framework allows you to execute a task on a key's owner, so you can acquire local locks, which'll give you the sequential behavior you're looking for.

          • 2. Re: question on deadlock detection
            mircea.markus
            1. First, in a distributed environment like Infinispan, deadlock does not only happen in the case described in the articlehttp://infinispan.blogspot.com/2009/07/increase-transactional-throughput-with.html (that's the traditional one when the lock is centralized). Also, when two transactions only got one conflicting key, it still could cause deadlock in a distributed environment when the the two updates are concurrent and the times are close enough. Currently, when deadlock is detected, one transaction proceeds while the other is rolled back. So when a number of keys are being updated and only 1 key gets conflict, the whole transaction is rolled back and lose the whole update. I am wondering, potentiall if it's possible at all, to let one transaction proceed first, and the other one follow sequentially and override the conflicting key? I understand it got enginnering complexity involved, but just want to check if it's possible.

            Thanks for the feedback Lin. Deadlock detection(and locking in general) is under review and some very significant improvement are on their way:

            ISPN-1132, ISPN-1131 and ISPN-1137. Feel free to vote for them.

            1. The current implementation uses the coin toss to pick which one to roll back according to the above article, is it possible to pick based on the time the update being invoked? If so and 1 is possible as well, can we let the later one override the earlier one?

            No. This is doable, but don't think it would be necassary given the improvements I've mentioned.

            For deadlock detection, the default spinDuration is 100 ms. What's the minimum spingDuration I can set that would work effectively?

            the smaller the spin duration the qucker the deadlock is detected, at the cost of some extra CPU cicles. Guess there's no straight answer for that, just try it and see what suits you best performance wise.

            • 3. Re: question on deadlock detection
              yelin666

              Mircea/Erik,

               

               

              Thanks for the responses. I would like to clarify couple things:

              1. I read the page on http://community.jboss.org/wiki/PossibleLockingImprovements that got descriptions on what is proposed for the 3 JIRA tickets. I am not sure I understand it correctly, but it seems to me all about the distributed cache. My question is - for a replicated cache, every key resides on every node, and is there a concept of owners or main owner of a key? For a replicated cache, if I am updated the same key on N1 & N2 concurrectly, how to avoid the deadlock?
              2. Erik suggested to look at "eagerLockSingleNode" and distributed executor framework. I did a search on Google, but couldn't find a page with comprehansive information on that topic. Could you please direct me to a link with the right information? So I can catch up on this area.

               

              Thanks in advance.

               

               

              Lin

              • 4. Re: question on deadlock detection
                an1310

                Lin,

                 

                You said you wanted one transaction to proceed and the other transations to follow sequentially.  If that's the case, you could get the behavior you wanted by looking at Infinispan 5.0's new distributed executor framework.  There, you can direct a task to execute on the key's primary data owner. 

                 

                If your use case permits it, you could also use the "eagerLockSingleNode" option to only acquire local locks.  The benefit there is that there are no remote locks acquired, so no deadlock detection is needed.  This is especially beneficial if your business logic involves more than one cache, since DLD doesn't support keys across multiple caches (yet).

                 

                You can find more on the Wiki.  Note that this is for the distributed case only, though.

                • 5. Re: question on deadlock detection
                  yelin666

                  Thanks for the information. I'll check out the details of it later (have to catch up on the 5.0 stuff).

                   

                  You mentioned it's for distributed case only. So for replicated cache, what options could I have?

                   

                  Thanks!

                  • 6. Re: question on deadlock detection
                    yelin666

                    Hi Mircea,

                     

                    I got one more question on an extended scenario from #2 at http://community.jboss.org/wiki/PossibleLockingImprovements. The general scenario is the same, but assume we do a putAll() with 2 keys K1 & K2 from both N1 & N2, and the values we put on N1 are V11 & V12, and values we put on N2 are V21 & V22 respectively. If the main owner for K1 is N3, and the main owner for K2 is N4, how would it work and what values would we end up with?

                     

                    And as mentioned above, I'd like to know the options we can have for replicated cache.

                     

                    Thanks in advance.

                     

                    Lin

                    • 7. Re: question on deadlock detection
                      mircea.markus
                      1. I read the page on http://community.jboss.org/wiki/PossibleLockingImprovements that got descriptions on what is proposed for the 3 JIRA tickets. I am not sure I understand it correctly, but it seems to me all about the distributed cache. My question is - for a replicated cache, every key resides on every node, and is there a concept of owners or main owner of a key? For a replicated cache, if I am updated the same key on N1 & N2 concurrectly, how to avoid the deadlock?

                      That is a good point Lin. I think that for replicated caches we can induce a main owner by the same means as we do with distributed caches: consistent hash. Thanks for bringing this up!

                      • 8. Re: question on deadlock detection
                        mircea.markus

                        I got one more question on an extended scenario from #2 at http://community.jboss.org/wiki/PossibleLockingImprovements. The general scenario is the same, but assume we do a putAll() with 2 keys K1 & K2 from both N1 & N2, and the values we put on N1 are V11 & V12, and values we put on N2 are V21 & V22 respectively. If the main owner for K1 is N3, and the main owner for K2 is N4, how would it work and what values would we end up with?


                        This is not covered by lock improvements doc nor by existing DLD mechanism.  Food for thought - I'll be back on this.

                        • 9. Re: question on deadlock detection
                          yelin666

                          Hi Mircea,

                           

                          Thanks for the feedbacks. I am just wondering if you can share your opinions on having an owner for a key in a replicated mode. Actually, one of our teams got their own proprietary solution for a replicated cache, and they used the "owning node" concept. However, for some reason I thought that's kind of legacy solution, and defeats the advantage of supporting update from any node in a cluster. When updating from a non-owning node, it requires sending the data to the owning node that introduces peformance degradation. Do you think we have other alternatives? I'd appreciate your inputs.

                           

                          Thanks,

                          Lin

                          • 10. Re: question on deadlock detection
                            mircea.markus

                            Here is the deadlock detection solution we've came up with: http://community.jboss.org/wiki/IncrementalOptimisticLocking

                            It doesn't work for the eager locking though, but for lazy locking, together with lock reordering[1] I think it covers 100% of deadlock scenarios.

                             

                            [1] http://community.jboss.org/wiki/LockReorderingForAvoidingDeadlocks

                            • 11. Re: question on deadlock detection
                              mircea.markus

                              I am just wondering if you can share your opinions on having an owner for a key in a replicated mode. Actually, one of our teams got their own proprietary solution for a replicated cache, and they used the "owning node" concept.

                              I think the easiest way of finding an owning key, even in replicated mode, is by using a consistent hash.

                               

                              However, for some reason I thought that's kind of legacy solution, and defeats the advantage of supporting update from any node in a cluster. When updating from a non-owning node, it requires sending the data to the owning node that introduces peformance degradation. Do you think we have other alternatives? I'd appreciate your inputs

                              Yes, it's true that you'll end up having a delegation layer. We gave it a very long thought and, unless a total order approach is used, this is the best we could cam up with. Here is the document describing our approach for implementing it: http://community.jboss.org/wiki/SingleNodeLockingModel

                              • 12. Re: question on deadlock detection
                                yelin666

                                Mircea,

                                 

                                Thanks for the feedbacks. I'd like to confirm with you - does it mean the decision was made to use the SingleNodeLockingModel for both distributed and replicated cache? And in which release will this be available? Thanks in advance for your clarification.

                                 

                                Cheers,

                                Lin

                                • 13. Re: question on deadlock detection
                                  mircea.markus

                                  Thanks for the feedbacks. I'd like to confirm with you - does it mean the decision was made to use the SingleNodeLockingModel for both distributed and replicated cache?

                                  Yes. The plan is to have them in 5.1.