17 Replies Latest reply on Feb 14, 2012 8:34 AM by vblagojevic

    Concurrent write issue encountered with TreeCache at client site...

    rcbj

      I'm hoping someone can help me with this.  I'm working with a client that is using Infinispan 5.1.0.FINAL on Sun JDK 1.6.0.  They recently ran into what appears to be a concurrency issue with TreeCache, and they have provided a test case that reproduces it.  The issue originally occurred in a three-node cluster, but the test case given here reproduces it in a single JVM.

       

      There is a source cache and a destination cache.

       

      The source cache's root node is populated with 500 child nodes (with the same string for both key and value: ChildNode1, ChildNode2, etc.).

       

      A DistributedCallable object then spawns a series of Callable objects that process each of the nodes in the source cache and put copies in the destination cache. Each put operation claims to be successful.

       

      After this is completed, they walk through the destination cache checking to see if all 500 of the nodes are present.  Often when they run this, they receive something like:

      java.lang.Exception: Cache-1 Destination root node lacked a key for ChildNode1

       

      This means that the second node of 500 was missing from the cache even though the write to the destination cache claimed it was successful.

       

      I have run this test dozens of times.  I've gotten it to complete a couple of times.  So, this leads me to believe it is some kind of concurrency issue.

       

      I'm including the source code for the test program that reproduces the issue.  This is a greatly simplified version of what happens at the client site.

       

      Thank you in advance for your time.

        • 1. Re: Concurrent write issue encountered with TreeCache at client site...
          vblagojevic

          Robert,

           

          I've looked at this use case and noticed that inside the call method of your DistributedCallable you use a "regular" executor service to submit a bunch of jobs and wait for their completion. Although that is a perfectly reasonable thing to attempt, we never intended to support this scenario, at least not yet. DistributedCallables are not supposed to spawn or use other threads while they execute. Is there any way you can achieve what you want without using an executor service inside the distributed callable?

           

          Regards,

          Vladimir

          • 2. Re: Concurrent write issue encountered with TreeCache at client site...
            rcbj

            Hi Vladimir.

             

            Thanks for the response.  What is the recommended approach for a situation where you'd like to run multiple instances of a "job" inside an Infinispan node?  Or is this simply not supported, i.e., the answer is to add more Infinispan nodes/JVMs?  By "job" in this case, I mean the logic inside the Callable call() method.

             

            I am attempting to simplify the test case further and still reproduce our original issue.

             

            Thanks,

             

            RCBJ

            • 3. Re: Concurrent write issue encountered with TreeCache at client site...
              vblagojevic

              Hey Robert,

               

              Yes, it is not supported; we still have to devise an execution model per Infinispan node. For now, we support only the simplest single-threaded execution model, where the thread that receives the Callable on the remote node executes it and returns the result to the caller. Later, in 6.0, we'll see how to improve this and allow other models of execution.

               

              I would suggest that you do not use executors or spawn other threads within DistributedCallable#call. And yes, please report back with the simplified model and results.

               

              Regards,

              Vladimir

              • 4. Re: Concurrent write issue encountered with TreeCache at client site...
                rogerdunn

                Hi Vladimir,

                 

                I'm wondering if the use of an executor service from within DistributedCallable#call might be a supported execution model, provided that those callables never write to the Cache.

                 

                What I'm attempting to understand is if it makes sense for us to separate out read operations from the cache, which might hopefully be allowed from within those callables, and write operations to the cache, which it sounds like must be performed from within the DistributedCallable#call's thread.  If that is a supported scenario, then the callables can return the result of their processing from their call() methods, and the DistributedCallable#call can then consume those futures, updating the cache on its thread.  And this would have a tremendously positive impact on being able to parallelize the operations that occur on the Cache's keys that are local to a given Infinispan node, compared to having to process those keys sequentially.

                 

                Would that be a supported scenario?

                 

                Cheers,

                 

                Roger
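
                The split Roger describes (worker callables only read and compute, and the thread running DistributedCallable#call performs every write) could be sketched roughly as follows. This is only an illustration under stated assumptions: plain java.util.Map objects stand in for the Infinispan caches so that the sketch runs without Infinispan, and the class name, copy() and process() are hypothetical names that do not come from the attached test case. Whether Infinispan supports this model is exactly the open question in this post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReadInWorkersWriteOnCaller {

    // Hypothetical stand-in for the per-key processing; not taken from the thread.
    static String process(String value) {
        return value.toUpperCase();
    }

    // Workers only read from 'source'. Every write to 'destination'
    // happens on the thread that called copy().
    static void copy(final Map<String, String> source,
                     Map<String, String> destination,
                     int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> keys = new ArrayList<String>(source.keySet());
            List<Future<String>> results = new ArrayList<Future<String>>();
            for (final String key : keys) {
                results.add(pool.submit(new Callable<String>() {
                    public String call() {
                        // Read-only access from the worker thread.
                        return process(source.get(key));
                    }
                }));
            }
            // Consume the futures and write on the calling thread only.
            for (int i = 0; i < keys.size(); i++) {
                destination.put(keys.get(i), results.get(i).get());
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> source = new ConcurrentHashMap<String, String>();
        for (int i = 0; i < 50; i++) {
            source.put("ChildNode" + i, "ChildNode" + i);
        }
        // A plain HashMap is enough here because only one thread writes to it.
        Map<String, String> destination = new HashMap<String, String>();
        copy(source, destination, 4);
        System.out.println("Copied entries: " + destination.size());
    }
}
```

                Because only the calling thread writes, the destination in this sketch needs no concurrent-write guarantees at all; that is the property the question hopes to rely on.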

                • 5. Re: Concurrent write issue encountered with TreeCache at client site...
                  rcbj

                  Vladimir,

                   

                  Hello again.

                   

                  We've created a simplified test case that removes DistributedCallable and just uses an ExecutorService and Callable objects.  Inside each Callable#call(), we are again copying from a source cache to a destination cache: we do a put() operation and then immediately try to read the value back with a get() operation.  With 50 key:value pairs, we usually get several of the call() methods reporting that they didn't find the value they had just successfully inserted.

                   

                  I'm attaching the source code for the use case.
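
                  For readers without the attachment, the shape of the check described above (put, then immediately get, inside each Callable) might look like the following sketch. It is not the attached source: the class and method names are hypothetical, and a ConcurrentHashMap stands in for the destination cache so that the sketch runs without Infinispan. With that stand-in the read-back is expected to succeed; reproducing the reported failure would require swapping in the TreeCache-backed caches from the actual test case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PutThenGetCheck {

    // Copies every entry from 'source' to 'destination' on worker threads and
    // returns the keys whose value could not be read back right after the put.
    static List<String> run(final Map<String, String> source,
                            final Map<String, String> destination,
                            int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String key : source.keySet()) {
                futures.add(pool.submit(new Callable<String>() {
                    public String call() {
                        String value = source.get(key);
                        destination.put(key, value);
                        // Immediately read the value back, as the test case does.
                        String readBack = destination.get(key);
                        // Report the key if the value is missing or different.
                        return value.equals(readBack) ? null : key;
                    }
                }));
            }
            List<String> missing = new ArrayList<String>();
            for (Future<String> future : futures) {
                String key = future.get();
                if (key != null) {
                    missing.add(key);
                }
            }
            return missing;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> source = new ConcurrentHashMap<String, String>();
        for (int i = 0; i < 50; i++) {
            source.put("ChildNode" + i, "ChildNode" + i);
        }
        Map<String, String> destination = new ConcurrentHashMap<String, String>();
        List<String> missing = run(source, destination, 8);
        System.out.println("Keys missing after put: " + missing);
    }
}
```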

                  • 6. Re: Concurrent write issue encountered with TreeCache at client site...
                    vblagojevic

                    Robert,

                     

                    It could be a bug. Would you please convert this mini project to use Maven and zip it up again? One or more of us will then take a look and find the cause!

                     

                    Regards,

                    Vladimir 

                    • 7. Re: Concurrent write issue encountered with TreeCache at client site...
                      vblagojevic

                      Roger,

                       

                      I have to admit I do not know! I have not personally tried this, and I am only now starting to work on this for the 5.2 release! I'd say try it out, see what happens, and report back.

                       

                      Regards,

                      Vladimir

                      • 8. Re: Concurrent write issue encountered with TreeCache at client site...
                        ovidiu.feodorov

                        Vladimir,

                         

                        I have mavenized Robert's test case, and attached it here. All you need to do is to unzip, then mvn clean test. There is also a README in root that contains essentially the same instructions.

                         

                        I have not looked in depth at the test semantics, but at least we will have a common ground to start from.

                         

                        Please let me know if you can run.

                         

                        Thanks

                        Ovidiu

                        • 9. Re: Concurrent write issue encountered with TreeCache at client site...
                          ovidiu.feodorov

                          Looks like a TreeCache defect. I can replicate it with TreeCache, whereas using the Infinispan Cache API works fine. The latest mavenized test suite is attached, with the README updated.

                          • 10. Re: Concurrent write issue encountered with TreeCache at client site...
                            vblagojevic

                            Thanks for this effort Ovidiu! I can import the project and run the test cases. Looking into this, will report back in a day or so!

                            • 11. Re: Concurrent write issue encountered with TreeCache at client site...
                              vblagojevic

                              Ovidiu,

                               

                              I have fixed your failing test and it now passes consistently in my local setup. Look for the changes in TestHelper.java, where I have simply used the Configuration framework setup that we use in our test suite. Unfortunately, due to my workload, I am not able to look for the cause of your original test failure.

                               

                              Let me know if it works for you guys as well.

                               

                              Regards,

                              Vladimir

                              • 12. Re: Concurrent write issue encountered with TreeCache at client site...
                                vblagojevic

                                Guys, I confirmed that the crucial change is the use of pessimistic locking!

                                 

                                Regards,

                                Vladimir

                                • 13. Re: Concurrent write issue encountered with TreeCache at client site...
                                  ovidiu.feodorov

                                  Thank you, Vladimir

                                  • 14. Re: Concurrent write issue encountered with TreeCache at client site...
                                    rcbj

                                    The client has updated their test to use Cache (instead of TreeCache), to avoid spawning threads inside DistributedCallable#call(), and to follow the other suggestions made here.  In a single JVM, the problems seem to have been fixed.  They then moved to multiple JVMs (three) and ran into a similar problem.

                                     

                                    They will occasionally get something similar to the following exception on one of the nodes.

                                     

                                    java.lang.Exception: DES-CACHE Destination cache lacked a key for ChildNode3300

                                            at com.novaordis.thread194629.distrib.ConcurrentWriteReadDistribTestSlaveTest.checkIfDestinationCacheHasAllKeys(ConcurrentWriteReadDistribTestSlaveTest.java:64)

                                            at com.novaordis.thread194629.distrib.ConcurrentWriteReadDistribTestSlaveTest.testReadsOnSlaveOnceDestCacheIsLoadedByMaster(ConcurrentWriteReadDistribTestSlaveTest.java:50)

                                            at com.novaordis.thread194629.distrib.ConcurrentWriteReadDistribTestSlaveTest.run(ConcurrentWriteReadDistribTestSlaveTest.java:135)

                                            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

                                     

                                    I've included two maven projects: c041-svn-master and c041-svn-slave.  If you open three windows at a command line and run one instance of master (mvn test) and two instances of slave (mvn test), you will occasionally get the above error from either the master or a slave node.

                                     

                                    As always, any help is greatly appreciated.

                                     

                                    Thanks

                                     

                                     

                                    RCBJ
