27 Replies | Latest reply on Aug 17, 2010 2:59 PM by shane_dev
      • 15. Re: State Transfer/Repl Queue - Out of Order
        shane_dev

        I just emailed you the logs. We were at CR2, but we may have reverted to CR1 for a couple of tests. I'll check with Craig on that. I'll also look into getting the latest snapshot and scheduling a test against it today.

         

        Thanks,

        Shane

        • 16. Re: State Transfer/Repl Queue - Out of Order
          shane_dev

          We ran our tests with the latest snapshot, but we saw the same results. The issue is easily repeatable (on our end) and easy to identify: we are seeing replication queue flushes processed out of order on the receiving end. This usually happens with flushes that are sent before a state transfer begins and received after it finishes, although that is not always the case. Here is a quick summary of our last test. I have also attached the clean logs, and I'll look into emailing you the location of the full logs.

           

          Slave:  11:36:13 - State Transfer
          Master: 11:36:13 - State Transfer

           

          // These 3 flushes happen after the current state transfer, but before the next one.

          Master: 11:37:30 - PUT (342224477717137, Request - 2220)
          Master: 11:37:30 - REMOVE (342224477717137, Request - 2219)

           

          Master: 11:37:48 - PUT (342224477738570, Request - 2278)
          Master: 11:37:48 - REMOVE (342224477738570, Request - 2279)

           

          Master: 11:37:59 - PUT (342224477751302, Request - 2317)
          Master: 11:37:59 - REMOVE (342224477751302, Request - 2318)

           

          Slave:  11:39:34 - State Transfer
          Master: 11:39:34 - State Transfer

           

          // Now we process the 3 flushes from before the state transfer. However, they are out of order: we are now processing the REMOVEs before the PUTs.

          Slave:  11:39:55 - REMOVE (342224477717137, Request - 2219)
          Slave:  11:39:55 - PUT (342224477717137, Request - 2220)

           

          Slave:  11:39:55 - REMOVE (342224477738570, Request - 2279)
          Slave:  11:39:55 - PUT (342224477738570, Request - 2278)

           

          Slave:  11:39:56 - REMOVE (342224477751302, Request - 2318)
          Slave:  11:39:56 - PUT (342224477751302, Request - 2317)

           

          // This one does not seem to be related to a state transfer at all.

          Master: 11:39:57 - PUT (342224477893092 & 342224477893095, Request - 2712)
          Master: 11:39:57 - REMOVE (342224477893092 & 342224477893095, Request - 2713)

           

          // Again, it processes the REMOVE before the PUT.

          Slave:  11:40:10 - REMOVE (342224477893092 & 342224477893095, Request - 2713)
          Slave:  11:40:10 - PUT (342224477893092 & 342224477893095, Request - 2712)
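To make the consequence of the reordering above concrete, here is a minimal Java sketch (plain HashMap, not Infinispan code) that applies the same PUT/REMOVE pair for one key to an empty map, first in the master's order and then in the slave's order. The Op class and the placeholder value are illustrative assumptions. Applying REMOVE before PUT leaves a stale entry behind, so the two nodes end up with different contents.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReorderDemo {
    // Illustrative replicated operation: either a PUT or a REMOVE on a single key.
    static final class Op {
        final boolean isPut;
        final String key;

        Op(boolean isPut, String key) {
            this.isPut = isPut;
            this.key = key;
        }
    }

    // Applies the operations, in the given order, to an initially empty map.
    static Map<String, String> apply(List<Op> ops) {
        Map<String, String> cache = new HashMap<>();
        for (Op op : ops) {
            if (op.isPut) {
                cache.put(op.key, "value"); // placeholder value
            } else {
                cache.remove(op.key);
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        Op put = new Op(true, "342224477893092");
        Op remove = new Op(false, "342224477893092");
        System.out.println("Master order: " + apply(List.of(put, remove))); // Master order: {}
        System.out.println("Slave order:  " + apply(List.of(remove, put))); // Slave order:  {342224477893092=value}
    }
}
```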

          • 17. Re: State Transfer/Repl Queue - Out of Order
            shane_dev

            I'm wondering if this might be related to the threading of the transport executor. We have set asyncMarshalling to false for all of our caches. However, from the logs it appears that the executor service is still being used. I thought the executor service was only used when async marshalling was enabled?

             

            I think I may have found the answer to my own question. It looks like the replication queue forces ResponseMode.ASYNCHRONOUS despite the fact that we have set async marshalling to false.

             

            It looks like we need to reduce the executor's thread count to 1.
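The thread-count effect can be reproduced outside Infinispan with plain java.util.concurrent. In this minimal sketch, numbered tasks stand in for replication queue flushes being handed to the transport executor (an assumption; the real send path is more involved). A single-thread executor runs tasks sequentially in submission order; a 25-thread pool gives no ordering guarantee, so its result may or may not match on any given run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorOrderDemo {
    // Submits tasks 0..count-1 in order and records the order in which they actually run.
    static List<Integer> runInOrder(ExecutorService executor, int count) throws InterruptedException {
        List<Integer> executed = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < count; i++) {
            final int seq = i;
            executor.execute(() -> executed.add(seq));
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return new ArrayList<>(executed);
    }

    // The submission order 0..count-1, for comparison.
    static List<Integer> expected(int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            seq.add(i);
        }
        return seq;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 10_000;
        // One worker thread: tasks are executed sequentially, in submission order.
        boolean singleOrdered = runInOrder(Executors.newSingleThreadExecutor(), n).equals(expected(n));
        // 25 worker threads: tasks may run concurrently and complete in any order.
        boolean pooledOrdered = runInOrder(Executors.newFixedThreadPool(25), n).equals(expected(n));
        System.out.println("single thread preserved order: " + singleOrdered);
        System.out.println("25-thread pool preserved order: " + pooledOrdered);
    }
}
```

The single-thread variant buys ordering by serializing every send through one thread, which is exactly the trade-off behind the maxThreads=1 workaround.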

            • 18. Re: State Transfer/Repl Queue - Out of Order
              galder.zamarreno

              Shane, indeed the pool size is having an effect here that I need to investigate further. All along, my tests have used maxThreads=1 for the asyncTransportExecutor. Once I raised it to 25, the latest test attached to https://jira.jboss.org/browse/ISPN-577 started to fail, indicating that the final result was not an empty cache.

              • 19. Re: State Transfer/Repl Queue - Out of Order
                galder.zamarreno

                ReplicationQueue.flush() uses ResponseMode.ASYNCHRONOUS regardless of whether asyncMarshalling is turned off, and that might be the cause of the issue with multiple transport threads. For the time being, carry on testing with maxThreads=1 while I investigate this further.
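Below is a simplified model of the dispatch decision discussed here, and of the behaviour later posts say to expect once async marshalling is off (the same as maxThreads=1). It is not Infinispan's ReplicationQueue: the class, its fields and the Consumer standing in for the transport are hypothetical. When asyncMarshalling is false, each drained batch is sent on the caller's thread, so consecutive flushes leave in flush order whatever the pool size; only when it is true is the send handed to the executor, where a multi-threaded pool can reorder batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class FlushDispatchSketch {
    private final Consumer<List<String>> transport;      // hypothetical stand-in for the network send
    private final ExecutorService asyncTransportExecutor; // used only when asyncMarshalling is true
    private final boolean asyncMarshalling;
    private final List<String> queue = new ArrayList<>();

    FlushDispatchSketch(Consumer<List<String>> transport, ExecutorService executor, boolean asyncMarshalling) {
        this.transport = transport;
        this.asyncTransportExecutor = executor;
        this.asyncMarshalling = asyncMarshalling;
    }

    // Queues a replicated command (e.g. a PUT or REMOVE) for the next flush.
    synchronized void add(String command) {
        queue.add(command);
    }

    // Drains the queue and sends the batch as a single message; empty flushes send nothing.
    synchronized void flush() {
        if (queue.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(queue);
        queue.clear();
        if (asyncMarshalling) {
            // Handed to the pool: with more than one thread, batches may be sent out of order.
            asyncTransportExecutor.execute(() -> transport.accept(batch));
        } else {
            // Sent on the caller's thread while holding the lock: batches leave in flush order,
            // but add() callers block until the send completes.
            transport.accept(batch);
        }
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(25);
        FlushDispatchSketch replQueue = new FlushDispatchSketch(sent::add, pool, false);
        replQueue.add("PUT 342224477893092");
        replQueue.flush();
        replQueue.add("REMOVE 342224477893092");
        replQueue.flush();
        pool.shutdown();
        System.out.println(sent); // [[PUT 342224477893092], [REMOVE 342224477893092]]
    }
}
```

Because flush() holds the lock while sending synchronously, concurrent add() calls wait for it; the sketch makes that cost of the synchronous path visible.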

                • 20. Re: State Transfer/Repl Queue - Out of Order
                  galder.zamarreno

                  A fix has been committed. We're working on uploading a snapshot asap.

                  • 21. Re: State Transfer/Repl Queue - Out of Order
                    galder.zamarreno

                    Snapshot is up now. Please try with latest snapshot and let us know how it goes.

                    • 22. Re: State Transfer/Repl Queue - Out of Order
                      shane_dev

                      Thanks Galder,

                       

                      We'll get the snapshot and run our tests Monday morning, and we'll let you know. After looking at the queue code, I was also thinking of setting the threads to 1. I'll try the snapshot first, though. If that works, I'll leave the threads as they are.

                       

                      Shane

                      • 23. Re: State Transfer/Repl Queue - Out of Order
                        shane_dev

                        As I suspected, setting the threads to 1 did fix our issue. We also tried the latest snapshot, but it didn't seem to have the intended effect; I still saw it running in pure async mode. We'll double-check our configs and add some custom logging to find out why the new code didn't work. I did see the updates to ResponseMode, and they looked good to me.

                        • 24. Re: State Transfer/Repl Queue - Out of Order
                          galder.zamarreno

                          As long as you have async marshalling turned off, it should behave in exactly the same way as with maxThreads=1. It'd be interesting to get more information if you get the chance to dig into this further.

                          • 25. Re: State Transfer/Repl Queue - Out of Order
                            cbo_

                            Galder,

                             

                            Sorry, the latest test was not run correctly. We are planning a re-run here this morning and will present those results. We expect it will indeed work, and will post the results shortly.

                             

                            Thanks,

                            Craig

                            • 26. Re: State Transfer/Repl Queue - Out of Order
                              galder.zamarreno

                              I've compiled some of the information from this thread (http://community.jboss.org/message/554177#554177) into http://community.jboss.org/docs/DOC-15725. Any extra information based on further testing would be of great help.

                              • 27. Re: State Transfer/Repl Queue - Out of Order
                                shane_dev

                                We ran a few more tests today. Everything works fine with the snapshot.

                                 

                                However, we ran into performance issues during a state transfer when asyncMarshalling was set to false. This was expected: many 'adds' triggered 'flushes', which essentially held up subsequent operations.

                                 

                                Things went much better using asyncMarshalling and setting the maxThreads to 1.

                                 

                                In both cases, both caches were consistent. We simply had better performance in the latter.
