      • 15. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
        galder.zamarreno

        Something else for you to try: add useReplQueue="true" replQueueMaxElements="10000" attributes to <async> element in order to use a replication queue.
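        In config terms that would look something like this (just a sketch - the surrounding <clustering> element is assumed, and 10000 is only an example value):

        <clustering mode="replication">
           <!-- queue modifications and replicate them in batches -->
           <async useReplQueue="true" replQueueMaxElements="10000"/>
        </clustering>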

        • 16. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
          belaban

          OK, so I ran your test on my laptop, but changed the test slightly (removed unnecessary auto-boxing/unboxing). I also use a repl-queue, as described by Galder.

          I got 24 TXs / millisecond.

          I actually used the latest JGroups and Infinispan, but this didn't make a great diff (ca. 10%).

          • 17. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
            galder.zamarreno

            Btw, 275 tx/ms for ehcache looks way too high. In fact, I was getting a similar number with Infinispan earlier... when the passive and active nodes did not cluster.

             

            I don't see where clustering is set up in that ehcache configuration.

             

            AFAIK, you either need some kind of JGroups magic that ehcache used to plug in, or Terracotta magic, which is the clustering voodoo they add these days.

            • 18. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
              belaban

              Yes, 275'000 TXs / sec seems too good to be true. The thing you don't do in your test is wait for all of the data to arrive at the recipient ("passive") node. So if an implementation simply buffers the puts and removes (the gets are local, so no clustering traffic is generated for them), your test would finish without *any* replication happening at all, provided the buffer is large enough to hold all of the puts and removes!

               

              I can't speak for ehcache, but in our system, the PUTs and REMOVEs are sent asynchronously (in this config), or even placed into a replication queue (if enabled) and sent whenever 2000 messages have been queued. It is very likely that the passive cache won't have received all updates when your perf test finishes...

               

              You could modify your test to put a special marker into the cache when all sender threads are done, poll at the recipient until you see this marker, and only *then* return and measure the time taken.
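              Something like this, for example (a sketch only - the marker key and method names are made up; it assumes a plain org.infinispan.Cache on both nodes):

              import org.infinispan.Cache;

              // Sketch of the END-marker idea; names are hypothetical
              public class EndMarkerTest {
                  static final String END_MARKER = "__END__";

                  // active node: call this after all sender threads have been joined
                  static void signalEnd(Cache<String, String> cache) {
                      cache.put(END_MARKER, "done");
                  }

                  // passive node: poll until the marker has been replicated,
                  // then compute the total elapsed time
                  static long waitForEnd(Cache<String, String> cache, long startNanos)
                          throws InterruptedException {
                      while (cache.get(END_MARKER) == null) {
                          Thread.sleep(10); // short pause between polls
                      }
                      return System.nanoTime() - startNanos;
                  }
              }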

              • 19. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                ltepper

                Hi Galder,

                 

                First, thank you for your answers.

                 

                Apologies for the typo. In the lower example, I meant

                <async asyncMarshalling="false"/>

                which solved the RejectedExecutionException I had.

                 

                 

                Regarding the ehcache version, it's ehcache-core-2.5.1.jar.

                One more important thing - I made ehcache run synchronously by using properties="replicateAsynchronously=false" on the cacheEventListenerFactory, and the results were 2 transactions / millisecond.
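                For reference, this is roughly how my listener is configured (a sketch - the cache name and sizing attributes are placeholders):

                <cache name="testCache" maxElementsInMemory="100000" eternal="true">
                    <!-- RMI replication forced to synchronous mode -->
                    <cacheEventListenerFactory
                        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
                        properties="replicateAsynchronously=false"/>
                </cache>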

                 

                This explains why I got the memory problem in synchronous cache mode -

                In fact, further tests showed that when the synchronous cache mode test finished, millions of RMI objects were still alive.

                This means that the 275 transactions / millisecond result was misleading and should be ignored.

                Regarding the use of TCP in Infinispan with JGroups: I actually started with this configuration and got similar results (a different test setup, but I got to 14 transactions / millisecond).

                Galder - with the TCP setup, can you please mention what rates you achieved?

                Also here is my network configuration:

                =================================================

                [root@smp128 classes]# ethtool eth0
                Settings for eth0:
                        Supported ports: [ TP ]
                        Supported link modes:   10baseT/Half 10baseT/Full
                                                100baseT/Half 100baseT/Full
                                                1000baseT/Full
                        Supports auto-negotiation: Yes
                        Advertised link modes:  10baseT/Half 10baseT/Full
                                                100baseT/Half 100baseT/Full
                                                1000baseT/Full
                        Advertised auto-negotiation: Yes
                        Speed: 1000Mb/s
                        Duplex: Full
                        Port: Twisted Pair
                        PHYAD: 1
                        Transceiver: internal
                        Auto-negotiation: on
                        Supports Wake-on: g
                        Wake-on: g
                        Link detected: yes

                =================================================

                Is it possible that 1000 Mb/s is my bottleneck?

                Regards,

                Liron Tepper


                • 20. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                  ltepper

                  Hi Bela,

                   

                  Thank you very much for your answers.

                   

                  24 TXs / millisecond sounds way better, and may be good enough for us.

                   

                  Can you please send me the code changes you've made?  I would like to try to get to these rates.

                   

                  Another question - are you sure that in your asynchronous setup, the replication queue is empty at the moment the test ends?

                   

                  Regards,

                  Liron Tepper

                  • 21. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                    belaban

                    With sync replication I get 3 transactions / millisecond in my test. This will naturally always be much slower than async repl.

                    • 22. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                      belaban

                      Send me your email address (to belaban at yahoo dot com), and I'll send you the changes. Note that I used an Infinispan 5.2 snapshot and a JGroups 3.1 snapshot; I can ship those two JARs as well if you want.

                       

                      With this setup and <async asyncMarshalling="false" useReplQueue="true" replQueueMaxElements="1000"/>, I get (for 60 threads, each doing 100'000 requests) 37 transactions / millisecond!

                       

                      No, I don't know that the replication queue is empty when the test ends; I actually suspect there are still elements in it. That's why I suggested the addition of an END marker, and taking the stop time only when the END marker is in the passive cache.

                      • 23. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                        ltepper

                        Hi Bela,

                         

                        I've used your code, the new configuration files and the new jars, and got an improvement to 17 transactions / millisecond.

                        I'm trying to understand what is limiting the performance on my test.

                         

                        Regards,

                        Liron Tepper

                        • 24. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                          belaban

                          Hi Liron,

                           

                          The config I sent you uses UDP. You might want to tune the TCP/IP stack a bit (e.g. increase the net.core.rmem_max buffer) to increase the perf, or try using TCP instead of UDP.
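                          For example, on Linux (the value is illustrative only; JGroups' UDP receive buffers are capped by this kernel limit):

                          # raise the kernel's maximum socket receive buffer size
                          sysctl -w net.core.rmem_max=26214400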

                           

                          For me it's not so important whether you get 17 or 37 TXs/ms, but I wanted to look into the 250+ TX/ms number you got with ehcache versus the low number with Infinispan; I was concerned about an order-of-magnitude perf diff between ehcache and Infinispan, but that seems to be gone now. Do you have any recent numbers from your tests with ehcache?

                           

                          As I mentioned before, async replication gives you the best numbers, but your test doesn't really measure the time to send and receive N modifications; it only measures the time to *send* N modifications, without waiting until they have been received on the passive node. I suggest modifying your test slightly: each sender thread should place an END marker key into the cache, and - when done sending - the test should block until *all* END markers are seen in the passive cache. IMO, measuring the time it takes to *replicate* N items is more meaningful than measuring the time it takes to *send* N items.

                          Cheers,

                          • 25. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                            ltepper

                            Hi Bela,

                             

                            With ehcache I got only 2 TX / ms in synchronous mode, and in asynchronous mode it has a memory consumption problem, which means the test failed.

                            I actually did what you suggested with the END marker, and that is how I got the 17 TX / ms (I sent one marker after the "join" operation on all threads).

                             

                            Regards,

                            Liron Tepper

                            • 26. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                              belaban

                              Liron Tepper wrote:

                               

                              Hi Bela,

                               

                              With ehcache I got only 2 TX / ms in synchronous mode, and in asynchronous mode it has a memory consumption problem, which means the test failed.

                              I actually did what you suggested with the END marker, and that is how I got the 17 TX / ms (I sent one marker after the "join" operation on all threads).

                               

                              OK, now you should run Infinispan in synchronous mode as well (add <sync/> and remove <async ... />) and see what numbers you get. I'd say this should be about the same as ehcache, maybe a bit higher. In real life, your app will probably want to group the modifications made to the cache into a real (JTA) transaction.
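                              A minimal sketch of what I mean (it assumes your cache is configured as transactional with a transaction manager lookup; keys and values are made up):

                              import javax.transaction.TransactionManager;
                              import org.infinispan.Cache;

                              public class TxBatch {
                                  // group several modifications so they replicate as one unit
                                  static void applyBatch(Cache<String, String> cache) throws Exception {
                                      TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
                                      tm.begin();
                                      try {
                                          cache.put("key1", "value1");
                                          cache.put("key2", "value2");
                                          tm.commit();
                                      } catch (Exception e) {
                                          tm.rollback();
                                          throw e;
                                      }
                                  }
                              }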

                               

                              I see; with an END marker you'll probably get lower numbers, because you now include the time to apply all modifications, and that takes longer than just measuring the time to send them. This would also be lower than 37 TX/ms on my machine. The diff shouldn't be dramatic though, as my repl queue was only 1000, so I would only have had to subtract the time it takes to deliver 1000 modifications (tops) to the application.

                               

                              Re ehcache: you might run out of memory because they're buffering modifications (similar to our replQueue) and only flush the queue every now and then. You might be able to set this interval, thus preventing the OOME. I'm not an expert though; consult the ehcache forums for details on how to do this.

                              • 27. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                                ltepper

                                Hi Bela,

                                 

                                Thanks for your answer.

                                I have another question for you.

                                 

                                Is it possible to use a setup in which every machine runs two instances of Infinispan server? (a total of two machines)

                                If so, will the rate now be 17 tx / ms in each of the instances, or will it be 8.5 tx / ms each (a total rate of 17)?

                                This would double the performance for us, if possible.

                                 

                                 

                                Regards,

                                Liron Tepper

                                • 28. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                                  belaban

                                  You can run as many instances as you want, as long as the instances form a cluster. So, for example, you could run 2 passive instances on one box, and 1 passive and 1 active on another box.

                                   

                                  Running 1 passive / 1 active instance, I get 35 TXs / ms.

                                   

                                  Running 3 passive / 1 active instances, I get 28 TXs / ms.

                                   

                                  Remember, we're using replication, so

                                  - every key is stored in every instance (including the active cache)

                                  - every modification triggers a message send to all nodes in the cluster

                                   

                                  The first item makes scalability a function of the average data size and the cluster size. For example, if every node adds an average of 100MB of data, then with 10 nodes every node will have to store roughly 1GB. As you can see, this is not very scalable as either the cluster size or the data size increases. However, reads are very fast, as they are always local (no network round trip).

                                  • 29. Re: InfiniSpan 5.1.0CR2 with Jgroups 3.0.1 performance
                                    ltepper

                                    Hi Bela,

                                     

                                    Thank you for your reply.

                                     

                                    I might not have explained myself so well.

                                    What I actually meant was forming 2 different clusters, each cluster holding different cache instances.

                                     

                                    host A                      host B

                                    =====                      =====

                                                   cluster 1

                                    passive1 <------------>  active1

                                     

                                     

                                                   cluster 2

                                    passive2 <------------>  active2

                                     

                                     

                                    Note that passive1 and passive2 are different Java processes.

                                     

                                     

                                    I have just managed to run such a setup, by using separate configuration files for each cluster (with different ports etc.).
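                                    Roughly along these lines (a sketch - the cluster names, file names and port are made up):

                                    <!-- infinispan-cluster1.xml -->
                                    <global>
                                       <transport clusterName="cluster-1">
                                          <properties>
                                             <property name="configurationFile" value="jgroups-cluster1.xml"/>
                                          </properties>
                                       </transport>
                                    </global>

                                    <!-- infinispan-cluster2.xml is identical except for clusterName="cluster-2"
                                         and jgroups-cluster2.xml, whose <UDP mcast_port="..."/> differs -->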

                                    The results were very good - I got an actual rate of 16 tx / ms in each cluster!

                                     

                                    Does that sound possible?  I wonder what the bottleneck of a single cluster is.

                                     

                                    Regards,

                                    Liron Tepper