
    Infinispan scalability consulting

    darrellburgan

      Sorry if this isn't the right forum, but I don't know where else to ask.

       

      My employer uses Infinispan as the data grid for our flagship SaaS application, which has heavy volume and tons of concurrent requests going on 24x7. We have had great success with Infinispan since we deployed it, but our volume is growing exponentially and we are running into some scalability walls with Infinispan:

       

      • Locking is increasingly slowing down the performance of the system
      • Sometimes locks get placed on entries that never clear, bringing the system to a screeching halt
      • Sometimes under heavy volume the JGroups cluster gets confused, with individual nodes listed multiple times and other strange behaviors
      • Other volume-related problems


      We have instituted a number of changes to try to lighten our load on Infinispan, and have made some headway, but it is clear that there is a limit to how far we can push this architecture.

       

      As a result, we are looking for some individual or company who has deep Infinispan experience who can advise us as to how we can redesign our use of Infinispan or help us tune Infinispan to get us past this scalability hurdle. We've tried looking around on the net but finding people who truly understand Infinispan more than we already do seems to be hard.

       

      My question is: can anyone suggest who we can talk to to get this kind of consulting? Is this something we can purchase from Red Hat? Are there consulting organizations out there with specific Infinispan practices? Do the Infinispan developers know people who can provide this type of service?

       

      Like I said, if I'm in the wrong forum, please let me know the more appropriate place, but this forum is where all the Infinispan experts obviously are. If you want to contact me offline, you can email me at dburgan@peopleanswers.com.

       

      Thanks,

      Darrell

        • 1. Re: Infinispan scalability consulting
          sannegrinovero
          Locking is increasingly slowing down the performance of the system

          Could you provide some more details on this? Ideally, please open a separate thread for each point.

           

          • Sometimes locks get placed on entries that never clear, bringing the system to a screeching halt

          That's definitely not supposed to happen. Which version is it? Can you identify and describe how they get there? I'm not aware of such problems on the latest stable release (5.1.5.FINAL); it would be great if you could test that one.

           

           

          • Sometimes under heavy volume the JGroups cluster gets confused, with individual nodes listed multiple times and other strange behaviors

          For this, we'd need to know which protocols you are using (JGroups configuration + version)

           

           


          We have instituted a number of changes to try to lighten our load on Infinispan, and have made some headway, but it is clear that there is a limit to how far we can push this architecture.

           

          Some of our recently run internal tests showed very impressive, completely linear scalability, and we were quite happy. Of course you are likely running a different configuration and a different kind of operations, so it would be very useful for us to get an accurate enough description to be able to reproduce such a limit... unless we're talking about gazillions of data/nodes? If you could provide some details that would be great, also as a reference for other users. (If you can not, we understand; then consulting is the only way.)

           

           

          My question is: can anyone suggest who we can talk to to get this kind of consulting? Is this something we can purchase from Red Hat? Are there consulting organizations out there with specific Infinispan practices? Do the Infinispan developers know people who can provide this type of service?

          Yes to all of your questions. I'm going to point out this forum thread to the appropriate people at Red Hat, but in the meantime you can get some free help from us directly.

          • 2. Re: Infinispan scalability consulting
            prabhat.jha

            To answer your main question, you should be able to get support and consulting from http://www.redhat.com/contact/sales.html . You can send an email to me at pjhaATredhatDOTcom and I will put you in touch with the right person.

             

            Back to technical clarifications on top of what Sanne asked above:

            Which version of Infinispan are you using?

            What's your runtime? Any app server?

            How big is your cluster? How are you interacting with data grid?

            Can you upload your configuration file?

             

            I would definitely suggest you try Infinispan 5.1.5, which has lots of performance-related improvements since the ISPN 5.1 release, as proven by our internal benchmarking.

            • 3. Re: Infinispan scalability consulting
              darrellburgan

              Some background: we have five nodes in our Infinispan cluster, each with an identical structural view of the list of caches. There are at least a few hundred caches. This version of our system does not use Hibernate; it uses an in-house ORM that we have developed. We have created an elaborate scheme for caching data from each table in our system, which caches both by primary key and by a table-specific list of secondary keys. Correlation of data between these caches is handled by our custom ORM. At peak volumes, we can be generating several hundred database updates per second, which translates into approximately the same amount of traffic for Infinispan. Average volume is more in the 80-100 updates per second range.
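
              To make that concrete, the per-table access pattern is roughly the following (a simplified sketch; the class and cache names are placeholders, not our actual code):

              import org.infinispan.Cache;
              import org.infinispan.manager.EmbeddedCacheManager;

              // Simplified illustration of the per-table caching scheme described above.
              // One cache holds rows by primary key, a second maps a secondary key to the
              // primary key; our custom ORM keeps the two correlated on every write.
              public class OrderCacheAccessor {

                  public static class OrderRow {            // placeholder row object
                      public Long id;
                      public String customerNumber;
                  }

                  private final Cache<Long, OrderRow> byPrimaryKey;
                  private final Cache<String, Long> byCustomerNumber;

                  public OrderCacheAccessor(EmbeddedCacheManager manager) {
                      this.byPrimaryKey = manager.getCache("ORDERS.byId");
                      this.byCustomerNumber = manager.getCache("ORDERS.byCustomerNumber");
                  }

                  public OrderRow findByCustomerNumber(String customerNumber) {
                      Long pk = byCustomerNumber.get(customerNumber);
                      return pk == null ? null : byPrimaryKey.get(pk);
                  }

                  public void store(OrderRow row) {
                      byPrimaryKey.put(row.id, row);
                      byCustomerNumber.put(row.customerNumber, row.id);
                  }
              }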

               

              Each cache can have a different configuration. In general, however, most caches are either async invalidation caches or async replication caches, depending on how often the table changes. Tables that change often use invalidation; tables that are primarily read-only use replication. We do not currently use distribution caches. Nor do we use the Infinispan replication queue or async marshalling.

               

              We allow our caches to grow pretty large, with max cache sizes usually in the tens of thousands of entries. We do not put a max lifetime on our cache entries, but we do use idle eviction on most caches, usually in the range of a few hours.
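
              To give a sense of what that looks like, a typical cache definition would be roughly the following with the 5.1-style ConfigurationBuilder API (illustrative values, not our exact production settings):

              import java.util.concurrent.TimeUnit;
              import org.infinispan.configuration.cache.CacheMode;
              import org.infinispan.configuration.cache.Configuration;
              import org.infinispan.configuration.cache.ConfigurationBuilder;
              import org.infinispan.eviction.EvictionStrategy;

              // Illustrative only: frequently-updated tables use async invalidation, mostly
              // read-only tables use async replication; both cap the entry count and expire
              // entries on idle time rather than a fixed lifespan.
              public class CacheConfigExamples {

                  public static Configuration invalidationCache() {
                      return new ConfigurationBuilder()
                          .clustering().cacheMode(CacheMode.INVALIDATION_ASYNC)
                          .locking().lockAcquisitionTimeout(10000)                  // 10s lock timeout, in ms
                          .eviction().strategy(EvictionStrategy.LRU).maxEntries(50000)
                          .expiration().maxIdle(TimeUnit.HOURS.toMillis(2))         // idle expiry only
                          .build();
                  }

                  public static Configuration replicationCache() {
                      return new ConfigurationBuilder()
                          .clustering().cacheMode(CacheMode.REPL_ASYNC)
                          .eviction().strategy(EvictionStrategy.LRU).maxEntries(50000)
                          .expiration().maxIdle(TimeUnit.HOURS.toMillis(2))
                          .build();
                  }
              }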

               

              Each cache detects lock timeouts and retries a few times before giving up. It used to be that lock timeouts would occur rarely so this was no big deal. This is no longer true.
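
              In code, the retry behavior amounts to something like this (a simplified sketch, not our actual implementation):

              import org.infinispan.Cache;
              import org.infinispan.util.concurrent.TimeoutException;

              // Simplified sketch of the "retry a few times on lock timeout, then give up" behavior.
              public final class RetryingCacheWriter {

                  private static final int MAX_ATTEMPTS = 3;

                  public static <K, V> void putWithRetry(Cache<K, V> cache, K key, V value) {
                      for (int attempt = 1; ; attempt++) {
                          try {
                              cache.put(key, value);
                              return;
                          } catch (TimeoutException lockTimeout) {
                              if (attempt >= MAX_ATTEMPTS) {
                                  throw lockTimeout;                  // give up after a few tries
                              }
                              try {
                                  Thread.sleep(50L * attempt);        // brief pause before retrying
                              } catch (InterruptedException ie) {
                                  Thread.currentThread().interrupt();
                                  throw lockTimeout;
                              }
                          }
                      }
                  }
              }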

               

              We're using Infinispan 5.0.1.FINAL in production. We are using JGroups to form the cluster, with TCPGOSSIP as the primary protocol. We are in the process of moving to UDP multicast on the theory that some of our instability under load may be due to TCP timeouts or other aspects of the network beyond JGroups' control.

               

              Here's what we are observing:

               

              1. We are increasingly having trouble with garbage collection affecting Infinispan. Once per hour a system-wide gc occurs, which can sometimes cause a JVM to go dark for 10-15 seconds. We've observed that if we are under heavy load when this happens, it can cause JGroups to have trouble. Sometimes it recovers successfully after the gc completes, other times it loses track of the node that had the gc, and still other times we end up with multiple instances of that node in the JGroups cluster.
              2. We have noticed that cache update times are increasing under load. By cache update time, we mean the time a thread blocks to do a cache.delete() or a cache.put() - not the time it takes for the rest of the cluster to see the change. We have clocked some of these cache.put() calls at over 15 seconds, same for cache.delete().
              3. When the cache updates start taking so long, we've noticed that lock timeouts seem to pile up, causing delays in the foreground. This can quickly snowball into a system-wide slowdown if the lock timeouts reach a critical mass. We've instituted some code that detects when cache update times cross a certain threshold; when they do, we stop doing cache.put() calls altogether for a period of time to try to give the data grid time to catch up (see the sketch after this list). This seems to help but is a band-aid on the problem.
              4. I do have a load test environment where I can reproduce this behavior of caches taking longer and longer to update, and lock contention becoming serious. I've tried upgrading to Infinispan 5.1.5 in my test environment, and I have not observed any improvement in performance, although I have observed a change in behavior: at some point Infinispan simply stops trying to accommodate the load and starts throwing RejectedExecutionException. I don't know if this is new in Infinispan 5.1 or not, but I'm not sure it helps us to have Infinispan throw exceptions vs. simply getting increasingly slower.
              5. We have had several instances, again under heavy load, where a lock seems to be placed on some heavily-referred-to cache entry and never gets released. This effectively causes a system outage, because all threads dependent on this cache entry block. I have not been able to reproduce this in a test environment, so it is very difficult to diagnose what happened.
              6. I am in the process of attempting to stand up ehCache in my test environment as the provider of caching to our data grid, but don't have any results yet. I do not want to switch to a different caching engine, but due diligence requires me to at least compare how the two systems perform in this regard.
              7. We have deployed several tweaks and tunings in an effort to put some more space between us and the edge of the cliff. One thing we did was the aforementioned code to stop doing cache.put() if the update times cross a threshold. Another change is to improve how our primary/secondary cache correlation works in our custom ORM, to reduce the amount of Infinispan traffic we're generating. We have also tweaked timeouts and changed some of the cache configurations for specific tables, in an effort to reduce Infinispan load.
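
              The threshold/back-off code mentioned in points 3 and 7 is roughly the following (a simplified sketch with made-up thresholds, not our production code):

              import java.util.concurrent.atomic.AtomicLong;
              import org.infinispan.Cache;

              // Band-aid sketch: if puts get too slow, skip further puts for a while so the
              // grid can catch up; skipped entries simply become cache misses later.
              public final class PutThrottle {

                  private static final long SLOW_PUT_THRESHOLD_MS = 2000;    // "cache update is too slow"
                  private static final long BACKOFF_PERIOD_MS = 30000;       // stop putting for this long

                  private final AtomicLong suspendedUntil = new AtomicLong(0);

                  public <K, V> void put(Cache<K, V> cache, K key, V value) {
                      if (System.currentTimeMillis() < suspendedUntil.get()) {
                          return;                                            // grid is struggling: skip the put
                      }
                      long start = System.currentTimeMillis();
                      cache.put(key, value);
                      long elapsed = System.currentTimeMillis() - start;
                      if (elapsed > SLOW_PUT_THRESHOLD_MS) {
                          suspendedUntil.set(System.currentTimeMillis() + BACKOFF_PERIOD_MS);
                      }
                  }
              }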

               

              In short, we have a complex Infinispan deployment that has served us very well for many months, but is increasingly having trouble with handling the load of our system, as our volume grows exponentially. The problem has grown to the point that it is measurably impacting our customers, so we have to act. Like I said, we've tried to tune things, which has helped some, but we are basically staving off the inevitable. We expect our volume to grow up to 100% in the next six months, and are very concerned about whether the data grid can handle it.

               

              Ideally, we'd want someone who is a true, seasoned expert on Infinispan to consult with us, examine how we're using Infinispan, and make recommendations as to how we can tactically survive the next six months, as well as strategically how we should evolve our Infinispan architecture so we can handle 2013 and beyond. I have several months' experience with Infinispan, but my knowledge barely scratches the surface. Put simply, I need help.

               

              Thanks,

              Darrell

              • 4. Re: Infinispan scalability consulting
                darrellburgan

                Sorry, I should add:

                 

                - The version of the system I'm discussing uses Java 6.

                - The 'app server' is Tomcat 5.5

                • 5. Re: Infinispan scalability consulting
                  sannegrinovero

                  Hi Darrell,

                  you should definitely try out 5.1.5.FINAL: many of the problems you mention have been analyzed over the past months and we have introduced a lot of improvements, most of them related to the symptoms you are describing.

                   

                  I have referred you to our sales team and they will get in touch soon, but in the meantime I think you should give 5.1.5 a try in your load tests; I'm confident you will see a major improvement already.

                  • 6. Re: Infinispan scalability consulting
                    belaban

                    Re #1:

                     

                    When you have a 10-15s GC pause, JGroups will start suspecting other nodes; and then - when GC has completed - merge the nodes back together. While this is OK from a cluster topology perspective, it causes problems for Infinispan. In a replicated cache, Infinispan performs replication by doing blocking, time-bounded RPCs, e.g. a PUT(K,V). If you have a cluster {A,B,C,D,E}, and a modification made in A needs to be replicated, and E is non-responsive because of a GC taking place, the RPC will not succeed at E, or at least the update of E will be delayed.
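
                    For reference, the time bound on those blocking RPCs is the sync replication timeout; with the 5.1 programmatic API it would be set roughly like this (illustrative value only):

                    import org.infinispan.configuration.cache.CacheMode;
                    import org.infinispan.configuration.cache.Configuration;
                    import org.infinispan.configuration.cache.ConfigurationBuilder;

                    public class ReplTimeoutExample {
                        // Illustrative only: a synchronously replicated cache whose blocking
                        // replication RPCs give up after 15 seconds if a member does not respond.
                        public static Configuration syncReplCache() {
                            return new ConfigurationBuilder()
                                .clustering().cacheMode(CacheMode.REPL_SYNC)
                                .sync().replTimeout(15000)   // milliseconds
                                .build();
                        }
                    }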

                     

                    A non-responsive node can cause all sorts of issues, for example message stability (= purging of messages delivered by all nodes) won't be able to progress until the non-responsive node recovers or is excluded. Until the non-responsive node is excluded (or recovers), messages will not be purged, leading to a memory increase.

                     

                    Exclusion of a non-responsive node is done through failure detection in JGroups, which is configurable. A lower timeout means that a node will get excluded sooner, but it also means a subsequent merge if that node was still alive (as in your case).

                     

                    So you can define exactly how you'd like JGroups to react to non-responsive nodes; however, I suggest looking into why a node becomes non-responsive in the first place, and trying to reduce the long full GC cycles in favor of more frequent, shorter GC cycles. Have you looked into the G1 collector, or even the Azul Zing JVM? They promise short GC cycles...

                    • 7. Re: Infinispan scalability consulting
                      nadirx

                      A few more bits about GC: you don't mention what GC strategies you are using, but may I suggest using the parallel collector for the young generation and the CMS collector for the old generation:

                       

                      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled

                      • 8. Re: Infinispan scalability consulting
                        darrellburgan

                        Tristan Tarrant wrote:

                         

                        A few more bits about GC: you don't mention what GC strategies you are using, but may I suggest using the parallel collector for the young generation and the CMS collector for the old generation:

                         

                        -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled

                         

                        Yeah, we're investigating this, as well as putting a limit on the max gc duration and some other tweaks to the sizes of the generations. Certainly long gc pauses can trigger a snowball effect that sends our data grid into the ground. But like I said, I'm able to reproduce a lock-contention logjam in my load test environment without any extensive gc going on, purely by throwing enough load at Infinispan. So I'm also very concerned about the medium-term viability of our data grid architecture. I would not be surprised if our data-tier transaction volume doubles in the next six months ...
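
                        For example, the kind of settings we're experimenting with look roughly like this (heap and generation sizes are illustrative, not our actual values):

                        -Xms6g -Xmx6g -Xmn1g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -XX:+PrintGCDetails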

                        • 9. Re: Infinispan scalability consulting
                          darrellburgan

                          When you have a 10-15s GC pause, JGroups will start suspecting other nodes; and then - when GC has completed - merge the nodes back together. While this is OK from a cluster topology perspective, it causes problems for Infinispan. In a replicated cache, Infinispan performs replication by doing blocking, time-bounded RPCs, e.g. a PUT(K,V). If you have a cluster {A,B,C,D,E}, and a modification made in A needs to be replicated, and E is non-responsive because of a GC taking place, the RPC will not succeed at E, or at least the update of E will be delayed.

                           

                          A non-responsive node can cause all sorts of issues, for example message stability (= purging of messages delivered by all nodes) won't be able to progress until the non-responsive node recovers or is excluded. Until the non-responsive node is excluded (or recovers), messages will not be purged, leading to a memory increase.

                           

                          Exclusion of a non-responsive node is done through failure detection in JGroups, which is configurable. A lower timeout means that a node will get excluded sooner, but it also means a subsequent merge if that node was still alive (as in your case).

                           

                          So you can define exactly how you'd like JGroups to react to non-responsive nodes; however, I suggest looking into why a node becomes non-responsive in the first place, and trying to reduce the long full GC cycles in favor of more frequent, shorter GC cycles. Have you looked into the G1 collector, or even the Azul Zing JVM? They promise short GC cycles...

                           

                          It is definitely true that long gc pauses can throw our data grid over the cliff, so we are working very hard to reduce our gc pause durations. We've made a number of improvements in this regard, and are looking into the possibility of more advanced garbage collectors, perhaps G1 or even an alternative JVM implementation. The Azul JVM seems almost too good to be true, but we're looking into it.

                           

                          The thing is, I can also push our data grid over the cliff in my load test environment without gc being a factor. The load test runs on a single (fairly big) machine with multiple JVMs talking to each other via JGroups through shared memory TCP sockets. When I ramp the volume up, I see slower and slower cache update times and increasing cache lock contention, ultimately resulting in complete cache dysfunction. Now I know that any system will eventually fail if you push it past its design limits, but my concern is that our production volume is increasing at at least a geometric pace. I'm pretty sure we will see 2X the volume we have today within 6 months. So I am very concerned that our data grid architecture itself is not going to be able to keep up.

                           

                          Ideally, I'd like to see Infinispan have a mode where it can operate lock-free, i.e. support isolation level NONE or at least READ_UNCOMMITTED. Our old custom ORM doesn't have support for transactions, so ACID consistency is not achievable anyway. Consequently I'm willing to sacrifice cache consistency for performance, and code around the cache consistency issues at the application level. If I have to, I might even consider taking Infinispan out of our data grid and reimplementing it using simple invalidation caches, Google MapMaker as the concurrent local cache, and RabbitMQ to message the invalidations. I'd really rather avoid reinventing that wheel, though. I'm committed to Infinispan as long as I can tune it to handle the volume.
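
                          The closest thing I've found so far is the per-invocation flags on the advanced cache; my (possibly incomplete) understanding is that something like the sketch below skips lock acquisition for a write, trading away consistency guarantees:

                          import org.infinispan.AdvancedCache;
                          import org.infinispan.Cache;
                          import org.infinispan.context.Flag;

                          // Sketch only, based on my reading of the docs: skip lock acquisition for this
                          // write and don't fail the caller if the operation has problems. Keeping the
                          // caches consistent would then be entirely our application's job.
                          public final class LooseWrites {

                              public static <K, V> void putWithoutLocking(Cache<K, V> cache, K key, V value) {
                                  AdvancedCache<K, V> advanced = cache.getAdvancedCache();
                                  advanced.withFlags(Flag.SKIP_LOCKING, Flag.FAIL_SILENTLY).put(key, value);
                              }
                          }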

                           

                          Is there a guide to tuning and scaling Infinispan anywhere? That would be an extremely useful book, if someone has written such a thing ....

                          • 10. Re: Infinispan scalability consulting
                            dex80526

                            This kind of discussion about performance tuning (and tips) and its implications is really appreciated. In short, we really need some guidelines and tips on performance tuning, and on configuration for different scaling/performance scenarios.

                            We ran into similar issues in our environment. I posted the related issues before. In our environment, we need to support up to a couple million cache entries in a replication cluster using TCP. The transaction rate is up to four to five hundred per second. We observed that when the cluster/Infinispan/JGroups starts to throw transaction timeouts or fails to prepare views, the whole cluster stops functioning properly, and worse, the cluster will not be able to recover from this even after the load is stopped (we have to restart the application). The exact errors/exceptions were posted before. In one cluster, we got an OutOfMemoryError. To me, Infinispan needs to handle a non-responsive node better (at least to avoid bringing the whole cluster down); it should not lead to the whole cluster not functioning. Note: I have not absolutely identified the root cause of the OutOfMemoryError in our case yet, but after reading Bela's post, I believe that is what was going on in our case.

                            • 11. Re: Infinispan scalability consulting
                              darrellburgan

                              Sanne Grinovero wrote:

                               

                              Hi Darrell,

                              you should definitely try out 5.1.5.FINAL : many of the problems you mention have been analyzed in the past months and we introduced a lot of improvements, and most of them are related with the symptoms you are describing.

                               

                              I have referred you to our sales and they will get in touch soon, but in the meantime I think you should give a try to 5.1.5 in your load tests, I'm confident you will see a mayor improvement already.

                               

                              Actually, I've already tried 5.1.5.FINAL in my load test environment. My unscientific impression was that it has approximately the same scalability curve as 5.0, along with a very big change in behavior: when 5.0 gets bogged down, it simply starts responding slower and slower; when 5.1 gets bogged down, however, it starts throwing RejectedExecutionException. To be honest, I'm not sure if this changed behavior is better or worse.

                               

                              What are the specific things 5.1.5 did to address lock contention and scalability? Maybe I need to retune my data grid for 5.1.5 to see the benefits?

                              • 12. Re: Infinispan scalability consulting
                                belaban

                                I'll let the Infinispan folks reply to the part about lockless operation...

                                 

                                You mentioned that you use either async invalidation or  replication. If you use replication, you're never going to scale to a large cluster or large data set, as the heap size used is a function of the cluster size and the average data set size of every node. If you have 10 nodes, and every node has ca. 100MB of data, then every node needs to have 1GB of free heap to accommodate the data. Plus some more for Infinispan and JGroups metadata etc.

                                 

                                Of course, you could always use a cache loader to overflow data to disk, but this will slow down your system (depending on the cache hit/miss ratio).

                                 

                                We found that DIST scales almost linearly; the cost of a read or write is always constant, regardless of the cluster size. I'm going to talk about this at JBossWorld in 3 weeks. I don't know your architecture, but DIST (perhaps in combination with an L1 cache) could be something worth looking into...
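
                                As a starting point, something along these lines (5.1 API, values purely illustrative):

                                import org.infinispan.configuration.cache.CacheMode;
                                import org.infinispan.configuration.cache.Configuration;
                                import org.infinispan.configuration.cache.ConfigurationBuilder;

                                public class DistCacheExample {
                                    // Illustrative only: each entry is owned by 2 nodes, so per-node heap no
                                    // longer grows with cluster size; L1 keeps a short-lived local copy of
                                    // remote entries to speed up repeated reads.
                                    public static Configuration distWithL1() {
                                        ConfigurationBuilder builder = new ConfigurationBuilder();
                                        builder.clustering().cacheMode(CacheMode.DIST_SYNC)
                                               .hash().numOwners(2);
                                        builder.clustering().l1().enable().lifespan(60000);   // 60s, in ms
                                        return builder.build();
                                    }
                                }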

                                • 13. Re: Infinispan scalability consulting
                                  belaban

                                  When you mention that your production volume is going up, are you referring to the number of entries in the cache, or the number of accesses to them, or both ?

                                   

                                  Let me stress again that replication is not a good solution if either one or both of them increase.

                                   

                                  Re using TCP: I don't recommend TCP with replication if your cluster size increases, as a message M to the entire cluster (of N nodes) will have to be sent N-1 times. So for a cluster of 10, everyone sending a message leads to 81 messages being on the network at the same time. Even worse, we'll have a mesh of 81 TCP connections. For replication, I definitely recommend UDP over TCP.

                                   

                                  If you use distribution (DIST), then either UDP or TCP can be used, as we're sending fewer messages (only to the backups, not the entire cluster).

                                  • 14. Re: Infinispan scalability consulting
                                    dex80526

                                    We saw scaling issues in our load testing. I'll post the details in a separate thread later. But for people following this thread, the issue we see can be summarized like this:

                                     

                                    We have a 2-node cluster in replication mode (in a 2-node cluster, "DIST" should be similar to "REPLICATION"), and have a dozen load drivers (similar to JMeter) driving requests to the cluster through a load balancer.

                                     

                                    We found that performance is significantly lower in a 2-node cluster configuration than on a single node with the same physical capacity. We saw no errors/exceptions in the single-node run, but saw transaction timeouts.

                                     

                                    I am going to run the same test with data replication disabled to see if replication is the bottleneck. But to me the big surprise is that the 2-node cluster cannot even perform at the same level as the single node.
