6 Replies Latest reply on Jul 24, 2012 2:23 AM by jfclere

    routing table lookup performance

    asyomichev

      I recently noticed that as the number of instances behind mod_cluster grows, request routing takes significantly longer. To give a rough order of magnitude, I am seeing latencies on the order of 100-200 ms for pools of 200+ workers. Parallel requests still scale well (i.e. several request loops running in parallel each show the same request rate as a single one), but a latency of 200 ms translates into a single-client throughput of 5 HTTP transactions per second, which is alarmingly low.

       

      After some digging it became apparent that most of the time is spent accessing host and context information in the routing table in shared memory, specifically host_storage->read_host() and context_storage->read_context() in get_balancer_by_node() (mod_proxy_cluster.c). It looks like reading from shared memory is quite slow, and on top of that it is done repeatedly (something close to O(n^2) in the number of entries).
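To illustrate the access pattern described above, here is a minimal, self-contained sketch in C. It does not use mod_cluster's real storage API: `entry_t`, `fake_store`, `slow_read` and `count_matches_nested` are hypothetical names, and the store is an ordinary array with a read counter standing in for shared-memory accesses. It only shows how re-reading every host for every context makes the number of reads grow roughly quadratically with the table size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; stands in for a host or context entry. */
typedef struct {
    int id;
    int node;   /* id of the node that owns this entry */
} entry_t;

/* Hypothetical store: a plain array plus a counter of reads.
 * In the real module the entries live in shared memory. */
typedef struct {
    const entry_t *entries;
    size_t size;
    unsigned long reads;   /* number of successful slow_read() calls */
} fake_store;

/* Copy one entry out of the store and count the access. */
int slow_read(fake_store *s, size_t idx, entry_t *out)
{
    assert(s != NULL && out != NULL);
    if (idx >= s->size)
        return -1;
    memcpy(out, &s->entries[idx], sizeof(*out));
    s->reads++;
    return 0;
}

/* For every context, re-read every host and compare node ids.
 * Returns the number of (context, host) pairs on the same node.
 * Store reads: contexts->size on the context store and
 * contexts->size * hosts->size on the host store. */
unsigned long count_matches_nested(fake_store *contexts, fake_store *hosts)
{
    unsigned long matches = 0;
    for (size_t i = 0; i < contexts->size; i++) {
        entry_t ctx;
        if (slow_read(contexts, i, &ctx) != 0)
            continue;
        for (size_t j = 0; j < hosts->size; j++) {
            entry_t host;
            if (slow_read(hosts, j, &host) == 0 && host.node == ctx.node)
                matches++;
        }
    }
    return matches;
}
```

With 200 contexts and 200 hosts, this pattern performs 40,000 host reads per lookup, so any per-read cost is multiplied accordingly.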

       

      Has anyone observed this before? Is there anything in the works to improve it?

       

      My environment: Linux 2.6.18 x86_64, httpd 2.2.15, mod_cluster 1.1.0.Final, Tomcat 6.0.20

       

      Thanks,
      --Alexey

        • 1. Re: routing table lookup performance
          jfclere

          Could you try 1.1.3? It should bring some improvements.

          • 2. Re: routing table lookup performance
            asyomichev

            I just tried a fresh build of 1.1.3, but unfortunately the results are very similar. I've cooked up a patch on top of 1.1.0.Final (attached) that seems to improve things by caching host and context entries on the heap. I am seeing up to a 10x improvement in certain cases. Could you please take a look at whether the approach makes sense and whether it would be worthwhile to port it to 1.1.3? (The patch does not apply to 1.1.3 directly because of significant rework in mod_proxy_cluster.c.)
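The caching idea described in this reply can be sketched roughly as follows. This is not the attached patch; it is a minimal illustration under stated assumptions: `entry_t`, `fake_store`, `slow_read`, `snapshot_t`, `take_snapshot`, `free_snapshot` and `count_matches_cached` are hypothetical names, `malloc` stands in for whatever allocator the module would actually use, and a read counter stands in for the cost of shared-memory access. Each table is copied to the heap once, and the lookup loops then run over the local copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record; stands in for a host or context entry. */
typedef struct {
    int id;
    int node;   /* id of the node that owns this entry */
} entry_t;

/* Hypothetical store: a plain array plus a counter of reads. */
typedef struct {
    const entry_t *entries;
    size_t size;
    unsigned long reads;
} fake_store;

/* Copy one entry out of the store and count the access. */
int slow_read(fake_store *s, size_t idx, entry_t *out)
{
    if (idx >= s->size)
        return -1;
    *out = s->entries[idx];
    s->reads++;
    return 0;
}

/* Heap copy of a whole table. */
typedef struct {
    entry_t *items;
    size_t count;
} snapshot_t;

/* Read every entry exactly once into a heap array.
 * Returns 0 on success, -1 if allocation fails. */
int take_snapshot(fake_store *s, snapshot_t *snap)
{
    assert(s != NULL && snap != NULL);
    snap->items = NULL;
    snap->count = 0;
    if (s->size == 0)
        return 0;
    snap->items = malloc(s->size * sizeof(*snap->items));
    if (snap->items == NULL)
        return -1;
    for (size_t i = 0; i < s->size; i++) {
        if (slow_read(s, i, &snap->items[snap->count]) == 0)
            snap->count++;
    }
    return 0;
}

void free_snapshot(snapshot_t *snap)
{
    free(snap->items);
    snap->items = NULL;
    snap->count = 0;
}

/* Count (context, host) pairs on the same node using heap copies.
 * Store reads: contexts->size + hosts->size.
 * Returns the match count, or -1 if allocation fails. */
long count_matches_cached(fake_store *contexts, fake_store *hosts)
{
    snapshot_t ctx, host;
    long matches = 0;
    if (take_snapshot(contexts, &ctx) != 0)
        return -1;
    if (take_snapshot(hosts, &host) != 0) {
        free_snapshot(&ctx);
        return -1;
    }
    for (size_t i = 0; i < ctx.count; i++)
        for (size_t j = 0; j < host.count; j++)
            if (host.items[j].node == ctx.items[i].node)
                matches++;
    free_snapshot(&ctx);
    free_snapshot(&host);
    return matches;
}
```

The in-memory comparison is still a nested loop, but the number of store reads now grows linearly with the table size instead of with the product of the two tables.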

            • 3. Re: routing table lookup performance
              jfclere

              I have created a JIRA and I will integrate the patch in the next version. Many thanks.

              • 4. Re: routing table lookup performance
                asyomichev

                Hi Jean-Frederic,

                 

                I recently upgraded to 1.2.0.Final, which includes the patch, and found that lookup performance is still not quite where I expected: with 200 instances I still see over 90 ms of latency per HTTP transaction on average.

                 

                Looking at mod_proxy_cluster.c in 1.2.0.Final, I noticed that worker lookup now touches "node" structures within a tight loop, and these are still read from shared memory. I took the liberty of mirroring the caching approach already applied to "context" and "host", and that brought the latency down to about 10 ms in that test. More importantly for scalability, the latency became less dependent on the table size.

                 

                I was wondering if you could consider adding the extra caching (proxy_node_table/read_node_table) to upstream mod_cluster for future releases?

                 

                Thanks,
                --Alexey

                • 5. Re: routing table lookup performance
                  jfclere

                  Adding extra caching for the node looks like a good idea. Create a JIRA and submit a patch ;-)

                  • 6. Re: routing table lookup performance
                    jfclere

                    The initial JIRA was MODCLUSTER-252.