7 Replies. Latest reply on Jun 26, 2009 12:02 PM by peterj

    About the UseParallelOldGC

      I am running a cluster of JBoss servers.
      Each server has 4 dual-core CPUs, i.e. 8 logical CPUs.

      I want to tune the JVM settings, and I found that I can use the following parameters:

      -XX:+UseParallelOldGC -XX:ParallelGCThreads=<n>
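      Before comparing timings, it may help to confirm that these flags actually reached the JVM. The sketch below uses the standard java.lang.management API to print the logical CPU count, the -XX flags the JVM was started with, and the names of the collectors it is running. The class name GcReport is hypothetical. The collector names alone may not tell the parallel old collector apart from the serial one, so the flag list is the more direct check.

      ```java
      import java.lang.management.GarbageCollectorMXBean;
      import java.lang.management.ManagementFactory;
      import java.util.ArrayList;
      import java.util.List;

      public class GcReport {
          /** Names of the collectors this JVM is actually using (young and old generation). */
          static List<String> collectorNames() {
              List<String> names = new ArrayList<>();
              for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                  names.add(gc.getName());
              }
              return names;
          }

          /** The -XX options passed on the command line, as reported by the runtime MXBean. */
          static List<String> gcFlags() {
              List<String> flags = new ArrayList<>();
              for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
                  if (arg.startsWith("-XX:")) {
                      flags.add(arg);
                  }
              }
              return flags;
          }

          public static void main(String[] args) {
              System.out.println("Logical CPUs: " + Runtime.getRuntime().availableProcessors());
              System.out.println("-XX flags:    " + gcFlags());
              System.out.println("Collectors:   " + collectorNames());
          }
      }
      ```

      Run it with the same options as the JBoss server (for example `java -XX:+UseParallelOldGC -XX:ParallelGCThreads=7 GcReport`) to see what the JVM reports.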

      I ran some load tests with different values of ParallelGCThreads.
      However, I was surprised by the results: the Full GC time with UseParallelOldGC is much longer than without it. For example:
      A) UseParallelOldGC with ParallelGCThreads=7: 4.1747870 secs
      B) UseParallelOldGC with ParallelGCThreads=3: 4.2135090 secs
      C) Without UseParallelOldGC: 2.7176180 secs

      Does anyone know why the parallel old GC can perform so poorly?
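      The figures above are single Full GC pauses. To compare settings over a whole run, one option is to read the cumulative collection time each collector reports through GarbageCollectorMXBean.getCollectionTime() before and after the workload. This is a minimal sketch: the class name GcTimer and the allocation workload are hypothetical stand-ins for a real load test, and the result is total elapsed GC time per collector, not individual pause lengths (GC logs such as -XX:+PrintGCDetails output remain the source for per-pause times).

      ```java
      import java.lang.management.GarbageCollectorMXBean;
      import java.lang.management.ManagementFactory;
      import java.util.ArrayList;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;

      public class GcTimer {
          /** Cumulative collection time (ms) per collector since JVM start. */
          static Map<String, Long> snapshotMillis() {
              Map<String, Long> times = new LinkedHashMap<>();
              for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                  // getCollectionTime() returns -1 when the collector does not report a time
                  times.put(gc.getName(), Math.max(0L, gc.getCollectionTime()));
              }
              return times;
          }

          /** Milliseconds of GC time per collector spent while the workload ran. */
          static Map<String, Long> measure(Runnable workload) {
              Map<String, Long> before = snapshotMillis();
              workload.run();
              Map<String, Long> delta = new LinkedHashMap<>();
              for (Map.Entry<String, Long> e : snapshotMillis().entrySet()) {
                  Long start = before.get(e.getKey());
                  delta.put(e.getKey(), e.getValue() - (start == null ? 0L : start));
              }
              return delta;
          }

          /** Hypothetical stand-in for a load test: churn memory and keep about 10% alive. */
          static void allocationWorkload() {
              List<byte[]> retained = new ArrayList<>();
              for (int i = 0; i < 20000; i++) {
                  byte[] chunk = new byte[16 * 1024];
                  if (i % 10 == 0) {
                      retained.add(chunk); // long-lived data may be promoted to the old generation
                  }
              }
              System.out.println("Retained " + retained.size() + " chunks");
              System.gc(); // requests a full collection; the JVM may ignore it (e.g. -XX:+DisableExplicitGC)
          }

          public static void main(String[] args) {
              Map<String, Long> gcMillis = measure(GcTimer::allocationWorkload);
              System.out.println("GC time during workload (ms): " + gcMillis);
          }
      }
      ```

      Running the same workload once per flag combination gives comparable totals; a single run is noisy, so repeating each setting several times is advisable.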

        • 1. Re: About the UseParallelOldGC
          peterj

          I have seen the same strange behavior: running parallel GC threads for minor collections reduces the GC pause time, but running parallel GC threads for major collections increases it. It makes no sense, but there you have it.

          You really have to try various heap and GC settings to find the best ones for your app. Have you considered the CMS collector? With the number of CPUs you have, it might be a good option.

          See this presentation:
          http://www.cecmg.de/doc/tagung_2007/agenda07/24-mai/2b3-peter-johnson/index.html
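          One way to follow the advice to try various heap and GC settings is to launch the same load test in a child JVM once per flag set and compare the results. This is a sketch under stated assumptions: the class name GcSweep is hypothetical, your load test is assumed to have its own main class passed as the first argument, and the flag sets simply mirror the ones discussed here (the baseline disables parallel old explicitly; -XX:+UseConcMarkSweepGC selects CMS). On newer JDKs some of these flags are deprecated, ignored with a warning, or rejected, so check each child's output and exit code.

          ```java
          import java.io.File;
          import java.util.ArrayList;
          import java.util.Arrays;
          import java.util.List;

          public class GcSweep {
              /** Command line for a child JVM: same java binary, the given flags, then the main class. */
              static List<String> buildCommand(String javaHome, List<String> jvmFlags,
                                               String classpath, String mainClass) {
                  List<String> cmd = new ArrayList<>();
                  cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
                  cmd.addAll(jvmFlags);
                  cmd.add("-cp");
                  cmd.add(classpath);
                  cmd.add(mainClass);
                  return cmd;
              }

              public static void main(String[] args) throws Exception {
                  if (args.length == 0) {
                      System.err.println("usage: java GcSweep <your load-test main class>");
                      return;
                  }
                  // Flag sets from this thread; tune heap size and thread counts for your app.
                  List<List<String>> candidates = Arrays.asList(
                          Arrays.asList("-Xmx1g", "-XX:+UseParallelGC", "-XX:-UseParallelOldGC"),
                          Arrays.asList("-Xmx1g", "-XX:+UseParallelOldGC", "-XX:ParallelGCThreads=3"),
                          Arrays.asList("-Xmx1g", "-XX:+UseParallelOldGC", "-XX:ParallelGCThreads=7"),
                          Arrays.asList("-Xmx1g", "-XX:+UseConcMarkSweepGC"));
                  String javaHome = System.getProperty("java.home");
                  String classpath = System.getProperty("java.class.path");
                  for (List<String> flags : candidates) {
                      ProcessBuilder pb = new ProcessBuilder(buildCommand(javaHome, flags, classpath, args[0]));
                      pb.inheritIO(); // child JVM output (GC logs, warnings) appears on this console
                      long start = System.nanoTime();
                      int exit = pb.start().waitFor();
                      long seconds = (System.nanoTime() - start) / 1_000_000_000L;
                      System.out.println(flags + " -> exit " + exit + ", wall time " + seconds + "s");
                  }
              }
          }
          ```

          Launching a fresh JVM per setting keeps the runs independent, since GC flags cannot be changed inside a running JVM.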

          • 2. Re: About the UseParallelOldGC

            Is there any possible reason why the multi-threaded ParallelOldGC performs worse than the single-threaded collector?

            • 3. Re: About the UseParallelOldGC
              peterj

              Lock contention over a shared object, such as the free memory list? I really have not had time to investigate it.

              • 4. Re: About the UseParallelOldGC

                Thank you, Peter. Can I say the following?

                The problem typically happens when there are too many ParallelOldGC
                threads in the process and too small an old generation. This
                results in excessive work stealing between the GC threads, and this
                work stealing hammers a shared lock. Too many ParallelOldGC threads
                without enough old space to carve up between them result in this
                work-stealing pathology.
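                The explanation above can be illustrated with a toy model. It is a deliberate simplification and not HotSpot code: instead of per-thread work-stealing queues, every worker takes units from one shared pool behind a single ReentrantLock, and a failed tryLock() is counted as a contended acquisition. The class name StealingContention is hypothetical, and the numbers it prints depend on the machine; it is meant to show the shape of the effect (more threads and finer-grained units mean more lock traffic per unit of useful work), not the collector's actual behavior.

                ```java
                import java.util.ArrayList;
                import java.util.List;
                import java.util.concurrent.atomic.AtomicLong;
                import java.util.concurrent.locks.ReentrantLock;

                public class StealingContention {
                    private static volatile long sink; // keeps the simulated work from being optimized away

                    /**
                     * Toy model: `threads` workers take units from one lock-protected pool, then do
                     * `workPerUnit` steps of simulated GC work outside the lock.
                     * Returns {unitsProcessed, contendedLockAcquisitions}.
                     */
                    static long[] run(int totalUnits, int threads, int workPerUnit) {
                        final ReentrantLock lock = new ReentrantLock();
                        final int[] remaining = {totalUnits}; // guarded by lock
                        final AtomicLong processed = new AtomicLong();
                        final AtomicLong contended = new AtomicLong();
                        List<Thread> workers = new ArrayList<>();
                        for (int t = 0; t < threads; t++) {
                            Thread worker = new Thread(() -> {
                                while (true) {
                                    if (!lock.tryLock()) { // lock already held: record contention, then wait
                                        contended.incrementAndGet();
                                        lock.lock();
                                    }
                                    boolean gotUnit;
                                    try {
                                        gotUnit = remaining[0] > 0;
                                        if (gotUnit) {
                                            remaining[0]--;
                                        }
                                    } finally {
                                        lock.unlock();
                                    }
                                    if (!gotUnit) {
                                        return;
                                    }
                                    long acc = 0;
                                    for (int i = 0; i < workPerUnit; i++) {
                                        acc += i;
                                    }
                                    sink = acc;
                                    processed.incrementAndGet();
                                }
                            });
                            workers.add(worker);
                            worker.start();
                        }
                        for (Thread worker : workers) {
                            try {
                                worker.join();
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                break;
                            }
                        }
                        return new long[] {processed.get(), contended.get()};
                    }

                    public static void main(String[] args) {
                        // Same total work, split into coarse vs. fine units, with 2 vs. 8 threads.
                        int totalWork = 2_000_000;
                        for (int threads : new int[] {2, 8}) {
                            for (int unitSize : new int[] {20_000, 200}) {
                                long start = System.nanoTime();
                                long[] r = run(totalWork / unitSize, threads, unitSize);
                                long ms = (System.nanoTime() - start) / 1_000_000;
                                System.out.printf("threads=%d unitSize=%d units=%d contended=%d time=%dms%n",
                                        threads, unitSize, r[0], r[1], ms);
                            }
                        }
                    }
                }
                ```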

                • 5. Re: About the UseParallelOldGC
                  peterj

                  I have a quad-core machine, and for my testing I used a 1GB heap (I did not specify a young gen size, but I believe the JVM never set it larger than 100MB). When using multiple tenured GC threads, the JVM splits the tenured generation into sections and lets each thread clean its own section to minimize contention. So I had 4 threads cleaning about 200MB each. You, of course, had 8 threads, so your lock contention is higher. But I read a very interesting paper the other day about cache coherency between the CPUs' L2 caches causing a significant performance drop in a multi-threaded app, so I'm wondering if that could be the reason. Of course, I'd need VTune to track that down.

                  • 6. Re: About the UseParallelOldGC

                    I want to understand more about your point on lock contention in the free memory list.

                    In your case, you have a 1GB heap with 4 cores. Assuming your young gen is 100MB, the old gen is around 900MB, so each core gets 900MB / 4 = about 225MB.

                    If I use 8 cores, each core gets about 113MB (900MB / 8 = 112.5MB).

                    So does lock contention occur because each thread is working on too small a share of the old gen?
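                    The arithmetic in this post can be written out as a tiny model. It is the simplified equal-split picture from this thread (old gen divided evenly among the GC threads); the real collector's partitioning may well differ. The class and method names are hypothetical, and the 1000MB heap with a 100MB young gen is only the assumption made above, since the original servers' heap size isn't stated. The 3- and 7-thread cases match the settings from the original load test.

                    ```java
                    import java.util.Locale;

                    public class OldGenShare {
                        /** Equal-split model: (heap - young gen) / number of GC threads, in MB. */
                        static double perThreadMb(double heapMb, double youngMb, int gcThreads) {
                            if (gcThreads <= 0) {
                                throw new IllegalArgumentException("gcThreads must be positive");
                            }
                            return (heapMb - youngMb) / gcThreads;
                        }

                        public static void main(String[] args) {
                            // 1GB heap rounded to 1000MB and a 100MB young gen, as assumed above
                            for (int threads : new int[] {3, 4, 7, 8}) {
                                System.out.printf(Locale.ROOT, "%d threads -> %.1f MB of old gen each%n",
                                        threads, perThreadMb(1000, 100, threads));
                            }
                            // prints 300.0, 225.0, 128.6 and 112.5 MB for 3, 4, 7 and 8 threads
                        }
                    }
                    ```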

                    • 7. Re: About the UseParallelOldGC
                      peterj

                      You'll notice the question mark after my statement about the free memory list. That means I don't know; I am just guessing, and my guess could be completely off. I also stated that I had not had time to look into why the parallel old GC runs slowly. So asking me to explain it is futile, because I have no answers. As I stated earlier, the best thing you can do is try several different GC mechanisms and use the one that works best for you. If you are really concerned about parallel old GC performance, you should take that up with Sun; after all, it's their JVM and their code.