11 Replies Latest reply on Jan 8, 2010 11:27 AM by peterj

    CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.

      I am using JBoss Application Server 4.2.2 GA in cluster mode on Sun Solaris machines. We have two servers, with two nodes running on each.

      I have noticed that CPU usage increases by 4% every day, and after 10 to 12 days it reaches 100%. The thread dump looked fine to me: no deadlocks, no threads stuck waiting for connections, etc.

       

      The only issue I can think of is that I start JBoss with the command nohup ./startjboss.sh>node1 &, and the node1.out file grows daily.

      I tried to reduce the size of the file using :>node1.out. This shrinks the file, but soon afterwards it returns to its original size.

      I know that JBoss holds a handle to that file and keeps writing to it. How can I limit its size?

       

      Any other suggestion to fix the issue would be appreciated.

       

      Regards,

      Navkalp

        • 1. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.

          I am attaching a thread dump.

          • 2. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
            peterj

            Did you look at the thread dump? Did you notice anything unusual? I'll give you a hint - look only at the threads that contain your code (ignore the threads that are waiting on sockets ("at java.lang.Object.wait(Native Method)" or "at java.net.SocketInputStream.socketRead0(Native Method)")). I counted 48 threads all stuck at the same location...

            • 3. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.

              Thanks very much for your response. It seems GC is causing the issue. We upgraded the RAM but did not change the GC configuration.

              Am I right, or am I missing something?

              • 4. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
                peterj

                You presented no evidence whatsoever that GC is the cause of the problem. And besides, adding RAM will not solve a GC problem.

                 

                I'll go back to my original point - what did you learn by examining the thread dump? And another hint: what did you find when examining the code highlighted by the thread dump?

                • 5. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.

                  Thanks very much for your reply.

                  I don't know much about thread dumps, but I am trying to learn.

                  What I see is that there are 48 GC threads, and I don't see an address range for them (a "no address range" message). The other code, where threads call HashMap to populate a menu, shows no sign of problems; it generally runs when a user logs in and sees the menu.

                   

                  This menu code has been in place for a long time and did not cause issues when we ran performance tests.

                   

                  What I meant by adding RAM was that initially we had some memory issues, as we were running two server nodes with 8GB.

                  We increased it to 16GB. After this increase, the high CPU usage issue started to appear.

                   

                  I am not sure what this "no address in range" means, given that other threads do have an address range.

                   

                  • When I look at the threads in Object.wait(), I find they are threads waiting for user connections and the like.
                  • The runnable threads at SocketInputStream are also not showing any strange behaviour.
                  • The runnable threads that populate the menu take a long time, but I do not suspect them as the culprit: the same code has been there for a long time, and when we did performance testing we did not have the 100% CPU issue (those tests used to run for 24 to 48 hours with 250 simulated users). The code has not changed since then.
                  • Apart from these GC threads, which show no address in range, no other threads show issues.

                   

                   

                  Please let me know if I am not on the right page.

                  • 6. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
                    peterj

                    I see the 48 GC threads, but I doubt that they are running, and thus they are probably not the cause of the problem. (You would need to turn on GC monitoring to be sure.) How many cores/CPUs do you have on the system? What command line options are you using? JVM 5 and later will automatically assign one GC thread per core (for the first 8 cores, with 5 GC threads for every 8 cores after that).
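GC monitoring is usually switched on with the -verbose:gc JVM option. As a rough in-process alternative, the standard java.lang.management API exposes per-collector counts and accumulated collection times. The following is only a minimal sketch: the class name GcSnapshot is made up for illustration, and it reports on the JVM it runs in, so it would have to be run inside the application server (or the same beans read over JMX) to say anything about the JBoss process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of JBoss: prints what this JVM reports about GC.
class GcSnapshot {

    /** Sums the GC time (ms) reported by every collector in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means "not available"
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** One line for the core count, then one line per collector. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("cores=").append(Runtime.getRuntime().availableProcessors()).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
        System.out.println("total GC ms so far: " + totalGcMillis());
    }
}
```

If the accumulated GC time barely moves between two readings taken a minute apart while the CPU stays pegged, GC is unlikely to be what is burning the CPU.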

                     

                    The 48 threads I wanted you to look at were the ones that were stuck in HashMap.put(). They all seem to have the same, or similar, stack trace:

                     

                    at java.util.HashMap.put(HashMap.java:420)
                    at com.pb.e2.present.model.MenuModel.buildInternalList(Unknown Source)
                    at com.pb.e2.present.model.MenuModel.refresh(Unknown Source)
                    at com.pb.e2.present.model.MenuModel.refresh(Unknown Source)
                    at com.pb.e2.present.actions.CustomActionBase.getCMSData(CustomActionBase.java:155)
                    at com.pb.e2.present.actions.CustomActionBase.prepare(CustomActionBase.java:46)
                    at com.opensymphony.xwork2.interceptor.PrepareInterceptor.doIntercept(PrepareInterceptor.java:118)

                    . . .

                     

                    Since 48 threads are all in this exact same code, I suspect that there is an infinite loop involved. You need to closely examine the above code (it would have helped you greatly if that code was compiled with -g so that the line numbers would show up in the stack trace).

                     

                    One thing you can do is to take several thread dumps, each a few seconds apart. Look for threads that are "stuck" in the same code - that is where you have infinite loops.
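For a running server, the dumps themselves would normally come from kill -3 on the JVM process or from the JDK's jstack tool. The comparison step can be sketched in Java as well. This is a minimal illustration, assuming a Java 8 or later JVM; StuckThreadFinder is a hypothetical name, and the code inspects only the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: compares two snapshots of this JVM's runnable threads.
class StuckThreadFinder {

    /** Maps thread id to "Class.method" of the top frame, for RUNNABLE threads only. */
    static Map<Long, String> snapshotTopFrames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> top = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (info.getThreadState() == Thread.State.RUNNABLE && stack.length > 0) {
                top.put(info.getThreadId(),
                        stack[0].getClassName() + "." + stack[0].getMethodName());
            }
        }
        return top;
    }

    /** Returns the threads whose top frame is identical in both snapshots. */
    static Map<Long, String> unchanged(Map<Long, String> first, Map<Long, String> second) {
        Map<Long, String> same = new HashMap<>();
        for (Map.Entry<Long, String> e : first.entrySet()) {
            if (e.getValue().equals(second.get(e.getKey()))) {
                same.put(e.getKey(), e.getValue());
            }
        }
        return same;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> first = snapshotTopFrames();
        Thread.sleep(5000); // a few seconds between dumps
        Map<Long, String> second = snapshotTopFrames();
        unchanged(first, second).forEach(
                (id, frame) -> System.out.println("thread " + id + " still in " + frame));
    }
}
```

A thread that shows up here is only a candidate: threads blocked in native socket reads are also reported as RUNNABLE, so the interesting entries are the ones sitting in application code or in calls such as HashMap.put().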

                    • 7. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
                      smarlow
                      Sounds like  the HashMap (accessed by MenuModel.buildInternalList) is corrupted.
                      • 8. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
                        peterj
                        Scott makes a really good point. A HashMap is not synchronized, and thus cannot handle multiple threads updating the map all at the same time. By any chance are all of the threads accessing the same HashMap, or are they accessing individual HashMaps? If the same, then you do have a problem.
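A well-known failure mode of older HashMap implementations is that unsynchronized concurrent put() calls can corrupt a bucket chain during a resize, after which later put() or get() calls spin forever. Whether that is what happened here depends on whether the map is shared, which the thread dump alone does not confirm. As a minimal sketch under that assumption, a shared map can be made safe by using java.util.concurrent.ConcurrentHashMap (available since Java 5); Collections.synchronizedMap is another option. SharedMenuMap and its key names are hypothetical and not taken from the application.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a menu model whose map is shared by request threads.
class SharedMenuMap {

    // ConcurrentHashMap tolerates concurrent put() calls; a plain HashMap does not.
    private final Map<String, String> items = new ConcurrentHashMap<>();

    void put(String key, String label) {
        items.put(key, label);
    }

    int size() {
        return items.size();
    }

    /** Fills one shared map from several threads and returns the final size. */
    static int fillConcurrently(int threads, int perThread) {
        SharedMenuMap menu = new SharedMenuMap();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    menu.put("item-" + id + "-" + i, "label " + i); // unique key per write
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return menu.size();
    }

    public static void main(String[] args) {
        // 8 threads x 1000 unique keys each
        System.out.println(fillConcurrently(8, 1000)); // prints 8000
    }
}
```

If each request really builds its own private map, none of this is needed; the sketch only matters when one map instance is reachable from several threads at once.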
                        • 9. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.

                          Thanks very much. This might help a lot of people like me who are trying to learn to read thread dumps.

                           

                          I doubt that the HashMap is corrupted. The reason is that we are using Struts2, which handles every request in a different thread, so each thread uses a different HashMap. Also, the result screen does not have any issues although it is accessed simultaneously by 400 to 500 users.

                           

                          Still, I cannot rule out the issue you pointed out and will dig into it more.

                           

                          Thanks a lot to you all again.

                          • 10. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
                            smarlow
                            {quote}I doubt that the HashMap is corrupted. The reason is that we are using Struts2, which handles every request in a different thread, so each thread uses a different HashMap. Also, the result screen does not have any issues although it is accessed simultaneously by 400 to 500 users.{quote}


                            Looking at the thread dump, that is still the most likely cause (whether the hashmap is shared between threads or not).

                             

                            If you have a support contract with Red Hat, give us a call.  If not, get one soon (help Red Hat to better help you).

                            • 11. Re: CPU usage increases daily for JBoss PID, reaches to 100% after 10 days.
                              peterj

                              {quote}I doubt that Hashmap is corrupted. the reason is, we are using struts2,{quote}


                              I would be very leery of this statement. Last summer I worked with a customer who was running into stack overflow issues. Turns out they were using a well-known open source library (I'll call it lib1) which in turn used another open source library (I'll call it lib2) incorrectly. It turns out that lib2 is not thread safe, yet lib1 was spawning multiple threads, each one using lib2. Issues like this rarely come out during development testing because that is usually single threaded. Not until load testing, or worse in production, when multiple threads are being run do such problems show up.