10 Replies Latest reply on May 28, 2014 7:02 PM by jimryan

    Slow JVM memory leak

      Greetings,

       

      I am running JBoss AS 5.1 with Sun JRE 1.6.0_17 on CentOS 5. I have a 2-node cluster using stateless EJBs accessed via both RMI and JBossWS, with three back-end SQL Server 2005 hosts accessed via jTDS. No JNI code.

       

      My JVM footprint slowly grows over a 24-hour period until it reaches 3 GB and, of course, the process terminates with an out-of-memory error. The strange thing is that the Java heap stays very reasonable (~500 MB) and stabilizes. The active thread count never goes over ~300. The JVM footprint itself just keeps growing until it exhausts the 3 GB limit of a 32-bit process. Please note that I am NOT getting Java heap allocation errors: I am running with an initial/max heap of 1024M, and the heap stays well below that.
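      A quick way to put numbers on "the heap is stable but the process keeps growing" is to log both views side by side from inside the JVM. The sketch below is an illustration only: the class name is hypothetical, it assumes Linux (it reads /proc/self/status for VmSize and VmRSS), and it needs Java 7 or later to compile as written. If heap committed stays flat while VmRSS climbs, the growth is native memory outside the Java heap.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class FootprintCheck {

    /** Returns the kB value of a field such as "VmRSS" from /proc/self/status lines, or -1 if absent. */
    static long statusFieldKb(List<String> lines, String field) {
        for (String line : lines) {
            if (line.startsWith(field + ":")) {
                // Lines look like "VmRSS:\t  123456 kB"
                String[] parts = line.substring(field.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        // Linux-specific: the kernel's view of the whole process, heap and native alike.
        List<String> status = Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8);

        System.out.printf("heap used/committed:     %d / %d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20);
        System.out.printf("non-heap used/committed: %d / %d MB%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
        System.out.printf("process VmSize / VmRSS:  %d / %d MB%n",
                statusFieldKb(status, "VmSize") >> 10, statusFieldKb(status, "VmRSS") >> 10);
    }
}
```

      Running this periodically (or exposing the same numbers over JMX) gives a time series that separates heap growth from native growth without attaching a profiler.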

       

      This seemed to begin with my upgrade from 4.2.3 to 5.1.0, which is also when I went from a standalone to a clustered configuration.

       

      I am stumped. I have spent two weeks trying to track this down and am getting nowhere. Can anyone offer some troubleshooting tips? The Java memory analysis tools (Eclipse MAT, jmap, jconsole) don't really help, as they all report a small, stable Java heap. Some resource in the JVM itself is leaking. Running pmap against the JVM PID does not tell me much either, other than that there is a large amount of [anon] memory allocated, mostly in smaller (300K) chunks.
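      Since pmap only shows a long list of [anon] regions, it can help to bucket them by size and compare snapshots taken hours apart, to see which size class is multiplying. This is a hypothetical helper, not an existing tool: it assumes Linux, reads /proc/&lt;pid&gt;/maps (the same data pmap formats), treats mappings with no pathname as anonymous, and needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AnonMapSummary {

    /** Size in bytes of a maps line such as "7f1c2c000000-7f1c2c021000 rw-p 00000000 00:00 0". */
    static long mappingSize(String line) {
        String range = line.trim().split("\\s+")[0];
        int dash = range.indexOf('-');
        long start = Long.parseUnsignedLong(range.substring(0, dash), 16);
        long end = Long.parseUnsignedLong(range.substring(dash + 1), 16);
        return end - start;
    }

    /** True when the mapping has no backing pathname, which is what pmap labels [anon]. */
    static boolean isAnonymous(String line) {
        return line.trim().split("\\s+").length < 6;
    }

    /** Counts anonymous mappings per power-of-two size bucket; keys are bucket floors in KB. */
    static Map<Long, Integer> histogram(List<String> mapsLines) {
        Map<Long, Integer> buckets = new TreeMap<>();
        for (String line : mapsLines) {
            if (line.trim().isEmpty() || !isAnonymous(line)) {
                continue;
            }
            long kb = Math.max(1, mappingSize(line) >> 10);
            buckets.merge(Long.highestOneBit(kb), 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self"; // pass the JBoss PID to inspect another process
        List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "maps"), StandardCharsets.UTF_8);
        histogram(lines).forEach((kb, n) -> System.out.printf(">= %7d KB: %d mappings%n", kb, n));
    }
}
```

      A bucket whose count rises steadily between snapshots (for example the 256 KB bucket, which would hold ~300K chunks) narrows down what kind of allocation is leaking.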

       

      Thank you,

       

      Jon

        • 1. Re: Slow JVM memory leak
          xmedeko

          If you suspect Sun JRE 1.6.0_17, try another JRE (e.g., Sun JRE 1.5, Sun JRE 1.6.0_04, IBM JRE, JRockit, ...).

           

          Or try Valgrind http://community.jboss.org/wiki/MemoryLeaksCheckwithValgrind

          • 2. Re: Slow JVM memory leak

            I have tried various Sun JVM versions, including u18, with the same results. One interesting thing to note is that this is related to the JBossWS stack: when I turn off all web service access to my SLSBs and hit the cluster only via RMI, the leak goes away. If anyone has any ideas as to what may cause a native (non-Java-heap, non-PermGen) leak in this scenario, please pass them on, even if it is just a brainstorm.

             

            I will try to create a simple dummy application and web service client that reproduce the problem. Unfortunately, I was unable to get the JVM running under Valgrind, even following this FAQ: http://valgrind.org/docs/manual/faq.html#faq.java

            • 3. Re: Slow JVM memory leak
              peterj

              VisualVM, which comes with JDK 6, has some decent tools for tracking down memory leaks. It might help you pinpoint where the leak is, and if it really is related to web services, you could submit a JIRA issue.

               

              Oh, the other thing you should do is check the latest JBossWS releases to see if any of them fixes a memory leak.

              • 4. Re: Slow JVM memory leak
                dmlloyd
                I'm not 100% sure offhand what is involved in the code path for JBossWS, but maybe check and see if direct buffer space is being exhausted?
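                One way to check the direct-buffer theory is to watch the JVM's "direct" buffer pool, which lives in native memory outside the Java heap. Note that java.lang.management.BufferPoolMXBean only exists from Java 7 onward, so this sketch would not run on the JDK 6 builds discussed here; the class name is hypothetical. On HotSpot, the total can also be capped with -XX:MaxDirectMemorySize so that exhaustion shows up as an OutOfMemoryError rather than silent process growth.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectBufferCheck {

    /** Returns the "direct" buffer pool bean, or null if this JVM does not expose one. */
    static BufferPoolMXBean directPool() {
        List<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A 1 MB direct buffer is allocated in native memory, not in the Java heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        BufferPoolMXBean direct = directPool();
        if (direct == null) {
            System.out.println("no 'direct' buffer pool exposed by this JVM");
            return;
        }
        System.out.printf("direct buffers: count=%d, used=%d bytes, capacity=%d bytes (holding %d)%n",
                direct.getCount(), direct.getMemoryUsed(), direct.getTotalCapacity(), buf.capacity());
    }
}
```

                If the count and capacity climb in step with web service traffic and never fall back after a full GC, direct buffers are a likely suspect.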
                • 5. Re: Slow JVM memory leak

                  I have finally found a workaround: using the standard OpenJDK that comes with CentOS 5, java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5. Using this JDK fixes the leak completely. VIRT size stabilizes at 1833M, which is exactly what I would expect. I have no idea whether this version of OpenJDK contains a fix or the newer commercial JDKs from Sun contain a regression, as I still see the problem with JDK 6 update 19.

                   

                  The crazy part is that no one else seems to have had such a problem. My application is rather busy (up to 50 transactions/second), but it still confounds me that no one else seems to see this native leak. Anyway, thanks for your replies. I would still like to figure out what is going on, but for now, I will just be happy I have found a way to stabilize my cluster.

                  • 6. Re: Slow JVM memory leak
                    samuel.cai

                    Jon, you may also try JDK 7. I suspect your problem is due to a JDK bug: http://bugs.sun.com/view_bug.do?bug_id=6735255
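                    A well-known way for a JDK 6 process to grow natively while the heap stays small is java.util.zip: each Inflater or Deflater holds zlib state allocated outside the Java heap, and that memory is released by end() or, much later, by finalization. Whether this is the mechanism behind the bug linked above, or what happens inside JBossWS, is an assumption; the sketch below (hypothetical class name) only shows the defensive pattern of ending the native zlib state in a finally block.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibCleanup {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory now instead of waiting for finalization
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-dependent input");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // same for the decompressor
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decompress(compress(original));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8)); // prints "hello hello hello hello"
    }
}
```

                    The same applies to GZIPInputStream, GZIPOutputStream and ZipFile: close them in a finally block so the native buffers they own are freed promptly.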

                    • 7. Re: Slow JVM memory leak
                      keteracel

                      Hey Jon,

                       

                      I'm wondering if you ever got to the bottom of this. I'm seeing the same thing on 1.6.0_26 (I know, that's pretty old), and only with an application that receives lots of GET requests on our Netty server. I checked lsof on a long-running versus a freshly started instance, and it doesn't look like we're leaking file handles (connections), so it feels like a JVM-level issue.
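                      To track file handles continuously rather than with one-off lsof snapshots, the JVM can report its own descriptor count. This sketch (hypothetical class name) relies on com.sun.management.UnixOperatingSystemMXBean, which HotSpot provides on Unix-like systems; on other JVMs the cast is skipped and -1 is returned.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdCount {

    /** Open file descriptors of this JVM, or -1 where the HotSpot Unix bean is unavailable. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println("open fds: " + openFileDescriptors());
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            System.out.println("max fds:  "
                    + ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount());
        }
    }
}
```

                      Logging this alongside request counts would confirm (or rule out) a slow socket leak more reliably than comparing two lsof runs.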

                       

                      Did you upgrade to JDK7 or use OpenJDK to fix this?

                       

                      Thanks,

                       

                      Paul.

                      • 8. Re: Slow JVM memory leak
                        rhusar

                        I would analyze the heap first; if that is fruitless, try the latest JDK 1.6 update, and if that still doesn't help, move to JDK 1.7.
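                        For the heap-analysis step, a dump can be taken from the command line with jmap -dump:live,format=b,file=heap.hprof &lt;pid&gt;, or triggered from inside the application. The sketch below (hypothetical class name) uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; live=true restricts the dump to reachable objects, and the target file must not already exist. The resulting .hprof file can be opened in Eclipse MAT or VisualVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    /** Writes an .hprof heap dump to target; live=true dumps only reachable objects. */
    static File dumpHeap(File target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.getAbsolutePath(), live); // fails if the file already exists
        return target;
    }

    public static void main(String[] args) throws IOException {
        File out = new File(args.length > 0 ? args[0] : "heap.hprof");
        System.out.println("wrote " + dumpHeap(out, true).length() + " bytes to " + out);
    }
}
```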

                        • 9. Re: Slow JVM memory leak
                          tjclifford01

                          I recently found this issue at:

                             https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Application_Platform/5/html-single/Messaging_User_Guide/index.html

                          Approximately a third of the way down the page, it states:

                           

                             Do not Use Hypersonic in Production  

                            Although Hypersonic configuration is used as the default persistence configuration, Hypersonic is not suitable or supported in production due to the following known issues: 

                           

                           

                          • no transaction isolation
                          • thread and socket leaks (connection.close() does not tidy up resources)
                          • persistence quality (logs commonly become corrupted after a failure, preventing automatic recovery)
                          • database corruption
                          • stability under load (database processes cease when dealing with too much data)
                          • not viable in clustered environments

                          The Hypersonic database is intended for developing and testing purposes and should not be used in a production environment. For more information about recommended databases, refer to the Using Other Databases chapter in the Getting Started Guide.

                           

                          So if this is a production server, you may want to switch to a different SQL store for JMS persistence.

                          • 10. Re: Slow JVM memory leak
                            jimryan

                            "The crazy part is that no one else seems to have had such a problem."

                             

                            I have a desktop app (not a JBoss server app, but a very complicated multithreaded app in industrial automation) running on CentOS, and I experienced something like Jon's problem: allocated memory climbed steadily and bogged down the app while used memory stayed flat at a reasonable level. I was about to try Jon's fix of switching to the CentOS OpenJDK instead of the Oracle JVM I'm using when I noticed some objects in the heap spiking fairly high in count before being released and garbage collected. I'm talking about spikes of a few thousand small objects in a Hashtable. Had I collected more data, perhaps I would have found larger spikes of several hundred MB; I don't know. I changed the algorithm in my code so that it never holds more than ten or twenty such objects at a time instead of thousands, and the allocated memory issue disappeared. Is the JVM getting spooked by spikes in used memory, allocating too much memory, not decreasing the allocation after the spike disappears, and then bogging down the app? Does the CentOS OpenJDK Jon used handle such spikes more skillfully?
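                            The actual code and Hashtable usage are not shown, so this is only a guess at the shape of the fix: one common way to guarantee that no more than a handful of work objects exist at once is a bounded queue between producer and consumer, so the producer blocks instead of piling up thousands of entries. The class name, the cap of 20 and the Integer work items are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedWork {

    /** Pushes `items` work objects through a queue holding at most `cap`; returns the consumer's sum. */
    static long process(int items, int cap) throws InterruptedException {
        BlockingQueue<Integer> pending = new ArrayBlockingQueue<>(cap);
        AtomicLong total = new AtomicLong();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    total.addAndGet(pending.take()); // stand-in for the real per-object work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 0; i < items; i++) {
            pending.put(i); // blocks once `cap` items are waiting, so live objects stay bounded
        }
        consumer.join();
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(10_000, 20)); // prints 49995000
    }
}
```

                            On the question of the JVM not giving memory back after a spike: on HotSpot, how much committed heap is released after a collection is influenced by flags such as -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio, so a spiky allocation pattern can leave the heap committed at a higher size than steady-state use requires.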