      • 60. Re: portal clustering problem when using optimistic locking
        brian.stansberry

        From perf11.log, a lot of threads waiting on what I assume is the DB connection pool:

        "ajp-10.18.1.129-8009-523" daemon prio=1 tid=0x0000002b6d668680 nid=0x5b1b in Object.wait() [0x000000006d724000..0x000000006d725db0]
        [JBoss] at java.lang.Object.wait(Native Method)
        [JBoss] - waiting on <0x0000002b37969c98> (a EDU.oswego.cs.dl.util.concurrent.QueuedSemaphore$WaitQueue$WaitNode)
        [JBoss] at EDU.oswego.cs.dl.util.concurrent.QueuedSemaphore$WaitQueue$WaitNode.doTimedWait(QueuedSemaphore.java:123)
        [JBoss] - locked <0x0000002b37969c98> (a EDU.oswego.cs.dl.util.concurrent.QueuedSemaphore$WaitQueue$WaitNode)
        [JBoss] at EDU.oswego.cs.dl.util.concurrent.QueuedSemaphore.attempt(QueuedSemaphore.java:47)
        [JBoss] at org.jboss.resource.connectionmanager.InternalManagedConnectionPool.getConnection(InternalManagedConnectionPool.java:181)
        [JBoss] at org.jboss.resource.connectionmanager.JBossManagedConnectionPool$BasePool.getConnection(JBossManagedConnectionPool.java:538)
        [JBoss] at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:347)
        [JBoss] at org.jboss.resource.connectionmanager.TxConnectionManager.getManagedConnection(TxConnectionManager.java:330)
        [JBoss] at org.jboss.resource.connectionmanager.BaseConnectionManager2.allocateConnection(BaseConnectionManager2.java:402)
        [JBoss] at org.jboss.resource.connectionmanager.BaseConnectionManager2$ConnectionManagerProxy.allocateConnection(BaseConnectionManager2.java:849)
        [JBoss] at org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:90)
        [JBoss] at org.hibernate.connection.DatasourceConnectionProvider.getConnection(DatasourceConnectionProvider.java:69)
        [JBoss] at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:423)
        [JBoss] at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:144)
        [JBoss] at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)
        [JBoss] at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1547)
        [JBoss] at org.hibernate.loader.Loader.doQuery(Loader.java:673)
        [JBoss] at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
        [JBoss] at org.hibernate.loader.Loader.loadEntity(Loader.java:1907)
        [JBoss] at org.hibernate.loader.entity.CollectionElementLoader.loadElement(CollectionElementLoader.java:72)
        [JBoss] at org.hibernate.persister.collection.OneToManyPersister.getElementByIndex(OneToManyPersister.java:360)
        [JBoss] at org.hibernate.collection.AbstractPersistentCollection.readElementByIndex(AbstractPersistentCollection.java:158)
        [JBoss] at org.hibernate.collection.PersistentMap.get(PersistentMap.java:146)
        [JBoss] at org.jboss.portal.core.impl.model.portal.PortalObjectImpl.getChild(PortalObjectImpl.java:427)
        [JBoss] at org.jboss.portal.core.model.portal.command.mapping.DefaultPortalObjectPathMapper$1.getChild(DefaultPortalObjectPathMapper.java:91)
        [JBoss] at org.jboss.portal.server.servlet.PathParser.map(PathParser.java:86)
        [JBoss] at org.jboss.portal.core.model.portal.command.mapping.DefaultPortalObjectPathMapper.getTarget(DefaultPortalObjectPathMapper.java:109)
        [JBoss] at org.jboss.portal.core.model.portal.PortalObjectCommandFactory.doMapping(PortalObjectCommandFactory.java:77)
        [JBoss] at org.jboss.portal.core.controller.command.mapper.CommandFactoryDelegate.doMapping(CommandFactoryDelegate.java:87)
        [JBoss] at org.jboss.portal.core.controller.command.mapper.DelegatingCommandFactoryService.doMapping(DelegatingCommandFactoryService.java:142)
        [JBoss] at org.jboss.portal.core.model.portal.DefaultPortalCommandFactory.doMapping(DefaultPortalCommandFactory.java:69)
        [JBoss] at org.jboss.portal.core.controller.Controller.handle(Controller.java:208)
        [JBoss] at org.jboss.portal.server.RequestControllerDispatcher.invoke(RequestControllerDispatcher.java:51)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:131)
        [JBoss] at org.jboss.portal.core.cms.aspect.IdentityBindingInterceptor.invoke(IdentityBindingInterceptor.java:47)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.server.aspects.server.ContentTypeInterceptor.invoke(ContentTypeInterceptor.java:68)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.core.aspects.server.PortalContextPathInterceptor.invoke(PortalContextPathInterceptor.java:45)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.core.aspects.server.LocaleInterceptor.invoke(LocaleInterceptor.java:96)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.core.aspects.server.UserInterceptor.invoke(UserInterceptor.java:246)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.server.aspects.server.SignOutInterceptor.invoke(SignOutInterceptor.java:98)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.core.impl.api.user.UserEventBridgeTriggerInterceptor.invoke(UserEventBridgeTriggerInterceptor.java:65)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.core.aspects.server.TransactionInterceptor.org$jboss$portal$core$aspects$server$TransactionInterceptor$invoke$aop(TransactionInterceptor.java:49)
        [JBoss] at org.jboss.portal.core.aspects.server.TransactionInterceptor$invoke_N5143606530999904530.invokeNext(TransactionInterceptor$invoke_N5143606530999904530.java)
        [JBoss] at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:79)
        [JBoss] at org.jboss.aspects.tx.TxInterceptor$RequiresNew.invoke(TxInterceptor.java:262)
        [JBoss] at org.jboss.portal.core.aspects.server.TransactionInterceptor$invoke_N5143606530999904530.invokeNext(TransactionInterceptor$invoke_N5143606530999904530.java)
        [JBoss] at org.jboss.portal.core.aspects.server.TransactionInterceptor.invoke(TransactionInterceptor.java)
        [JBoss] at org.jboss.portal.server.ServerInterceptor.invoke(ServerInterceptor.java:38)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.server.aspects.LockInterceptor$InternalLock.invoke(LockInterceptor.java:69)
        [JBoss] at org.jboss.portal.server.aspects.LockInterceptor.invoke(LockInterceptor.java:130)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
        [JBoss] at org.jboss.portal.common.invocation.Invocation.invoke(Invocation.java:157)
        [JBoss] at org.jboss.portal.server.servlet.PortalServlet.service(PortalServlet.java:250)


        Should whatever data that stack is getting from the DB be coming from the cache?

        On perf13 I just noticed that there are a ton of "Timer-123" threads.

        • 61. Re: portal clustering problem when using optimistic locking
          prabhat.jha

          I spent this weekend watching Olympics and running tests with new hibernate/jbc jar. Here are the results:

          With the new jar, results are as good as with the previous jar up to 3 nodes, which is almost linear taking 1400 users on one node as the baseline. In the 4-node cluster, results are better: it can handle 5300 users, which used to be only 4200. With 5 nodes, the result is only a bit better: it handles 3100, which used to be 2800.

          Just to make sure there was nothing wrong with the 5th server, I used another server, and the results look similar.

          I also verified the new jar with optimistic locking, because this whole issue started when OL was used (please see the first page of this thread). I don't see any problem with the OL and REPL_SYNC combination, so I think this new jar is better.
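          For anyone reproducing this, the OL + REPL_SYNC combination corresponds roughly to these two attributes in the JBoss Cache service config (a sketch of the relevant attributes only; the surrounding mbean/config elements depend on the JBC version deployed):

          <attribute name="CacheMode">REPL_SYNC</attribute>          <!-- synchronous replication -->
          <attribute name="NodeLockingScheme">OPTIMISTIC</attribute> <!-- optimistic locking -->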

          I will take thread dumps with 5 nodes and the new jar, as Brian asked. I will also look into why org.jboss.portal.core.impl.model.portal.PortalObjectImpl.getChild is not using the cache.

          • 62. Re: portal clustering problem when using optimistic locking
            galder.zamarreno

            Prabhat, I've been away for a little bit. I'll check the thread dumps you first uploaded. One thing about them, though: can you remove [JBoss] from the start of each line of the dump? Otherwise, the dumps have to be run through grep/cut before they can be parsed by tools like TDA.

            Glad to hear that the new cache provider is yielding better results :)

            • 63. Re: portal clustering problem when using optimistic locking
              galder.zamarreno

              After fiddling a bit with cut, the following produces output that can be parsed with TDA:

              cut -d']' -f2- perf11.log | cut -d' ' -f2- > perf11-cut.log
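              The first cut splits each line on ']' and keeps everything after the [JBoss] tag; the second drops the leading space that is left behind, so TDA sees standard stack-trace lines.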


              • 64. Re: portal clustering problem when using optimistic locking
                galder.zamarreno

                I've looked at the thread dumps provided, and they all show similar signs:

                "This thread dump contains monitors without a locking thread information. This means, the monitor is hold by a system thread or some external resource. You should check the monitors without locking threads for more information. If you see many monitors having no locking thread, this usually means, the garbage collector is running. In this case you should consider analyzing the Garbage Collector output. "

                Effectively, this suggests the garbage collector is the culprit. Are these machines multi-core? What startup options are you using? Did you enable GC logging? Prabhat, I thought we went through a similar situation when you were testing 1-node scenarios, didn't we? I think I suggested that you use tools like TDA to discover scenarios like this.
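                For reference, GC logging can be turned on with the standard HotSpot flags, e.g. (the log path is up to you):

                -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log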

                • 65. Re: portal clustering problem when using optimistic locking
                  prabhat.jha

                   

                  "galder.zamarreno@jboss.com" wrote:

                  Effectively, this suggests the garbage collector is the culprit. Are these machines multi-core? What startup options are you using? Did you enable GC logging? Prabhat, I thought we went through a similar situation when you were testing 1-node scenarios, didn't we? I think I suggested that you use tools like TDA to discover scenarios like this.


                  Yes, we went through that exercise, and because of it we are at this stage. Portal was tuned enough to get the desired performance on one node, and that is the baseline for the scalability testing. As you can see, there is no scalability problem with 3 nodes, and now with 4 nodes. So I do not know why GC would be a factor in the 5-node cluster when it is not in the {1,2,3,4}-node clusters. Just an FYI: all these machines are of the same architecture (multi-core, 64-bit) and are on the same subnet. Please ping me as needed for server setup information.

                  I did see a few threads waiting on a DB connection, which Brian also pointed out. That data was intentionally left uncached because caching it would result in a huge memory footprint in some use cases. But what I am going to do is turn on caching for that data and see what it results in.

                  • 66. Re: portal clustering problem when using optimistic locking
                    brian.stansberry

                    I also recommend you test your 5-node setup with any clustering capability turned off. And look at your DB connection pools -- is the number of connections the DB is configured to allow sufficient to meet the demands of 5 nodes?
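                    For reference, the pool size is set per datasource in the *-ds.xml deployed with the portal. A sketch (the element names are the standard JBoss 4.x ones; the JNDI name and values here are illustrative):

                    <local-tx-datasource>
                       <jndi-name>PortalDS</jndi-name>
                       ...
                       <min-pool-size>5</min-pool-size>
                       <max-pool-size>100</max-pool-size>
                    </local-tx-datasource>

                    Whatever max-pool-size is, the DB itself must accept at least that many connections times the number of nodes.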

                    • 67. Re: portal clustering problem when using optimistic locking
                      prabhat.jha

                      Brian, yes, I am already running tests with 5 portal servers in a non-clustered configuration to verify what you pointed out. I also do not think clustering is the main culprit here, and I really hope it's not.

                      I am running with the default configuration, but if I set cache_mode to LOCAL in the clustered configuration, it should serve the same purpose, right?
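                      For reference, that would be a one-attribute change in the cache service config (the rest of the config stays as-is):

                      <attribute name="CacheMode">LOCAL</attribute>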

                      • 68. Re: portal clustering problem when using optimistic locking
                        brian.stansberry

                        Preferably do both. I don't want to try to predict what's going to show what, so start with as few moving parts as possible (i.e., the default configuration) and see what happens. A cache with CacheMode.LOCAL adds a moving part.

                        • 69. Re: portal clustering problem when using optimistic locking
                          prabhat.jha

                          Here are a few findings:


                          I did see a few threads waiting on a DB connection, which Brian also pointed out. That data was intentionally left uncached because caching it would result in a huge memory footprint in some use cases. But what I am going to do is turn on caching for that data and see what it results in.


                          Changing the cache configuration for the above results in good scalability. Earlier, the Hibernate config was set to use lazy="extra", and apparently Hibernate bypasses the second-level cache with this lazy-loading setting. The setting is recommended for large collections and hence was made the default in Portal. Changing it to lazy="true" forces Hibernate to use the second-level cache, which helps scalability; a sketch of the mapping change follows, then the new numbers.
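                          The change amounts to flipping the lazy attribute on the collection mapping and caching the collection. A sketch with illustrative names, not the actual portal hbm.xml:

                          <map name="children" lazy="true">  <!-- was lazy="extra" -->
                             <cache usage="read-write"/>
                             <key column="PARENT_ID"/>
                             <map-key column="NAME" type="string"/>
                             <one-to-many class="PortalObjectImpl"/>
                          </map>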

                          #Nodes  #Users
                          1       1400
                          2       2200
                          3       3300
                          4       4100
                          5       5500

                          So although the ability to handle load decreases for the {2,3}-node clusters, it helps with 4 and 5 nodes. I think this suggests that increasing the heap size on each node would let each node handle more load while keeping the scalability.

                          I have posted on the Hibernate mailing list asking if there is a way to use the cache with lazy="extra", because I think that would be optimal in this use case. Of course, I won't know until I run tests.

                          Since this new jar fixes the problem with the OL and REPL_SYNC combination, I am going to run tests with that configuration as well.

                          • 70. Re: portal clustering problem when using optimistic locking
                            brian.stansberry

                             

                            That data was intentionally left uncached because caching it would result in a huge memory footprint in some use cases.


                            This isn't necessarily an either-or choice, since you can set up eviction in JBC to limit the amount of data cached.
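                            For example, an LRU eviction region in the JBC config would look something like this (a sketch; the attribute layout differs a bit across JBC versions, and the numbers are illustrative):

                            <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>
                            <attribute name="EvictionPolicyConfig">
                               <config>
                                  <attribute name="wakeUpIntervalSeconds">5</attribute>
                                  <region name="/_default_">
                                     <attribute name="maxNodes">5000</attribute>
                                     <attribute name="timeToLiveSeconds">1000</attribute>
                                  </region>
                               </config>
                            </attribute>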

                            • 71. Re: portal clustering problem when using optimistic locking
                              galder.zamarreno

                              Prabhat, did you also apply the -XX:+UseConcMarkSweepGC -XX:+UseParNewGC options to the tests you've just run?

                              • 72. Re: portal clustering problem when using optimistic locking
                                prabhat.jha

                                Galder, I ran one test configuration (OL, INV_SYNC, lazy="true" in Hibernate) with the GC options you mentioned, and the results are not as good as without those options. I am running the other configurations and will see how they look. I will update you all. Just to make sure, this is what the server startup looks like:

                                java -server -Xms3072m -Xmx3072m -XX:MaxPermSize=128m \
                                  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseLargePages \
                                  -Dsun.rmi.dgc.client.gcInterval=3600000 \
                                  -Dsun.rmi.dgc.server.gcInterval=3600000 \
                                  -Djgroups.udp.ip_ttl=1 \
                                  -jar /tmp/pjha/eap43-portal26/bin/run.jar -c production -b 10.18.1.131 \
                                  -Djboss.partition.name=PortalScalabilityTestPartition \
                                  -Dcluster.route.name=perf13 -u 239.11.12.13 -Dsf.pid=FR5VbgAgyO09Z8vq
                                

