17 Replies. Latest reply on Oct 7, 2015 1:50 PM by aogburn.

    Clustering + Session sharing lock acquisition errors

    splaer

      I am having trouble clustering JBoss AS 6 and sharing sessions consistently.

      I'm very new to JBoss and clustering, so I may be missing something simple.

       

      My Setup:

      Server 1=

      Apache 2.2.15

      Mod_Cluster 1.1

       

      Server 2=

      Two JBoss AS 6 nodes set up for clustering

      jdk1.6u23

       

      Sticky Sessions are disabled

      App is marked as Distributable in web.xml

       

       

      cluster startup:

      [STDOUT]

      10:51:25,828 INFO  [STDOUT] -------------------------------------------------------------------

      10:51:25,828 INFO  [STDOUT] GMS: address=10.100.20.102:1099, cluster=DefaultPartition-HAPartition, physical address=10.100.20.102:55200

      10:51:25,829 INFO  [STDOUT] -------------------------------------------------------------------

      10:51:26,049 INFO  [JGroupsTransport] Received new cluster view: [10.100.20.103:1099|3] [10.100.20.103:1099, 10.100.20.102:1099]

      10:51:26,065 INFO  [JGroupsTransport] Cache local address is 10.100.20.102:1099, physical addresses are [10.100.20.102:55200]

      10:51:26,066 INFO  [GlobalComponentRegistry] Infinispan version: Infinispan 'Ursus' 4.2.0.FINAL

      10:51:26,106 INFO  [ComponentsJmxRegistration] Could not register object with name: org.infinispan:type=Cache,name="distributed-state(repl_sync)",manager="ha-partition",component=Cache

      10:51:26,107 INFO  [CacheJmxRegistration] MBeans were successfully registered to the platform mbean server.

      10:51:26,107 INFO  [RpcManagerImpl] Trying to fetch state from 10.100.20.103:1099

      10:51:26,296 INFO  [RpcManagerImpl] Successfully retrieved and applied state from 10.100.20.103:1099

      10:51:26,300 INFO  [ComponentRegistry] Infinispan version: Infinispan 'Ursus' 4.2.0.FINAL

      10:51:26,308 INFO  [DefaultCacheContainerFactory] Started "distributed-state" cache from "ha-partition" container

      10:51:26,342 INFO  [DefaultPartition] Number of cluster members: 2

      10:51:26,344 INFO  [DefaultPartition] Fetching initial service state (will wait for 30000 milliseconds for each service):

      10:51:26,544 INFO  [HANamingService] Started HAJNDI bootstrap; jnpPort=1100, backlog=50, bindAddress=/10.100.20.102

      10:51:26,557 INFO  [DetachedHANamingService$AutomaticDiscovery] Listening on /10.100.20.102:1102, group=230.0.0.4, HA-JNDI address=10.100.20.102:1100

      10:51:26,571 INFO  [TransactionManagerFactory] Using a batchMode transaction manager

      10:51:26,595 INFO  [ComponentsJmxRegistration] Could not register object with name: org.infinispan:type=Cache,name="distributed-tree(repl_sync)",manager="ha-partition",component=Cache

      10:51:26,595 INFO  [CacheJmxRegistration] MBeans were successfully registered to the platform mbean server.

      10:51:26,595 INFO  [RpcManagerImpl] Trying to fetch state from 10.100.20.103:1099

      10:51:26,610 INFO  [RpcManagerImpl] Successfully retrieved and applied state from 10.100.20.103:1099

      10:51:26,616 INFO  [ComponentRegistry] Infinispan version: Infinispan 'Ursus' 4.2.0.FINAL

      10:51:26,616 INFO  [DefaultCacheContainerFactory] Started "distributed-tree" cache from "ha-partition" container

       

       

       

      The servers start up fine, the cluster forms, mod_cluster connects, and the two nodes are able to share session information. Ajax requests alternate between the nodes and share the logon session just fine. After a while of moving around in the app, though, I begin to get session lock errors on only one node. See below:

       

       

       

      2011-02-02 10:48:00,772 ERROR [org.apache.catalina.connector.CoyoteAdapter] (ajp-10.100.20.102-8009-13) An exception or error occurred in the container during the request processing: java.lang.RuntimeException: Caught TimeoutException acquiring ownership of XOEYDujTwfqNjbduMUMMMQ__

              at org.jboss.web.tomcat.service.session.ClusteredSession.acquireSessionOwnership(ClusteredSession.java:603) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.session.ClusteredSession.access(ClusteredSession.java:566) [:6.0.0.Final]

              at org.apache.catalina.connector.Request.doGetSession(Request.java:2565) [:6.0.0.Final]

              at org.apache.catalina.connector.Request.getSession(Request.java:2315) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.session.JvmRouteValve.checkJvmRoute(JvmRouteValve.java:95) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.session.JvmRouteValve.invoke(JvmRouteValve.java:85) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.session.LockingValve.invoke(LockingValve.java:62) [:6.0.0.Final]

              at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285) [:1.1.0.Final]

              at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261) [:1.1.0.Final]

              at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88) [:6.0.0.Final]

              at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100) [:6.0.0.Final]

              at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) [:6.0.0.Final]

              at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.sso.ClusteredSingleSignOn.invoke(ClusteredSingleSignOn.java:696) [:6.0.0.Final]

              at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [:6.0.0.Final]

              at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53) [:6.0.0.Final]

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362) [:6.0.0.Final]

              at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:504) [:6.0.0.Final]

              at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:437) [:6.0.0.Final]

              at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951) [:6.0.0.Final]

              at java.lang.Thread.run(Unknown Source) [:1.6.0_20]

      Caused by: org.jboss.ha.framework.server.lock.TimeoutException: Cannot acquire lock //localhost/test/XOEYDujTwfqNjbduMUMMMQ__ from cluster

              at org.jboss.ha.framework.server.lock.SharedLocalYieldingClusterLockManager.lock(SharedLocalYieldingClusterLockManager.java:554) [:2.0.0.Final]

              at org.jboss.web.tomcat.service.session.distributedcache.ispn.DistributedCacheManager.acquireSessionOwnership(DistributedCacheManager.java:448) [:1.0.0.Final]

              at org.jboss.web.tomcat.service.session.ClusteredSession.acquireSessionOwnership(ClusteredSession.java:595) [:6.0.0.Final]

              ... 21 more

       

       

       

      The above error causes the entire app to freeze until the node that is having the session issues is brought down.

      Any help would be greatly appreciated. I can provide any logs or configuration files if needed.

       

      Thanks in advance

        • 1. Clustering + Session sharing lock acquisition errors
          pferraro

          Is there a particular reason for disabling sticky sessions?  Sticky sessions cannot be safely disabled without switching to synchronous web session replication.  Otherwise, a subsequent request cannot be certain that the session is up to date as it may not have yet been replicated.

          My recommendation to you - enable sticky sessions.  There's no reason to bounce requests for the same session to multiple servers - especially with an ajax use case.

          • 2. Clustering + Session sharing lock acquisition errors
            splaer

            We have a very extensive app, and the only way I have seen for it to benefit from high availability and load balancing is to let requests within a session be load balanced, with full session replication. Ajax was just one example that I knew of; there's a lot more going on that I am not familiar with. The only viable option for us would be fully synchronous session replication between all nodes in the cluster, and I confirmed with all the programmers that all nodes need access to the exact same state at any time. I will double-check on sticky sessions, though, to make sure they are not an option. Identical sessions on all nodes are a requirement.

             

            I thought I had enabled synchronous replication. Which config files need to be edited to enable that? (I thought it was infinispan-config.xml, but I may have edited it wrong.)

             

            Thanks for your response!

            • 3. Clustering + Session sharing lock acquisition errors
              pferraro

              There are a number of ways to change the replication mode:

               

              1. To switch the default (for all web applications) to synchronous replication:

              Modify the default cache configuration of the "web" cache container.  The default config uses <async/>.

              Changing this is slightly complicated because of a configuration workaround in place for an Infinispan bug (ISPN-835).  Essentially you would replace the <clustering /> and <loaders /> blocks in the <default /> config with those from the <namedCache name="sync"/> config.

               

              2. To switch to synchronous replication for a single web application:

              Within your web application's WEB-INF/jboss-web.xml file, add the following:

               

              <jboss-web>

                 <!-- ... -->

                 <replication-config>

                    <cache-name>web/sync</cache-name>

                 </replication-config>

                 <!-- ... -->

              </jboss-web>

               

              This tells the distributed session manager to use the "sync" cache from the "web" container.

               

              I don't completely understand why you needed to disable sticky sessions to benefit from high availability.  Doing so will result in poor performance: because subsequent requests for the same session may not hit the same node, each request must wait for its session to replicate completely before a response is returned to the client.

              • 4. Clustering + Session sharing lock acquisition errors
                splaer

                OK, I'll see if I can get the sync setup to have the effect we want. Maybe I don't fully understand how session replication works when sticky sessions are enabled. Let's say we have a 2-node setup with a couple hundred users logged in across both nodes. If node 1 crashes, would their sessions still be available via the other node, or would they lose their current state and have to reauthenticate?

                • 5. Clustering + Session sharing lock acquisition errors
                  splaer

                  OK, so I took your advice and enabled sticky sessions, changing my ProxyPass to use stickysession=JSESSIONID and nofailover=On. Like before, everything worked fine for a little while, except this time an entire session stayed on one node instead of being load balanced between the two. Everything looked like it was working great, but after some more testing we ended up getting the same lock errors again, which is very odd to me if every request for a session is supposed to go to one node only now. It caused the app to freeze until that session timed out completely; it then resumed normal operation, but it is continually getting lock errors.

                   

                  13:33:14,543 ERROR [org.apache.catalina.connector.CoyoteAdapter] An exception or error occurred in the container during the request processing: java.lang.RuntimeException: Caught TimeoutException acquiring ownership of +9pb-XSok+6tzi9e3Hk-5A__

                          at org.jboss.web.tomcat.service.session.ClusteredSession.acquireSessionOwnership(ClusteredSession.java:603) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.session.ClusteredSession.access(ClusteredSession.java:566) [:6.0.0.Final]

                          at org.apache.catalina.connector.Request.doGetSession(Request.java:2565) [:6.0.0.Final]

                          at org.apache.catalina.connector.Request.getSession(Request.java:2315) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.session.JvmRouteValve.checkJvmRoute(JvmRouteValve.java:95) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.session.JvmRouteValve.invoke(JvmRouteValve.java:85) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.session.LockingValve.invoke(LockingValve.java:62) [:6.0.0.Final]

                          at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285) [:1.1.0.Final]

                          at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261) [:1.1.0.Final]

                          at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88) [:6.0.0.Final]

                          at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100) [:6.0.0.Final]

                          at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) [:6.0.0.Final]

                          at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.sso.ClusteredSingleSignOn.invoke(ClusteredSingleSignOn.java:696) [:6.0.0.Final]

                          at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [:6.0.0.Final]

                          at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53) [:6.0.0.Final]

                          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362) [:6.0.0.Final]

                          at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:504) [:6.0.0.Final]

                          at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:437) [:6.0.0.Final]

                          at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951) [:6.0.0.Final]

                          at java.lang.Thread.run(Unknown Source) [:1.6.0_20]

                  Caused by: org.jboss.ha.framework.server.lock.TimeoutException: Cannot acquire lock //localhost/SchedWorx/+9pb-XSok+6tzi9e3Hk-5A__ from cluster

                          at org.jboss.ha.framework.server.lock.SharedLocalYieldingClusterLockManager.lock(SharedLocalYieldingClusterLockManager.java:554) [:2.0.0.Final]

                          at org.jboss.web.tomcat.service.session.distributedcache.ispn.DistributedCacheManager.acquireSessionOwnership(DistributedCacheManager.java:448) [:1.0.0.Final]

                          at org.jboss.web.tomcat.service.session.ClusteredSession.acquireSessionOwnership(ClusteredSession.java:595) [:6.0.0.Final]

                          ... 21 more

                  • 6. Re: Clustering + Session sharing lock acquisition errors
                    pferraro

                    Let me try to clarify:

                    Session replication operates independently of the sticky session setting - however one impacts the other.

                    Sessions can replicate either synchronously (i.e. make sure session replicates before returning response) or asynchronously (i.e. replicate session in a separate thread and return a response).

                    If you have sticky session disabled, then it is possible that a subsequent request for a given session arrives on a different node than the previous request - but before the session replicated to that node.  This is the scenario we want to avoid.

                    Complicating this - before a node tries to use a session, it first tries to obtain ownership of it (via locking).  If sticky sessions are disabled, then every time a request for a given session bounces between nodes, it will require an RPC to obtain session ownership.  This can be expensive since session ownership is acquired before the request is processed.

                     

                    To answer your specific question: If sticky sessions are enabled and node1 crashes, sessions will still be available on the other node (they are replicated after all) - with the possible exception of sessions that had not yet replicated when node 1 crashed, if asynchronous replication was used.  Think of async vs sync as a trade-off between performance and high-availability.  Sync provides full HA, but requires a per-request performance cost.  Async provides almost full HA, but with low per-request performance cost.  Synchronous replication can technically be used with or without session stickiness.  Asynchronous replication needs session stickiness to work correctly.
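
                    The sync vs async trade-off above can be sketched with a toy model in Java. This is only an illustration, not JBoss or Infinispan code: the class ReplicationDemo and its methods are made up for the example, and replication is modeled as a simple queue that is flushed either before the response (sync) or some time later (async).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of two nodes replicating session attributes; hypothetical, not JBoss code. */
public class ReplicationDemo {
    /** Each node holds its own copy of the session attributes. */
    static final class Node {
        final Map<String, String> session = new HashMap<>();
    }

    private final Node node1 = new Node();
    private final Node node2 = new Node();
    // Updates written on node1 that have not yet reached node2.
    private final Queue<String[]> pending = new ArrayDeque<>();
    private final boolean synchronous;

    ReplicationDemo(boolean synchronous) {
        this.synchronous = synchronous;
    }

    /** Handle a write request on node1, then "return the response". */
    void writeOnNode1(String key, String value) {
        node1.session.put(key, value);
        pending.add(new String[] {key, value});
        if (synchronous) {
            flush(); // sync: replicate before the response is returned
        }
        // async: the response is returned now; flush() happens later
    }

    /** Deliver all pending updates to node2. */
    void flush() {
        String[] update;
        while ((update = pending.poll()) != null) {
            node2.session.put(update[0], update[1]);
        }
    }

    /** Handle a read request that the load balancer routed to node2. */
    String readOnNode2(String key) {
        return node2.session.get(key);
    }

    public static void main(String[] args) {
        ReplicationDemo async = new ReplicationDemo(false);
        async.writeOnNode1("user", "alice");
        // Without stickiness the next request may land on node2 too early.
        System.out.println("async, before replication: " + async.readOnNode2("user"));
        async.flush();
        System.out.println("async, after replication: " + async.readOnNode2("user"));

        ReplicationDemo sync = new ReplicationDemo(true);
        sync.writeOnNode1("user", "alice");
        System.out.println("sync: " + sync.readOnNode2("user"));
    }
}
```

                    In the async case the first read on node2 sees nothing, which is the stale-session scenario that stickiness avoids; in the sync case the write pays the replication cost up front.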

                    • 7. Clustering + Session sharing lock acquisition errors
                      splaer

                      OK, thank you so much for that answer. It turned mud into almost perfectly clear water and made the entire setup make much more sense. The app seems to be fully functioning now with stickySession enabled and stickySessionRemove set to true. I sent it over to our testing team, who are currently putting it through its paces to make sure there aren't any more side effects of the clustered setup. Thanks again for your help; I hope we no longer have locking issues with this setup.

                      • 8. Re: Clustering + Session sharing lock acquisition errors
                        pferraro

                        I don't fully understand... What value are you using for ProxyPass?

                         

                        The timeout exception implies that concurrent requests for the "+9pb-XSok+6tzi9e3Hk-5A__" session are getting routed to multiple nodes.  Multiple threads within a node can share a lock on a session, but in order for a remote node to take ownership (i.e. acquire the lock), then no threads on any other node can have the session locked.  An incorrect ProxyPass value might cause this.
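
                        The ownership rule described above can be sketched as a toy model in Java. This is an assumption-laden simplification, not the SharedLocalYieldingClusterLockManager implementation: the class SessionOwnershipDemo is hypothetical, and where the real lock manager waits and eventually times out, this sketch simply returns false.

```java
/** Toy model of per-session ownership across nodes; hypothetical, not the JBoss lock manager. */
public class SessionOwnershipDemo {
    private String ownerNode;   // node that currently owns the session, or null
    private int activeRequests; // in-flight requests on the owning node

    /**
     * A request on the given node tries to use the session.
     * Returns false where the real system would wait and possibly time out.
     */
    synchronized boolean tryAcquire(String node) {
        if (ownerNode == null || ownerNode.equals(node)) {
            ownerNode = node; // threads on the owning node share the lock
            activeRequests++;
            return true;
        }
        if (activeRequests == 0) {
            ownerNode = node; // previous owner is idle, so ownership can move
            activeRequests = 1;
            return true;
        }
        return false; // another node still has requests in flight
    }

    /** A request on the given node has finished with the session. */
    synchronized void release(String node) {
        if (node.equals(ownerNode) && activeRequests > 0) {
            activeRequests--;
        }
    }

    public static void main(String[] args) {
        SessionOwnershipDemo session = new SessionOwnershipDemo();
        System.out.println("node1 request A: " + session.tryAcquire("node1"));
        System.out.println("node1 request B: " + session.tryAcquire("node1"));
        System.out.println("node2 while node1 busy: " + session.tryAcquire("node2"));
        session.release("node1");
        session.release("node1");
        System.out.println("node2 after node1 idle: " + session.tryAcquire("node2"));
    }
}
```

                        In this model, concurrent requests for one session that land on different nodes are exactly the case where the second node cannot take ownership.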

                        • 9. Clustering + Session sharing lock acquisition errors
                          splaer

                          OK, my ProxyPass setup is here:

                           

                          ProxyPass / balancer://test/SchedWorx/ stickysession=JSESSIONID nofailover=On

                           

                          Let me know if you need any more config files or anything else to help see what's going on.

                          • 10. Clustering + Session sharing lock acquisition errors
                            splaer

                            My mod_cluster config:

                                <!-- Use load balancing groups to group nodes into fail-over groups -->

                                <!-- Requests stuck to a node that is no longer available with fail over to a node within the same load balancing group, if possible -->

                                <property name="loadBalancingGroup">test5</property>

                                  <!-- Should we use an HA singleton per load balancing group? -->

                                <!--property name="masterPerLoadBalancingGroup"></property-->

                                  <!-- Configuration values for the load balancer itself (must be the

                                     same on all nodes in the cluster). These will be passed to the

                                     load balancer. -->

                                <property name="stickySession">true</property>

                                <property name="stickySessionForce">false</property>

                                <property name="stickySessionRemove">true</property>

                                <property name="maxAttempts">1</property>

                                <property name="workerTimeout">-1</property>

                              </bean>

                             

                             

                            A couple of examples of requests from Apache to JBoss:

                             

                            2011-02-01 09:40:45,634 INFO  [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost]] (ajp-10.100.20.102-8009-6)             header=cookie=JSESSIONID=YqrLJuQK7uJbtJVxx3kfug__.node1

                             

                            2011-02-01 09:40:45,637 INFO  [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost]] (ajp-10.100.20.102-8009-3) requestedSessionId=YqrLJuQK7uJbtJVxx3kfug__.node1

                             

                            2011-02-01 09:40:45,637 INFO  [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost]] (ajp-10.100.20.102-8009-6) requestedSessionId=YqrLJuQK7uJbtJVxx3kfug__.node1

                            • 11. Clustering + Session sharing lock acquisition errors
                              pferraro

                              Can you validate (via the logs) whether or not a given session is getting routed to both nodes using your current configuration?
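
                              One possible way to run that check, sketched in Java: collect the session IDs from each node's log and report any that appear on both. The class StickinessCheck is hypothetical, and it assumes each node logs lines containing requestedSessionId=<id>.<jvmRoute>, as in the request dumps posted earlier in this thread; the sample lines in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds session IDs that show up in the logs of both nodes. */
public class StickinessCheck {
    // Captures the session ID up to the '.' that precedes the jvmRoute suffix.
    private static final Pattern SESSION_ID =
            Pattern.compile("requestedSessionId=([^.\\s]+)");

    /** Extract the distinct session IDs from one node's log lines. */
    static Set<String> sessionIds(List<String> logLines) {
        Set<String> ids = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = SESSION_ID.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    /** Session IDs that were requested on both nodes. */
    static Set<String> seenOnBoth(List<String> node1Log, List<String> node2Log) {
        Set<String> both = sessionIds(node1Log);
        both.retainAll(sessionIds(node2Log));
        return both;
    }

    public static void main(String[] args) {
        // Illustrative lines only; real input would be each node's server log.
        List<String> node1 = Arrays.asList(
                "INFO requestedSessionId=AAAA__.node1",
                "INFO requestedSessionId=BBBB__.node1");
        List<String> node2 = Arrays.asList(
                "INFO requestedSessionId=BBBB__.node1",
                "INFO some unrelated line");
        System.out.println("Sessions seen on both nodes: " + seenOnBoth(node1, node2));
    }
}
```

                              An overlap is not always a problem, since a session can legitimately move after a failover, so the timestamps of the matching lines would still need to be compared by hand.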

                              • 12. Clustering + Session sharing lock acquisition errors
                                splaer

                                I will enable logging on both servers, run some tests over the weekend, and see if I can confirm stickiness.

                                • 13. Clustering + Session sharing lock acquisition errors
                                  splaer

                                  So far so good after I set stickySessionRemove=true. It ran through our testing team and everything went flawlessly this time. We are going to deploy a cluster with a single pilot customer to better test the load of hundreds of users.

                                  • 14. Clustering + Session sharing lock acquisition errors
                                    csanda

                                    Hi,

                                     

                                    I am having the same issue when clustering JBoss 6 with Apache without sticky sessions.  The issue can be easily reproduced by creating a <distributable/> WAR with synchronous session replication and one JSP which loads some CSS files. After hitting the refresh button I get the exception in about 10 seconds.  Another problem is that the server which crashes will never recover and serve requests for that session until restarted.

                                     

                                    Is there a workaround for this bug? Sticky sessions are not an option for us right now.

                                     

                                    Regards,

                                    Catalin
