5 Replies Latest reply on Oct 11, 2010 12:47 PM by galder.zamarreno

    JBoss Cache and WebSphere 6.1 cluster: JGroups deadlock?

    lorod

      Hello all,

       

      I'm running a WebSphere 6.1 cluster made up of two nodes, each on a different server (node1 on server1, node2 on server2).

      I'm using the JBoss Cache 3.0.3GA API as the caching layer of my web application.

      I have two separate cache instances: the first holds user layouts and persists them to the file system, the second holds user data in memory.

      Everything works fine when I start up the cluster: both instances replicate data between the nodes correctly, over UDP as well as TCP.

      The Layout cache is bound to port 7800 on both servers/nodes, and the user data cache is bound to port 7801.

      Below is the TCP configuration of one of the two cache instances (Layout) on each of the two nodes/servers of my cluster (all the files are in the attached jbc.zip).

       

      Layout Cache on Server1/Node1 (first configuration below), followed by Layout Cache on Server2/Node2 (second configuration below):
      <?xml version="1.0" encoding="utf-8" ?>
      <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xmlns="urn:jboss:jbosscache-core:config:3.0">

        <!-- CLUSTERING: sharing via TCPIP -->
        <clustering mode="replication" clusterName="LAYOUT">
          <stateRetrieval timeout="600000" fetchInMemoryState="true"/>
          <async/>
          <jgroupsConfig>
            <TCP start_port="7800" bind_addr="172.16.15.170" loopback="false"/>
            <TCPPING timeout="3000"
                     initial_hosts="172.16.15.17[7800],172.16.15.170[7800]"
                     port_range="0" num_initial_members="2"/>
            <MERGE2 max_interval="30000" min_interval="10000"/>
            <FD max_tries="5" shun="true" timeout="10000"/>
            <VERIFY_SUSPECT timeout="1500"/>
            <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                           retransmit_timeout="300,600,1200,2400,4800" discard_delivered_msgs="true"/>
            <pbcast.STABLE desired_avg_gossip="50000" max_bytes="400000" stability_delay="1000"/>
            <pbcast.GMS join_timeout="5000" print_local_addr="true" shun="false"
                        view_ack_collection_timeout="5000" view_bundling="true"/>
            <FC max_credits="1000000" min_threshold="0.20"/>
            <FRAG2 frag_size="60000"/>
            <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
            <!--pbcast.STREAMING_STATE_TRANSFER
             socket_buffer_size="1048576" /-->
            <pbcast.FLUSH timeout="0"/>
          </jgroupsConfig>
        </clustering>

        <!-- FS: persist on FS -->
        <loaders>
          <loader async="false"
                  purgeOnStartup="false"
                  fetchPersistentState="true">
            <properties>
              location=/opt/wax-jbosscache/#MyLayoutDB
            </properties>
          </loader>
        </loaders>

      </jbosscache>
      <?xml version="1.0" encoding="utf-8" ?>
      <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xmlns="urn:jboss:jbosscache-core:config:3.0">

        <!-- CLUSTERING: sharing via TCPIP -->
        <clustering mode="replication" clusterName="LAYOUT">
          <stateRetrieval timeout="600000" fetchInMemoryState="true"/>
          <async/>
          <jgroupsConfig>
            <TCP start_port="7800" bind_addr="172.16.15.17" loopback="false"/>
            <TCPPING timeout="3000"
                     initial_hosts="172.16.15.17[7800],172.16.15.170[7800]"
                     port_range="0" num_initial_members="2"/>
            <MERGE2 max_interval="30000" min_interval="10000"/>
            <FD max_tries="5" shun="true" timeout="10000"/>
            <VERIFY_SUSPECT timeout="1500"/>
            <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                           retransmit_timeout="300,600,1200,2400,4800" discard_delivered_msgs="true"/>
            <pbcast.STABLE desired_avg_gossip="50000" max_bytes="400000" stability_delay="1000"/>
            <pbcast.GMS join_timeout="5000" print_local_addr="true" shun="false"
                        view_ack_collection_timeout="5000" view_bundling="true"/>
            <FC max_credits="1000000" min_threshold="0.20"/>
            <FRAG2 frag_size="60000"/>
            <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
            <!--pbcast.STREAMING_STATE_TRANSFER
             socket_buffer_size="1048576" /-->
            <pbcast.FLUSH timeout="0"/>
          </jgroupsConfig>
        </clustering>

        <!-- FS: persist on FS -->
        <loaders>
          <loader async="false"
                  purgeOnStartup="false"
                  fetchPersistentState="true">
            <properties>
              location=/opt/wax-jbosscache/#MyLayoutDB
            </properties>
          </loader>
        </loaders>

      </jbosscache>
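
      The user data cache configuration is not reproduced in the post itself (it is only in the attachment). As a rough sketch, assuming it mirrors the Layout stack and differs only in the cluster name and in the port 7801 mentioned earlier, its transport section on the 172.16.15.170 node might look like the fragment below. The clusterName value "USERDATA" and the replication mode are placeholders chosen for illustration, not values taken from the attached files.

```xml
<!-- Sketch only: clusterName "USERDATA" and mode are assumed, not from the attachment -->
<clustering mode="replication" clusterName="USERDATA">
  <jgroupsConfig>
    <!-- The post states that the user data cache is bound to port 7801 -->
    <TCP start_port="7801" bind_addr="172.16.15.170" loopback="false"/>
    <TCPPING timeout="3000"
             initial_hosts="172.16.15.17[7801],172.16.15.170[7801]"
             port_range="0" num_initial_members="2"/>
    <!-- Remaining protocols assumed identical to the Layout stack -->
  </jgroupsConfig>
</clustering>
```

      The point of the sketch is that each cache instance needs its own cluster name and its own port in both start_port and initial_hosts, so that the two channels on the same host do not discover each other.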

       

       

      Problems start when I shut down one of the two nodes and then restart it: the restarted node is unable to retrieve the cache state from the other node of the cluster. In the server logs I can see the cluster configuration being updated with the new node, but nothing works after that.

      This is the error I get when calling the restarted node:

       

       

      at org.jboss.cache.remoting.jgroups.ChannelMessageListener.stateReceivingFailed(ChannelMessageListener.java:130)
           at org.jboss.cache.remoting.jgroups.ChannelMessageListener.setState(ChannelMessageListener.java:299)
           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:714)
           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:776)
           at org.jgroups.JChannel.up(JChannel.java:1226)
           at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)
           at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:443)
           at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:488)
           at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:453)
           at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:211)
           at org.jgroups.protocols.FRAG2.up(FRAG2.java:192)
           at org.jgroups.protocols.FC.up(FC.java:468)
           at org.jgroups.protocols.pbcast.GMS.up(GMS.java:796)
           at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
           at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:616)
           at org.jgroups.protocols.UNICAST.up(UNICAST.java:282)
           at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:747)
           at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
           at org.jgroups.protocols.FD.up(FD.java:284)
           at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:308)
           at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
           at org.jgroups.protocols.Discovery.up(Discovery.java:263)
           at org.jgroups.protocols.PING.up(PING.java:270)
           at org.jgroups.protocols.TP.passMessageUp(TP.java:1277)
           at org.jgroups.protocols.TP.access$100(TP.java:49)
           at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1830)
           at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1804)
           at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
           at java.lang.Thread.run(Thread.java:797)
           Caused by: java.lang.NoClassDefFoundError: com.cadit.wax.ws.classes.Session (initialization failure)
           at java.lang.J9VMInternals.initialize(J9VMInternals.java:123)
           at java.io.ObjectStreamClass.hasStaticInitializer(Native Method)
           at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1723)
           at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:103)
           at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:235)
           at java.security.AccessController.doPrivileged(AccessController.java:192)
           at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:232)
           at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:597)
           at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1561)
           at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1475)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1708)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1927)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1851)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1728)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:354)
           at java.util.HashMap.readObject(HashMap.java:1068)
           at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:615)
           at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1828)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1728)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:354)
           at org.jboss.cache.marshall.NodeData.readExternal(NodeData.java:112)
           at org.jboss.cache.marshall.CacheMarshaller200.unmarshallObject(CacheMarshaller200.java:522)
           at org.jboss.cache.marshall.CacheMarshaller200.populateFromStream(CacheMarshaller200.java:660)
           at org.jboss.cache.marshall.CacheMarshaller200.unmarshallLinkedList(CacheMarshaller200.java:611)
           at org.jboss.cache.marshall.CacheMarshaller200.unmarshallObject(CacheMarshaller200.java:486)
           at org.jboss.cache.marshall.CacheMarshaller200.unmarshallObject(CacheMarshaller200.java:433)
           at org.jboss.cache.marshall.CacheMarshaller200.objectFromObjectStream(CacheMarshaller200.java:141)
           at org.jboss.cache.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:360)
           at org.jboss.cache.statetransfer.DefaultStateTransferIntegrator.readNodesAsList(DefaultStateTransferIntegrator.java:249)
           at org.jboss.cache.statetransfer.DefaultStateTransferIntegrator.integrateTransientState(DefaultStateTransferIntegrator.java:210)
           at org.jboss.cache.statetransfer.DefaultStateTransferIntegrator.integrateTransientState(DefaultStateTransferIntegrator.java:108)
           at org.jboss.cache.statetransfer.DefaultStateTransferIntegrator.integrateState(DefaultStateTransferIntegrator.java:81)
           at org.jboss.cache.statetransfer.DefaultStateTransferManager.setState(DefaultStateTransferManager.java:191)
           at org.jboss.cache.statetransfer.DefaultStateTransferManager.setState(DefaultStateTransferManager.java:155)
           at org.jboss.cache.remoting.jgroups.ChannelMessageListener.setState(ChannelMessageListener.java:294)
           ... 28 more
           Caused by: java.lang.Throwable: org.jboss.cache.CacheException: java.lang.reflect.InvocationTargetException
           at org.jboss.cache.util.reflect.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:148)
           at org.jboss.cache.factories.ComponentRegistry$PrioritizedMethod.invoke(ComponentRegistry.java:883)
           at org.jboss.cache.factories.ComponentRegistry.internalStart(ComponentRegistry.java:680)
           at org.jboss.cache.factories.ComponentRegistry.start(ComponentRegistry.java:561)
           at org.jboss.cache.invocation.CacheInvocationDelegate.start(CacheInvocationDelegate.java:301)
           at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:119)
           at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:94)
           at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:71)
           at it.cadit.wax.cache.jbosscache.WAXCacheFactory.<init>(WAXCacheFactory.java:35)
           at it.cadit.wax.cache.common.WAXCacheFactoryType$1.getCacheFactory(WAXCacheFactoryType.java:18)
           at com.cadit.wax.util.Utility.initializeCache(Utility.java:959)
           at com.cadit.wax.cache.Repository.initializeCache(Repository.java:26)
           at com.cadit.wax.ws.classes.Session.<clinit>(Session.java:30)
           at java.lang.J9VMInternals.initializeImpl(Native Method)
           at java.lang.J9VMInternals.initialize(J9VMInternals.java:177)
           at com.cadit.wax.ws.city.authws.AuthWSSoapBindingImpl.<clinit>(AuthWSSoapBindingImpl.java:292)
           at java.lang.J9VMInternals.initializeImpl(Native Method)
           at java.lang.J9VMInternals.initialize(J9VMInternals.java:177)
           at java.lang.Class.forNameImpl(Native Method)
           at java.lang.Class.forName(Class.java:131)
           at com.cadit.wax.ResourceManagerListener$Configuration.createObject(ResourceManagerListener.java:441)
           at com.cadit.wax.ResourceManagerListener$Configuration.createAuthService(ResourceManagerListener.java:397)
           at com.cadit.wax.ResourceManagerListener$Configuration.load(ResourceManagerListener.java:276)
           at com.cadit.wax.ResourceManagerListener.loadConfiguration(ResourceManagerListener.java:698)
           at com.cadit.wax.ResourceManagerListener.configure(ResourceManagerListener.java:690)
           at com.cadit.wax.ResourceManagerListener.contextInitialized(ResourceManagerListener.java:722)
           at com.ibm.ws.wswebcontainer.webapp.WebApp.notifyServletContextCreated(WebApp.java:605)
           at com.ibm.ws.webcontainer.webapp.WebApp.commonInitializationFinish(WebApp.java:265)
           at com.ibm.ws.wswebcontainer.webapp.WebApp.initialize(WebApp.java:271)
           at com.ibm.ws.wswebcontainer.webapp.WebGroup.addWebApplication(WebGroup.java:88)
           at com.ibm.ws.wswebcontainer.VirtualHost.addWebApplication(VirtualHost.java:157)
           at com.ibm.ws.wswebcontainer.WebContainer.addWebApp(WebContainer.java:653)
           at com.ibm.ws.wswebcontainer.WebContainer.addWebApplication(WebContainer.java:606)
           at com.ibm.ws.webcontainer.component.WebContainerImpl.install(WebContainerImpl.java:333)
           at com.ibm.ws.webcontainer.component.WebContainerImpl.start(WebContainerImpl.java:549)
           at com.ibm.ws.runtime.component.ApplicationMgrImpl.start(ApplicationMgrImpl.java:1295)
           at com.ibm.ws.runtime.component.DeployedApplicationImpl.fireDeployedObjectStart(DeployedApplicationImpl.java:1129)
           at com.ibm.ws.runtime.component.DeployedModuleImpl.start(DeployedModuleImpl.java:567)
           at com.ibm.ws.runtime.component.DeployedApplicationImpl.start(DeployedApplicationImpl.java:814)
           at com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:948)
           at com.ibm.ws.runtime.component.ApplicationMgrImpl$AppInitializer.run(ApplicationMgrImpl.java:2114)
           at com.ibm.wsspi.runtime.component.WsComponentImpl$_AsynchInitializer.run(WsComponentImpl.java:340)
           at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1498)
           Caused by: java.lang.Throwable: java.lang.reflect.InvocationTargetException
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:615)
           at org.jboss.cache.util.reflect.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:144)
           ... 42 more
           Caused by: java.lang.Throwable: org.jboss.cache.CacheException: Unable to connect to JGroups channel
           at org.jboss.cache.RPCManagerImpl.start(RPCManagerImpl.java:252)
           ... 47 more
           Caused by: java.lang.Throwable: org.jgroups.StateTransferException: 172.16.15.170:51651 could not fetch state null from null
           at org.jgroups.JChannel.connect(JChannel.java:466)
           at org.jboss.cache.RPCManagerImpl.start(RPCManagerImpl.java:242)
           ... 47 more
           Caused by: java.lang.Throwable: org.jgroups.StateTransferException: 172.16.15.170:51651 could not fetch state null from null
           at org.jgroups.JChannel.connect(JChannel.java:459)
           ... 48 more

       

      I don't think the problem is the state retrieval timeout, because it is already large (600000 ms).

      One very important note: if I use only one cache instance (for example Layout), everything works fine, even when I shut down and restart a node!

      Could it be a deadlock in JGroups during startup initialization?

      Any suggestions?

      Thanks to all!