1 Reply Latest reply on Jan 4, 2013 9:46 AM by deepaksrivastavaz

    JGroups exception on concurrent shutdown of cluster

    deepaksrivastavaz

      Hi all,

       

      I recently got involved in Infinispan library usage, while playing around it I noticed that concurrent shutdown request of multiple nodes at once cause JGroups (random)exception which eventually do not let the applications stop on all nodes waiting like for ever and its always and easily reproducible.

       

      Environment details:

       

      JDK: 1.6.0_32

      Tomcat: apache-tomcat-6.0.28

      JGroups protocol stack:

                a.) TCP+TCPPING+MERGE2+FD_SOCK+FD+VERIFY_SUSPECT....

                b.) UDP+PING+MERGE2+FD_SOCK+FD_ALL+VERIFY_SUSPECT....

      ISPN version: Infinispan 'Delirium' 5.2.0.Alpha4

      JGroups version:jgroups-3.2.0.Alpha1

      Nodes: 3

      Location: Single Machine(A local laptop hosting Windows 7)

       

      Interestingly the NoClassDefFoundError exception being thrown is not fixed at a point but outputs different messages randomly during different invocations:

       

      LOG excerpt:

       

      Excerpt 1 from Run1:

      (Pasting initial lines separately as well as final posting seems rendering the message differently)

      [WARN ] [2012-11-17 18:35:00,467] [main] [protocols.TCP] - [failed to join /ff0e:0:0:0:0:0:75:75:7500 on ppp2: java.net.SocketException: error setting options]

      [ERROR] [] [Incoming-4,SampleCluster,DKUMAR-DL-IN-27445] [protocols.UNICAST] - [couldn't deliver message [dst: DKUMAR-DL-IN-27445, src: DKUMAR-DL-IN-50134 (3 headers), size=764 bytes]]

      java.lang.NullPointerException

          at org.jgroups.logging.Log4JLogImpl.isTraceEnabled(Log4JLogImpl.java:48)

       

      [WARN ] [2012-11-17 18:35:00,467] [main] [protocols.TCP] - [failed to join /ff0e:0:0:0:0:0:75:75:7500 on ppp2: java.net.SocketException: error setting options]

      [ERROR] [] [Incoming-4,SampleCluster,DKUMAR-DL-IN-27445] [protocols.UNICAST] - [couldn't deliver message [dst: DKUMAR-DL-IN-27445, src: DKUMAR-DL-IN-50134 (3 headers), size=764 bytes]]

      java.lang.NullPointerException

          at org.jgroups.logging.Log4JLogImpl.isTraceEnabled(Log4JLogImpl.java:48)

          at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:474)

          at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:387)

          at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)

          at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:609)

          at org.jgroups.JChannel.up(JChannel.java:715)

          at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)

          at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)

          at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)

          at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)

          at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)

          at org.jgroups.protocols.pbcast.GMS.up(GMS.java:899)

          at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)

          at org.jgroups.protocols.UNICAST.removeAndDeliver(UNICAST.java:651)

          at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:627)

          at org.jgroups.protocols.UNICAST.up(UNICAST.java:399)

          at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:572)

          at org.jgroups.protocols.BARRIER.up(BARRIER.java:102)

          at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)

          at org.jgroups.protocols.FD.up(FD.java:273)

          at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)

          at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)

          at org.jgroups.protocols.Discovery.up(Discovery.java:359)

          at org.jgroups.protocols.TP.passMessageUp(TP.java:1210)

          at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1774)

          at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1747)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:662)

       

      Exceprt 2 from Run1:

       

      [ERROR] [] [OOB-3,SampleCluster,DKUMAR-DL-IN-27445] [protocols.TCP] - [failed handling incoming message]

      java.lang.NoClassDefFoundError: org/jgroups/protocols/VERIFY_SUSPECT$Entry

          at org.jgroups.protocols.VERIFY_SUSPECT.addSuspect(VERIFY_SUSPECT.java:243)

          at org.jgroups.protocols.VERIFY_SUSPECT.verifySuspect(VERIFY_SUSPECT.java:192)

          at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:100)

          at org.jgroups.protocols.FD.up(FD.java:273)

          at org.jgroups.protocols.FD_SOCK.suspect(FD_SOCK.java:467)

          at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:212)

          at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)

          at org.jgroups.protocols.Discovery.up(Discovery.java:359)

          at org.jgroups.protocols.TP.passMessageUp(TP.java:1210)

          at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1774)

          at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1747)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:662)

      Caused by: java.lang.ClassNotFoundException: org.jgroups.protocols.VERIFY_SUSPECT$Entry

          at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1645)

          at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)

          ... 14 more

       

      Excerpt 3 from Run 1:

       

      [ERROR] [] [Incoming-4,SampleCluster,DKUMAR-DL-IN-27445] [protocols.UNICAST] - [couldn't deliver message [dst: DKUMAR-DL-IN-27445, src: DKUMAR-DL-IN-50134 (3 headers), size=142 bytes]]

      java.lang.NullPointerException

          at org.jgroups.logging.Log4JLogImpl.isTraceEnabled(Log4JLogImpl.java:48)

          at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:474)

          at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:387)

          at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)

          at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:609)

          at org.jgroups.JChannel.up(JChannel.java:715)

          at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)

          at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)

          at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)

          at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)

          at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)

          at org.jgroups.protocols.pbcast.GMS.up(GMS.java:899)

          at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)

          at org.jgroups.protocols.UNICAST.removeAndDeliver(UNICAST.java:651)

          at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:627)

          at org.jgroups.protocols.UNICAST.up(UNICAST.java:399)

          at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:572)

          at org.jgroups.protocols.BARRIER.up(BARRIER.java:102)

          at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)

          at org.jgroups.protocols.FD.up(FD.java:273)

          at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)

          at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)

          at org.jgroups.protocols.Discovery.up(Discovery.java:359)

          at org.jgroups.protocols.TP.passMessageUp(TP.java:1210)

          at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1774)

          at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1747)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:662)

       

      Apart from Log4JLogImpl issue(I am using log4j-1.2.15.jar) it seems like there is a class unloading issue as well with server as above Except 2 outputs different classes randomly as reason of failure.

       

      Do people do concurrent shutdown of their servers(tomcat or other) in real scenarios without any data-loss or other messages loss or is it preferred or advisable to do serial shutdown always...?

      However concurrent startup doesn't create such problems.

       

      Any help will be appreciated, let me know if you need more details around the environment?

       

      --Deepak