6 Replies Latest reply on Jul 24, 2007 6:52 PM by konkimalla

    Problem with removing a Node from Cluster - HAJNDI dies

    redchili

      Hello,

      I have a problem when I remove a node from a cluster. The cluster consists of two or more identically configured nodes, and the application is deployed to them via farming.
      Adding nodes to the cluster works fine. As soon as the deployment is completed, the JNDIView shows all registered beans and proxies:

      HA-JNDI Namespace
       +- QueueConnectionFactory
       +- XAConnectionFactory
       +- HTTPXAConnectionFactory
       +- queue
       | +- D
       | +- DLQ
       | +- C
       | +- ex
       | +- B
       | +- A
       | +- testQueue
       +- HTTPConnectionFactory
       +- UIL2XAConnectionFactory[link -> XAConnectionFactory]
       +- kusssdemo
       | +- TermPeriodEM
       | | +- local (proxy: $Proxy1060 implements interface at.jku......)
       | +- StudyCodeBusinessLogicBean
       | | +- remote (proxy: $Proxy934 implements interface at.jku.......)
       | +- StudyMajorFieldBusinessLogicBean
      


      Now, when I shut down a node in the cluster, the proxies are removed from all nodes and are not restored (thus leaving the applications inoperable):

      HA-JNDI Namespace
       +- HTTPXAConnectionFactory
       +- XAConnectionFactory
       +- QueueConnectionFactory
       +- queue
       | +- D
       | +- C
       | +- DLQ
       | +- B
       | +- ex
       | +- A
       | +- testQueue
       +- HTTPConnectionFactory
       +- UIL2XAConnectionFactory[link -> XAConnectionFactory]
       +- kusssdemo
       | +- TermPeriodEM
       | +- StudyMajorFieldBusinessLogicBean
       | +- StudyCodeBusinessLogicBean
      


      The only related, suspicious thing I can see in the log of another (still alive) node is the "was NOT removed!!!" message:
      INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] New cluster view for partition DefaultPartition: 13 ([192.168.1.104:1099, 192.168.1.106:1099] delta: -1)
       DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] dead members: [192.168.1.105:1099]
       DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] membership changed from 2 to 2
       DEBUG [org.jgroups.protocols.pbcast.NAKACK] removing 192.168.1.105:7800 from received_msgs (not member anymore)
       DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Begin notifyListeners, viewID: 13
       DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.1.104:7800, 192.168.1.106:7800]
       INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (192.168.1.106:1099) received membershipChanged event:
       INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 1 ([192.168.1.105:1099])
       INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
       INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 2 ([192.168.1.104:1099, 192.168.1.106:1099])
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] purgeDeadMembers, [192.168.1.105:1099]
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] trying to remove deadMember 192.168.1.105:1099 for key DCacheBridge-DefaultJGBridge
       DEBUG [org.jgroups.protocols.FD] suspected_mbrs: [], after adjustment: []
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] 192.168.1.105:1099 was NOT removed!!!
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] trying to remove deadMember 192.168.1.105:1099 for key jboss.ha:service=HASingletonDeployer
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] 192.168.1.105:1099 was NOT removed!!!
       DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest()=192.168.1.104:7800, pingable_mbrs=[192.168.1.104:7800, 192.168.1.106:7800]
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] trying to remove deadMember 192.168.1.105:1099 for key HAJNDI
       DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] 192.168.1.105:1099 was NOT removed!!!
       DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] End notifyListeners, viewID: 13
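
      Because other loggers (FD, FD_SOCK) interleave with the DistributedReplicantManagerImpl output, it can be hard to see which key each "was NOT removed!!!" line belongs to. The following is a minimal sketch of a standalone helper that pairs each "trying to remove deadMember ... for key ..." line with the next "was NOT removed!!!" line for the same node. The class name UnremovedReplicants and the method findUnremovedKeys are hypothetical names made up for this illustration; they are not part of JBoss. The sketch only matches the message text shown in the log above and makes no claim about what the message means inside JBoss.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JBoss: lists the replicant keys for which
// the log reports "<node> was NOT removed!!!" after a removal attempt.
public class UnremovedReplicants {

    // Matches "trying to remove deadMember <node> for key <key>"
    private static final Pattern TRYING =
            Pattern.compile("trying to remove deadMember (\\S+) for key (\\S+)");

    // Matches "<node> was NOT removed!!!"
    private static final Pattern NOT_REMOVED =
            Pattern.compile("(\\S+) was NOT removed!!!");

    public static List<String> findUnremovedKeys(List<String> logLines) {
        List<String> keys = new ArrayList<>();
        String pendingNode = null; // node named in the last removal attempt
        String pendingKey = null;  // key named in the last removal attempt

        for (String line : logLines) {
            Matcher trying = TRYING.matcher(line);
            if (trying.find()) {
                pendingNode = trying.group(1);
                pendingKey = trying.group(2);
                continue;
            }
            Matcher notRemoved = NOT_REMOVED.matcher(line);
            // Lines from other loggers match neither pattern and are skipped,
            // so interleaved output does not break the pairing.
            if (notRemoved.find() && pendingKey != null
                    && notRemoved.group(1).equals(pendingNode)) {
                keys.add(pendingKey);
                pendingNode = null;
                pendingKey = null;
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UnremovedReplicants <server.log>");
            return;
        }
        for (String key : findUnremovedKeys(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(key);
        }
    }
}
```

      Run against the excerpt above, this should report the three keys DCacheBridge-DefaultJGBridge, jboss.ha:service=HASingletonDeployer and HAJNDI, which makes it easier to compare the affected keys across the surviving nodes.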


      As soon as I join any other node to the cluster again, everything continues to work fine.
      Does anybody have an idea what might be wrong here?