5 Replies Latest reply on Mar 1, 2011 10:22 AM by wdfink

    JBoss instance halts in cluster

    irfanmasood_015

      Hi,

       

      Can someone tell me the correct procedure for starting and stopping JBoss instances in a cluster? Last week I ran into the following issue. The two nodes in the cluster had been working fine for a long time when we applied a patch that changed a .class file. We restarted the master node after applying the patch, but it stopped receiving live traffic. We then stopped the child node to apply the patch; as soon as the child stopped, the master node started receiving traffic again. Later, when we started the child node, it went into a halted state. We then stopped both nodes and started them at the same time, but still only one node received traffic and the other did not.

       

      Please advise on the valid procedure for starting nodes in a JBoss cluster. When we need to apply a patch that changes binaries, how should we stop the instances and start them again?

       

      Please note that we are not using farm deployment; we have deployed our application as an exploded .ear in the JBOSS_HOME/server/all/deploy directory. JBoss 3.2.5 is running on Windows Server 2008 and is installed as a Windows service using the Java Wrapper. The JDK is 1.4.2.

       

      Any help will be highly appreciated.

       

      Best Regards!

       

      Irfan

        • 1. JBoss instance halts in cluster
          wdfink

          Did you check multicast?

          see http://community.jboss.org/wiki/JGroups

          Especially http://community.jboss.org/wiki/TestingJBoss

           

          Do the cluster nodes find each other?

          Which instance is configured within the client for lookup?

          Does the behaviour change when you change the client's JNP configuration?
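
          One way to check multicast between the two hosts independently of JBoss is a small probe like the sketch below. It is only an illustration: the class McastProbe and its methods are hypothetical and not part of JBoss or JGroups, and the group 228.1.2.3 and port 45566 are simply the values that appear in the UDP log lines in this thread, so adjust them if your configuration differs. It uses try-with-resources, so it needs Java 7 or later rather than the JDK 1.4.2 that runs the server.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone probe; not part of JBoss or JGroups. */
final class McastProbe {
    // Assumption: group and port as seen in the UDP log lines of this thread.
    static final String GROUP = "228.1.2.3";
    static final int PORT = 45566;

    /** True if the given IP literal is in the multicast range (224.0.0.0/4 for IPv4). */
    static boolean isMulticastGroup(String ip) {
        try {
            return InetAddress.getByName(ip).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Sends one datagram with the given text to the multicast group. */
    static void send(String text) throws Exception {
        try (MulticastSocket socket = new MulticastSocket()) {
            byte[] data = text.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(GROUP), PORT));
        }
    }

    /** Joins the group and waits for one datagram; returns null on timeout. */
    static String receive(int timeoutMillis) throws Exception {
        InetSocketAddress group = new InetSocketAddress(InetAddress.getByName(GROUP), PORT);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group, null); // null = use the default network interface
            socket.setSoTimeout(timeoutMillis);
            DatagramPacket packet = new DatagramPacket(new byte[512], 512);
            try {
                socket.receive(packet);
            } catch (SocketTimeoutException e) {
                return null;
            }
            String text = new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
            return packet.getAddress().getHostAddress() + ": " + text;
        }
    }

    public static void main(String[] args) throws Exception {
        if (!isMulticastGroup(GROUP)) {
            throw new IllegalStateException(GROUP + " is not a multicast address");
        }
        if (args.length > 0 && args[0].equals("send")) {
            send("hello from " + InetAddress.getLocalHost().getHostName());
        } else {
            String got = receive(30_000);
            System.out.println(got == null ? "nothing received within 30 s" : "received " + got);
        }
    }
}
```

          Run it without arguments on one host to listen, then with the argument send on the other host, and repeat in the opposite direction. If the listener times out in either direction, multicast traffic is probably not passing between the machines (firewall, switch, or the wrong network interface), which would have to be fixed before the cluster can form.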

          • 2. JBoss instance halts in cluster
            irfanmasood_015

            Thanks for your answer.

             

            Please note that after a few attempts at restarting both instances at the same time, it worked by luck and both instances started receiving traffic. So there seems to be some specific procedure for starting or restarting instances.

             

            I mean the behaviour is unpredictable: sometimes it starts correctly and both instances receive traffic, and sometimes one of the instances halts.

             

            Please help me out. Thanks.

             

            Irfan.

            • 3. JBoss instance halts in cluster
              irfanmasood_015

              Below are the logs from when it was working fine (instance 2 logs):

               

              2011-02-15 00:00:01,501 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to sdes-app01:60528 (additional data: 16 bytes) (own address=SDES-APP02:55268 (additional data: 16 bytes))

              2011-02-15 00:00:01,501 DEBUG [org.jgroups.protocols.UDP] sending message to sdes-app01:60528 (additional data: 16 bytes) (src=SDES-APP02:55268 (additional data: 16 bytes)), headers are {FD=[FD: heartbeat], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-15 00:00:01,501 DEBUG [org.jgroups.protocols.UDP] received (ucast) 166 bytes from /10.3.21.111:60528

              2011-02-15 00:00:01,501 DEBUG [org.jgroups.protocols.UDP] message is [dst: SDES-APP02:55268 (additional data: 16 bytes), src: sdes-app01:60528 (additional data: 16 bytes) (2 headers), size = 0 bytes], headers are {FD=[FD: heartbeat ack], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-15 00:00:01,501 DEBUG [org.jgroups.protocols.FD] received ack from sdes-app01:60528 (additional data: 16 bytes)

              2011-02-15 00:00:01,564 DEBUG [org.jgroups.protocols.UDP] received (ucast) 133 bytes from /10.3.21.111:60528

              2011-02-15 00:00:01,564 DEBUG [org.jgroups.protocols.UDP] message is [dst: SDES-APP02:55268 (additional data: 16 bytes), src: sdes-app01:60528 (additional data: 16 bytes) (2 headers), size = 0 bytes], headers are {FD=[FD: heartbeat], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-15 00:00:01,564 DEBUG [org.jgroups.protocols.UDP] sending message to sdes-app01:60528 (additional data: 16 bytes) (src=SDES-APP02:55268 (additional data: 16 bytes)), headers are {FD=[FD: heartbeat ack], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-15 00:00:04,013 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to sdes-app01:60528 (additional data: 16 bytes) (own address=SDES-APP02:55268 (additional data: 16 bytes))

              2011-02-15 00:00:04,013 DEBUG [org.jgroups.protocols.UDP] sending message to sdes-app01:60528 (additional data: 16 bytes) (src=SDES-APP02:55268 (additional data: 16 bytes)), headers are {FD=[FD: heartbeat], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-15 00:00:04,013 DEBUG [org.jgroups.protocols.UDP] received (ucast) 166 bytes from /10.3.21.111:60528

              2011-02-15 00:00:04,013 DEBUG [org.jgroups.protocols.UDP] message is [dst: SDES-APP02:55268 (additional data: 16 bytes), src: sdes-app01:60528 (additional data: 16 bytes) (2 headers), size = 0 bytes], headers are {FD=[FD: heartbeat ack], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-15 00:00:04,013 DEBUG [org.jgroups.protocols.FD] received ack from sdes-app01:60528 (additional data: 16 bytes)

              2011-02-15 00:00:04,060 DEBUG [org.jgroups.protocols.UDP] received (ucast) 133 bytes from /10.3.21.111:60528

               

              Afterwards, when it did not work in the cluster (instance 1 logs):

               

              2011-02-22 23:58:44,956 DEBUG [org.jgroups.protocols.MERGE2] initial_mbrs=[[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]]

              2011-02-22 23:58:59,969 DEBUG [org.jgroups.protocols.PING] FIND_INITIAL_MBRS

              2011-02-22 23:58:59,969 DEBUG [org.jgroups.protocols.PING] waiting for initial members: time_to_wait=2000, got 0 rsps

              2011-02-22 23:58:59,969 DEBUG [org.jgroups.protocols.UDP] sending message to 228.1.2.3:45566 (src=SDES-APP01:63710 (additional data: 16 bytes)), headers are {PING=[PING: type=GET_MBRS_REQ, arg=null], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-22 23:58:59,969 DEBUG [org.jgroups.protocols.UDP] received (mcast) 120 bytes from /10.3.21.111:63710 (size=120 bytes)

              2011-02-22 23:58:59,969 DEBUG [org.jgroups.protocols.UDP] message is [dst: 228.1.2.3:45566, src: SDES-APP01:63710 (additional data: 16 bytes) (2 headers), size = 0 bytes], headers are {PING=[PING: type=GET_MBRS_REQ, arg=null], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-22 23:58:59,969 DEBUG [org.jgroups.protocols.PING] received GET_MBRS_REQ from SDES-APP01:63710 (additional data: 16 bytes), returning [PING: type=GET_MBRS_RSP, arg=[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]]

              2011-02-22 23:58:59,985 DEBUG [org.jgroups.protocols.UDP] sending message to SDES-APP01:63710 (additional data: 16 bytes) (src=SDES-APP01:63710 (additional data: 16 bytes)), headers are {PING=[PING: type=GET_MBRS_RSP, arg=[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-22 23:58:59,985 DEBUG [org.jgroups.protocols.UDP] received (ucast) 314 bytes from /10.3.21.111:63710

              2011-02-22 23:58:59,985 DEBUG [org.jgroups.protocols.UDP] message is [dst: SDES-APP01:63710 (additional data: 16 bytes), src: SDES-APP01:63710 (additional data: 16 bytes) (2 headers), size = 0 bytes], headers are {PING=[PING: type=GET_MBRS_RSP, arg=[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]], UDP=[UDP:group_addr=DefaultPartition]}

              2011-02-22 23:58:59,985 DEBUG [org.jgroups.protocols.PING] received FIND_INITAL_MBRS_RSP, rsp=[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]

              2011-02-22 23:58:59,985 DEBUG [org.jgroups.protocols.PING] waiting for initial members: time_to_wait=1984, got 1 rsps

              2011-02-22 23:59:01,997 DEBUG [org.jgroups.protocols.PING] initial mbrs are [[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]]

              2011-02-22 23:59:01,997 DEBUG [org.jgroups.protocols.MERGE2] initial_mbrs=[[own_addr=SDES-APP01:63710 (additional data: 16 bytes), coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]]

              2011-02-22 23:59:21,922 DEBUG [org.jgroups.protocols.PING] FIND_INITIAL_MBRS

               

              Any idea whether something is wrong in the logs?

               

              Best Regards!

               

              Irfan

              • 4. JBoss instance halts in cluster
                irfanmasood_015

                I ran the test described at the link below:

                 

                http://community.jboss.org/wiki/TestingJBoss

                 

                The nodes in the cluster are not discovering each other; on both nodes the view lists only one node, i.e. the node itself.
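
                The same thing can be read from the PING lines in the server log: the "initial mbrs are" line for instance 1 contains only its own address. As a rough sketch, a helper like the one below could pull the distinct member addresses out of such a line. The class InitialMembers is hypothetical, and the parsing assumes the own_addr=... format shown in the logs in this thread; the format of a line with two members is an assumption, since no such line appears here.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log helper; not part of JBoss or JGroups. */
final class InitialMembers {
    // Captures the host:port that follows "own_addr=" (stops at whitespace, comma or bracket).
    private static final Pattern OWN_ADDR = Pattern.compile("own_addr=([^\\s,\\]]+)");

    /** Returns the distinct member addresses of an "initial mbrs are" log line, lower-cased. */
    static Set<String> members(String logLine) {
        Set<String> result = new LinkedHashSet<>();
        int start = logLine.indexOf("initial mbrs are");
        if (start < 0) {
            return result; // not a PING discovery-result line
        }
        Matcher matcher = OWN_ADDR.matcher(logLine.substring(start));
        while (matcher.find()) {
            // Host names appear in mixed case in the logs, so normalise them.
            result.add(matcher.group(1).toLowerCase(Locale.ROOT));
        }
        return result;
    }

    /** True if discovery returned exactly one member, i.e. the node found only itself. */
    static boolean sawOnlyItself(String logLine) {
        return members(logLine).size() == 1;
    }

    public static void main(String[] args) {
        String line = "2011-02-22 23:59:01,997 DEBUG [org.jgroups.protocols.PING] initial mbrs are "
                + "[[own_addr=SDES-APP01:63710 (additional data: 16 bytes), "
                + "coord_addr=SDES-APP01:63710 (additional data: 16 bytes)]]";
        System.out.println(members(line));       // prints [sdes-app01:63710]
        System.out.println(sawOnlyItself(line)); // prints true
    }
}
```

                If this reports a single member on both nodes while both are running, each node has formed its own one-node cluster, which matches what the test shows.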

                 

                Any idea what could be wrong, or where to look for possible problems?

                • 5. JBoss instance halts in cluster
                  wdfink

                  Are you finished with this thread, given that you started a new one: http://community.jboss.org/thread/163240