7 Replies Latest reply on Jan 8, 2010 6:32 AM by gaohoward

    Problem with cluster failover.

      Hi,

       

      I'm running crash tests of a JBoss cluster dedicated to serving as a JMS server.

      I wrote a simple two-threaded JMS client that connects to the server and simultaneously sends and receives unique messages, checking whether any of them are duplicated or lost.

      The scenario I'm currently stuck with is:

      - I bring 2 server nodes up

      - I start the client (mentioned above)

      - I see the traffic

      - I kill (SIGKILL) the 1st node

      - I still see the traffic after failover

      - I kill (SIGKILL) the 2nd node

      - the traffic stops

      - after some time the client starts to throw exceptions like:

      javax.jms.IllegalStateException: The object is closed
           at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:157)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientConsumerDelegate.receive(ClientConsumerDelegate.java)
           at org.jboss.jms.client.JBossMessageConsumer.receive(JBossMessageConsumer.java:86)
           at auto.Main.receive(Main.java:92)
           at auto.Main.receiveWrap(Main.java:138)
           at auto.Main$2.run(Main.java:195)
           at java.lang.Thread.run(Thread.java:619)
      

      - and after some more time:

      javax.jms.JMSException: Maximum number of failover attempts exceeded. Cannot find a server to failover onto.
           at org.jboss.jms.client.container.ClusteringAspect.handleCreateConnectionDelegate(ClusteringAspect.java:234)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
           at java.lang.reflect.Method.invoke(Method.java:597)
           at org.jboss.aop.advice.PerInstanceAdvice.invoke(PerInstanceAdvice.java:122)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientClusteredConnectionFactoryDelegate.createConnectionDelegate(ClientClusteredConnectionFactoryDelegate.java)
           at org.jboss.jms.client.JBossConnectionFactory.createConnectionInternal(JBossConnectionFactory.java:205)
           at org.jboss.jms.client.JBossConnectionFactory.createConnection(JBossConnectionFactory.java:87)
           at auto.Main.receive(Main.java:86)
           at auto.Main.receiveWrap(Main.java:138)
           at auto.Main$2.run(Main.java:195)
           at java.lang.Thread.run(Thread.java:619)
      

      - I bring the 1st node back up

      - the client throws:

      org.jboss.jms.exception.MessagingShutdownException: Cannot handle invocation since messaging server is not active (it is either starting up or shutting down)
           at org.jboss.jms.server.remoting.JMSServerInvocationHandler.invoke(JMSServerInvocationHandler.java:133)
           at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:891)
           at org.jboss.remoting.transport.socket.ServerThread.completeInvocation(ServerThread.java:744)
           at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:697)
           at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:551)
           at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:232)
           at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:211)
           at org.jboss.remoting.Client.invoke(Client.java:1724)
           at org.jboss.remoting.Client.invoke(Client.java:629)
      (...)
      

      - after some time it throws:

      org.jboss.jms.exception.MessagingJMSException: Failed to invoke
           at org.jboss.jms.client.delegate.DelegateSupport.handleThrowable(DelegateSupport.java:271)
           at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.org$jboss$jms$client$delegate$ClientConnectionFactoryDelegate$createConnectionDelegate$aop(ClientConnectionFactoryDelegate.java:191)
           at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate$createConnectionDelegate_N3019492359065420858.invokeTarget(ClientConnectionFactoryDelegate$createConnectionDelegate_N3019492359065420858.java)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:111)
           at org.jboss.jms.client.container.StateCreationAspect.handleCreateConnectionDelegate(StateCreationAspect.java:81)
           at org.jboss.aop.advice.org.jboss.jms.client.container.StateCreationAspect_z_handleCreateConnectionDelegate_26293492.invoke(StateCreationAspect_z_handleCreateConnectionDelegate_26293492.java)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.createConnectionDelegate(ClientConnectionFactoryDelegate.java)
           at org.jboss.jms.client.container.ClusteringAspect.handleCreateConnectionDelegate(ClusteringAspect.java:134)
           at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
           at java.lang.reflect.Method.invoke(Method.java:597)
           at org.jboss.aop.advice.PerInstanceAdvice.invoke(PerInstanceAdvice.java:122)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientClusteredConnectionFactoryDelegate.createConnectionDelegate(ClientClusteredConnectionFactoryDelegate.java)
           at org.jboss.jms.client.JBossConnectionFactory.createConnectionInternal(JBossConnectionFactory.java:205)
           at org.jboss.jms.client.JBossConnectionFactory.createConnection(JBossConnectionFactory.java:87)
      (...)
      

      - and finally the client starts to listen again

       

      Everything looks fine after that.

      But all of the stack traces come from the "receiver" thread. None of them come from the "sender" thread.

      And after all those fatal failovers, no message can be sent.

       

      The "sender" thread is blocked. Here is the snapshot from VisualVM:

      (...)
      "sender" prio=10 tid=0x8f62c800 nid=0x4a1a in Object.wait() [0x8f3c7000]
         java.lang.Thread.State: WAITING (on object monitor)
           at java.lang.Object.wait(Native Method)
           - waiting on <0x947601a0> (a org.jboss.jms.client.container.ClosedInterceptor)
           at java.lang.Object.wait(Object.java:485)
           at org.jboss.jms.client.container.ClosedInterceptor.checkCloseAlreadyDone(ClosedInterceptor.java:245)
           - locked <0x947601a0> (a org.jboss.jms.client.container.ClosedInterceptor)
           at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:142)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientProducerDelegate.close(ClientProducerDelegate.java)
           at org.jboss.jms.client.container.ClosedInterceptor.maintainRelatives(ClosedInterceptor.java:306)
           at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:165)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientSessionDelegate.closing(ClientSessionDelegate.java)
           at org.jboss.jms.client.container.ClosedInterceptor.maintainRelatives(ClosedInterceptor.java:305)
           at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:165)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientConnectionDelegate.closing(ClientConnectionDelegate.java)
           at org.jboss.jms.client.FailoverCommandCenter.failureDetected(FailoverCommandCenter.java:208)
           at org.jboss.jms.client.container.FailoverValveInterceptor.invoke(FailoverValveInterceptor.java:124)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:170)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientSessionDelegate.send(ClientSessionDelegate.java)
           at org.jboss.jms.client.container.ProducerAspect.handleSend(ProducerAspect.java:269)
           at org.jboss.aop.advice.org.jboss.jms.client.container.ProducerAspect_z_handleSend_26293492.invoke(ProducerAspect_z_handleSend_26293492.java)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:170)
           at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
           at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
           at org.jboss.jms.client.delegate.ClientProducerDelegate.send(ClientProducerDelegate.java)
           at org.jboss.jms.client.JBossMessageProducer.send(JBossMessageProducer.java:164)
           at org.jboss.jms.client.JBossMessageProducer.send(JBossMessageProducer.java:207)
           at org.jboss.jms.client.JBossMessageProducer.send(JBossMessageProducer.java:145)
           at org.jboss.jms.client.JBossMessageProducer.send(JBossMessageProducer.java:136)
           at auto.Main.send(Main.java:45)
           at auto.Main.sendWrap(Main.java:70)
           at auto.Main$1.run(Main.java:189)
           at java.lang.Thread.run(Thread.java:619)
      
         Locked ownable synchronizers:
           - None
      (...)
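
For readers who want to spot this state without attaching VisualVM, here is a minimal sketch in plain Java. It assumes nothing about JBoss Messaging: `BlockedThreadProbe` and its methods are hypothetical names, and the "sender" thread is a stand-in that simply parks in `Object.wait()`, as the real one does in the dump above. It relies only on `Thread.getState()` and `Thread.getStackTrace()`.

```java
public class BlockedThreadProbe {

    /** True if the thread is parked in a wait (WAITING or TIMED_WAITING). */
    public static boolean isWaiting(Thread t) {
        Thread.State s = t.getState();
        return s == Thread.State.WAITING || s == Thread.State.TIMED_WAITING;
    }

    /** Polls until the thread is waiting or the timeout elapses. */
    public static boolean awaitWaiting(Thread t, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (isWaiting(t)) {
                return true;
            }
            Thread.yield();
        }
        return isWaiting(t);
    }

    /** Formats the thread's name, state and current stack, roughly like a thread dump entry. */
    public static String describe(Thread t) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append("     at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object lock = new Object();
        // Stand-in for the stuck "sender": waits on a monitor until interrupted.
        Thread sender = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    // interrupted: leave the wait loop and let the thread end
                }
            }
        }, "sender");
        sender.setDaemon(true);
        sender.start();

        if (awaitWaiting(sender, 2000)) {
            System.out.print(describe(sender));
        }
        sender.interrupt();
    }
}
```

A monitoring thread in the test client could call `isWaiting` on the sender periodically and log `describe` when the state has not changed for longer than expected.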
      

       

      I'm using:

      Java JRE 6 (build 1.6.0_17-b04)

      JBossMessaging 1.4.3.GA

      JBoss 5.1.0.GA

       

      The client is using port 1099 to look up JNDI (is this significant?).

       

      The client code is an extended version of this: http://pastebin.com/m410ce7d0 (the connection is created and maintained in the very same way as in this example)

       

      Is it a configuration issue or a JBoss bug?

        • 1. Re: Problem with cluster failover.
          gaohoward

          Hi,

           

          javax.jms.JMSException: Maximum number of failover attempts exceeded. Cannot find a server to failover onto.

          When this exception happens, how does your client program deal with it? Did you catch the exception and retry? Or does the client just exit and you restart it manually?

           

          Howard

          • 2. Re: Problem with cluster failover.

            Hi,

             

            When an exception is thrown while dealing with the Connection/Session/Producer (in the "sender" thread) or the Consumer (in the "receiver" thread), the connection is restarted completely. The exception you quoted ("Maximum number of failover attempts exceeded.") was thrown only by the "receiver" thread; I think the "sender" was already blocked by that time (I can check to be sure, if it's relevant).

             

            This is how I handle exceptions in the "sender" (the "receiver" is very similar; this is very dirty code, sorry):

                 public static void sendWrap(ConnectionFactory cf, Destination target) {
                      for (;;) {  // loop until interrupted
                           try {
                                Thread.sleep(1000);
                                send(cf, target);
                           } catch (InterruptedException e) {
                                LOGGER.info("exiting sender", e);
                                return;
                           } catch (Exception e) {
                                // any other failure: log it and reconnect on the next iteration
                                LOGGER.warn("send got exception", e);
                           }
                      }
                 }
            

            and the "send" method:

                 public static void send(ConnectionFactory cf, Destination target) throws Exception {
                      Connection connection = null;
                      try {
                           connection = cf.createConnection("ecm-user", "ecm-user");
                           Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
                           MessageProducer producer = session.createProducer(target);
                           connection.start();
                           LOGGER.info("Sending");
                           for(;;) {
                                // some unique id generation was here
                                TextMessage message = session.createTextMessage(id + "~" + new Date() + "~" + GarbageGenerator.getGarbage());
                                message.setJMSDeliveryMode(DeliveryMode.PERSISTENT);
                                producer.send(message);
                                if(LOGGER.isTraceEnabled()) {
                                     LOGGER.trace("Sent message no: " + id + " id: " + message.getJMSMessageID() + " : " + message.getText());
                                }
                                else {
                                     LOGGER.info("Sent message no: " + id + " id: " + message.getJMSMessageID());
                                }
                                Thread.sleep(500);
                                if(Thread.interrupted()) {
                                     LOGGER.info("sender interrupted");
                                     throw new InterruptedException();
                                }
                           }
                      }
                      finally {
                           if(connection != null) {
                                connection.close();
                           }
                      }
                 }
            

             

            The "receiver" and "sender" threads use two separate ConnectionFactories and Destinations, taken from two separate JNDI lookups. That's because I needed a tool that can send and receive messages on different servers. This is the initialization code:

                           Properties senderProps = new Properties();
                           senderProps.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
                           senderProps.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
                           senderProps.put(Context.PROVIDER_URL, senderJNP);
                           Properties receiverProps = new Properties();
                           receiverProps.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
                           receiverProps.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
                           receiverProps.put(Context.PROVIDER_URL, receiverJNP);
            
                           // Step 1. Create the initial contexts to perform the JNDI lookups.
                           senderInitialContext = new InitialContext(senderProps);
                           receiverInitialContext = new InitialContext(receiverProps);
                           // Step 2. Look up the connection factories, each from its own context.
                           final ConnectionFactory senderConnectionFactory = (ConnectionFactory)senderInitialContext.lookup(senderCF);
                           final ConnectionFactory receiverConnectionFactory = (ConnectionFactory)receiverInitialContext.lookup(receiverCF);
                           // Step 3. Look up the queues, each from its own context.
                           final Destination senderTarget = (Destination)senderInitialContext.lookup(senderDest);
                           final Destination receiverTarget = (Destination)receiverInitialContext.lookup(receiverDest);
            

             

            The same thing happens when I use HA-JNDI for the lookup (port 1100).

             

            Krystian.

            • 3. Re: Problem with cluster failover.
              gaohoward

              Hi,

               

              Do you do a fresh lookup for ConnectionFactory and Destination each time both nodes are killed and restarted?

               

              Howard
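
Howard's question hints at one possible client-side pattern: repeat the JNDI lookup on every reconnect instead of reusing objects obtained before the nodes died. The sketch below shows only the retry shape and makes no claim that it resolves this particular hang. `FreshLookupRetry`, `Factory` and `sendWithFreshLookup` are hypothetical names, and `Factory` is a plain stand-in for a looked-up `ConnectionFactory`, so the example runs without any JMS library.

```java
import java.util.function.Supplier;

public class FreshLookupRetry {

    /** Hypothetical stand-in for a looked-up ConnectionFactory. */
    public interface Factory {
        void sendOnce() throws Exception;
    }

    /**
     * Calls lookup.get() before every attempt, so a factory obtained from a
     * dead node is never reused. Returns the number of the attempt that
     * succeeded; throws IllegalStateException if none did.
     */
    public static int sendWithFreshLookup(Supplier<Factory> lookup, int maxAttempts, long pauseMillis) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Factory factory = lookup.get(); // fresh lookup on every attempt
                factory.sendOnce();
                return attempt;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new IllegalStateException("no attempt succeeded", last);
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        // Simulated lookup: the first two results point at a dead node.
        Supplier<Factory> lookup = () -> {
            int n = ++lookups[0];
            return () -> {
                if (n < 3) {
                    throw new IllegalStateException("node down");
                }
            };
        };
        int attempts = sendWithFreshLookup(lookup, 5, 10);
        System.out.println("succeeded on attempt " + attempts); // prints "succeeded on attempt 3"
    }
}
```

In a real client the supplier would wrap `new InitialContext(props).lookup(...)`, and this only helps when the failure surfaces as an exception in the first place.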

              • 4. Re: Problem with cluster failover.

                No. I run the lookup only once.

                 


                I think that you are missing the point.

                 


                The "sender" thread never gets any exception. It is just stuck in the send method invocation (on the session) and hangs there forever.

                 

                So there is no way (no exception) for the "sender" to be notified that the servers are unavailable, and it doesn't try to close the connection and reopen it.

                 


                The "receiver" looks OK to me. It gets the exception and reconnects. But it is essential to keep the "sender" separate from the "receiver", so I can't restart the "sender" from the "receiver" exception handler ("receiver" exception handler is never invoked).
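
Because the blocked send never throws, one client-side workaround is to bound the call with a timeout so that the sender at least notices the hang. The following is a minimal sketch using only `java.util.concurrent`. `SendWatchdog` and `callWithTimeout` are hypothetical names, and the stuck call is simulated with `Object.wait()`, which is where the thread dump shows the real sender waiting. Whether interrupting the real JBoss Messaging call frees it is an assumption that would need testing.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SendWatchdog {

    /**
     * Runs a potentially blocking call on a worker thread and gives up after
     * timeoutMillis. Returns true if the call finished in time, false if it
     * timed out (the worker is then interrupted).
     */
    public static boolean callWithTimeout(Callable<Void> call, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-send");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<Void> future = executor.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return false;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("guarded call failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();
        // Stand-in for a send that hangs in Object.wait() and never returns.
        Callable<Void> stuckSend = () -> {
            synchronized (lock) {
                while (!Thread.currentThread().isInterrupted()) {
                    lock.wait();
                }
            }
            return null;
        };
        boolean completed = callWithTimeout(stuckSend, 200);
        System.out.println("completed in time: " + completed); // prints "completed in time: false"
    }
}
```

In the test client the callable would wrap the `producer.send(message)` call, and a `false` result would be treated like an exception: drop the connection and reconnect. A JMS session should only be used by one thread at a time, so the calling thread must not touch the session while the guarded call is in flight.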

                • 5. Re: Problem with cluster failover.
                  gaohoward

                  Hi,

                   

                  I tried a simple sending example against a two-node cluster and I got the exception (max failover reached).

                   

                  The difference between my example and yours is probably that I don't create the connection and session each time. Mine goes like this:

                   

                  1. create connection

                  2. create session

                  3. create producer

                  4. for (..) { sending message; }

                   


                  When I killed the second node, the exception appeared a few seconds later. I'll try yours and see if I can reproduce it.

                   

                  By the way, do you know where the sender code is stuck? (the thread dump)

                   

                  Howard

                  • 6. Re: Problem with cluster failover.

                    Hi,

                     

                    What versions of JBoss and JBoss Messaging are you using?

                    Which JARs have you included on the client's class path?

                     

                    Maybe the difference lies in the configuration?

                     

                    The thread dump is in the first post of this thread (last box).

                    • 7. Re: Problem with cluster failover.
                      gaohoward

                      Sorry I didn't see that thread dump. I'm trying the latest build.

                       

                      I think you are having this problem:

                       

                      https://jira.jboss.org/jira/browse/JBMESSAGING-1743