17 Replies. Latest reply on Nov 29, 2011 8:46 PM by alexc099

    Static cluster stops receiving messages

    alexc099

      While I'm fighting with the network people over UDP filtering I figured I should set up a static cluster so I can keep testing. My problem is that my cluster will receive messages for a while but will then just stop a couple of thousand messages in. I'm not seeing any error logging from HornetQ, my producer just starts waiting and my consumer gets nothing. Note that this only happens when I have static discovery set up. Broadcast discovery works just fine, other than the fact that my producer and consumer have to live on the same box because of my network issues.

       

      I've been looking at the clustered-static-discovery example config and for the life of me I don't see anything out of the ordinary.

       

      This is the config for server0:

       

      <configuration xmlns="urn:hornetq"

                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                     xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">

       

         <clustered>true</clustered>

         <shared-store>true</shared-store>

         <persistence-enabled>true</persistence-enabled>

         <failover-on-shutdown>true</failover-on-shutdown>

         <allow-failback>true</allow-failback>

         <jmx-management-enabled>true</jmx-management-enabled>

         <journal-directory>/shared/hq/data/journal</journal-directory>

         <paging-directory>/shared/hq/data/paging</paging-directory>

         <bindings-directory>/shared/hq/data/bindings</bindings-directory>

         <journal-min-files>10</journal-min-files>  

         <large-messages-directory>/shared/hq/data/large-messages</large-messages-directory>

       

        <connectors>     

            <connector name="netty">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

               <param key="host"  value="${hornetq.remoting.netty.host:qa29-vm}"/>

               <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>

            </connector>

       

            <connector name="server1-connector">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

               <param key="host"  value="qa30-vm"/>

               <param key="port"  value="5446"/>

            </connector>     

         </connectors>

       

         <acceptors>

            <acceptor name="netty">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>

               <param key="host"  value="${hornetq.remoting.netty.host:qa29-vm}"/>

               <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>

            </acceptor>

         </acceptors>

       

       

         <cluster-connections>

            <cluster-connection name="my-cluster">

               <address>jms</address>

               <connector-ref>netty</connector-ref>

               <retry-interval>500</retry-interval>

               <use-duplicate-detection>true</use-duplicate-detection>

               <forward-when-no-consumers>true</forward-when-no-consumers>

               <max-hops>1</max-hops>

               <static-connectors>

                  <connector-ref>server1-connector</connector-ref>

               </static-connectors>

            </cluster-connection>

         </cluster-connections>

       

         <security-settings>

            <security-setting match="#">

               <permission type="createNonDurableQueue" roles="guest"/>

               <permission type="deleteNonDurableQueue" roles="guest"/>

               <permission type="consume" roles="guest"/>

               <permission type="send" roles="guest"/>

               <permission type="createDurableQueue" roles="guest"/>

            </security-setting>

         </security-settings>

       

         <address-settings>

            <!--default for catch all-->

            <address-setting match="#">

               <dead-letter-address>jms.queue.DLQ</dead-letter-address>

               <expiry-address>jms.queue.ExpiryQueue</expiry-address>

               <redelivery-delay>0</redelivery-delay>

               <max-size-bytes>10485760</max-size-bytes>      

               <message-counter-history-day-limit>10</message-counter-history-day-limit>

               <address-full-policy>BLOCK</address-full-policy>

            </address-setting>

         </address-settings>

      </configuration>

       

      Server1 has the same settings except for the host, the port, a connector that points to server0 (with its corresponding reference in the cluster-connection), and directories that point to /shared/hq/data-2. Server3 and server4 (the HA backups for 0 and 1) have the <backup>true</backup> setting. I've tried them both with and without the cluster-connection settings, and even just not starting them; none of that makes a difference. My hornetq-jms.xml has this as a connection factory:

       

         <connection-factory name="NettyConnectionFactory">

            <xa>false</xa>

            <ha>true</ha>

            <connectors>

               <connector-ref connector-name="netty"/>

            </connectors>

            <entries>

               <entry name="/ConnectionFactory"/>

            </entries>

            <client-failure-check-period>60000</client-failure-check-period>

            <!-- Pause 1 second between connect attempts -->

            <retry-interval>1000</retry-interval>

       

            <!-- Multiply subsequent reconnect pauses by this multiplier. This can be used to

            implement an exponential back-off. For our purposes we just set to 1.0 so each reconnect

            pause is the same length -->

            <retry-interval-multiplier>1.0</retry-interval-multiplier>

       

            <!-- Try reconnecting an unlimited number of times (-1 means "unlimited") -->

            <reconnect-attempts>-1</reconnect-attempts>

         </connection-factory>

       

      My consumer uses Spring to get its connection factory with this config:

       

        <util:constant id="QUEUE_CF"

          static-field="org.hornetq.api.jms.JMSFactoryType.QUEUE_CF" />

       

        <bean name="transportConfiguration"

          class="org.hornetq.api.core.TransportConfiguration">

          <constructor-arg

            value="org.hornetq.core.remoting.impl.netty.NettyConnectorFactory" />

          <constructor-arg>

            <map key-type="java.lang.String" value-type="java.lang.Object">

              <entry key="host" value="qa29-vm" />

              <entry key="port" value="5445" />

            </map>

          </constructor-arg>

        </bean>

       

        <bean name="jmsFactory" class="org.hornetq.api.jms.HornetQJMSClient"

          factory-method="createConnectionFactoryWithHA">

          <constructor-arg index="0" ref="QUEUE_CF" />

          <constructor-arg index="1" ref="transportConfiguration" />

          <property name="reconnectAttempts" value="5"/>

          <property name="connectionTTL" value="60000"/>

          <property name="clientFailureCheckPeriod" value="30000"/>

        </bean>

       

      Edit: I also tried getting the connection factory through JNDI like so, but I'm still seeing the same issue: it connects, the producer sends some messages, then it all hangs.

       

         <bean id="jndiTemplate" class="org.springframework.jndi.JndiTemplate">

              <property name="environment">

                  <props>

                      <prop key="java.naming.factory.initial">org.jnp.interfaces.NamingContextFactory</prop>

                      <prop key="java.naming.provider.url">jnp://qa29-vm:1099</prop>

                  </props>

              </property>

          </bean>

       

          <bean id="jmsFactory" class="org.springframework.jndi.JndiObjectFactoryBean">

              <property name="jndiTemplate" ref="jndiTemplate"/>

              <property name="jndiName" value="ConnectionFactory"/>

          </bean>

       

      Finally, my producer (JMeter) uses JNDI to get ConnectionFactory from jnp://qa29-vm:1099 using the org.jnp.interfaces.NamingContextFactory.

       

      Like I said, the only difference between my working and non-working setup is the cluster discovery mode. Anyone got any ideas on what I messed up on?

        • 1. Re: Static cluster stops receiving messages
          clebert.suconic

          What version?

           

          Can you try downloading SVN for your test?

          • 2. Re: Static cluster stops receiving messages
            alexc099

            This is on 2.2.5.Final. SVN? You want me to grab HornetQ's source code to step into it? I can but it'll probably take me a bit to set up the environment.

            • 3. Re: Static cluster stops receiving messages
              alexc099

              I've tried playing with some other configs between setting up other stuff, but the only things that worked for me were to either turn on broadcast discovery (which I'd like to use but I'm getting pushback from our IT guys) or turn off clustering completely and just hit a single instance. Both of those configs will run my 100,000 message test. With static clustering on, everything I could think of to turn down (number of queues and number of threads, both turned down to 1) or off (like persistence and shared store) still gives me the same result: a hang after, at most, 3500 messages. I'll get back at it after our holiday break.

              • 4. Re: Static cluster stops receiving messages
                alexc099

                Anything further on this? Something in the config I missed or some tests that I can run? We'd really hate to take HornetQ out of the running on this but if the meeting with the network people goes the way I think it will then using the broadcast service will be off the table and if I can't define a cluster statically then I'll be out of luck.

                • 5. Re: Static cluster stops receiving messages
                  clebert.suconic

                  We have made some improvements to clustering recently; however, they are not yet released as a standalone build. We are working on it.

                   

                  Maybe you could try this:

                   

                  svn co http://anonsvn.jboss.org/repos/hornetq/branches/Branch_2_2_AS7/

                   

                  ./build.sh distro

                   

                   

                  Can you give this a try?

                  • 6. Re: Static cluster stops receiving messages
                    alexc099

                    Thanks Clebert! I'll build it and let you know.

                    • 7. Re: Static cluster stops receiving messages
                      alexc099

                      Well, that didn't do it, but in the process I think I've tracked down the issue, which I still think is configuration related on my end. I've noticed that when the producer and consumer stop, they both do so at roughly the same time but not at the same count. Instead, my producer (JMeter) will have sent X messages but my consumer will have received exactly half of those messages. This would seem to indicate that I'm not getting the connection factory correctly on the consumer side but I am on the producer side.

                       

                      In JMeter I'm giving the JMS sampler an initial context factory of org.jnp.interfaces.NamingContextFactory and a valid provider URL (jnp://qa1-vm:1099). The consumer side is my own code and I'm using Spring to inject the connection factory. I've tried both getting it through JNDI and creating a transport configuration and client directly.

                       

                      JNDI:

                      <bean id="jndiTemplate" class="org.springframework.jndi.JndiTemplate">

                           <property name="environment">

                                  <props>

                                      <prop key="java.naming.factory.initial">org.jnp.interfaces.NamingContextFactory</prop>

                                      <prop key="java.naming.provider.url">jnp://qa1-vm:1099</prop>

                                      <prop key="java.naming.factory.url.pkgs">org.jboss.naming:org.jnp.interfaces</prop>

                                  </props>

                           </property>

                      </bean>

                       

                      <bean id="jmsFactory" class="org.springframework.jndi.JndiObjectFactoryBean">

                              <property name="jndiTemplate" ref="jndiTemplate"/>

                              <property name="jndiName" value="ConnectionFactory"/>

                      </bean>

                       

                      Direct:

                      <util:constant id="QUEUE_CF"

                          static-field="org.hornetq.api.jms.JMSFactoryType.QUEUE_CF" />

                       

                      <bean name="transportConfiguration"

                          class="org.hornetq.api.core.TransportConfiguration">

                          <constructor-arg

                            value="org.hornetq.core.remoting.impl.netty.NettyConnectorFactory" />

                          <constructor-arg>

                            <map key-type="java.lang.String" value-type="java.lang.Object">

                              <entry key="host" value="qa1-vm" />

                              <entry key="port" value="5445" />

                            </map>

                          </constructor-arg>

                      </bean>

                       

                       

                      <bean name="jmsFactory" class="org.hornetq.api.jms.HornetQJMSClient"

                          factory-method="createConnectionFactoryWithoutHA">

                          <constructor-arg index="1" ref="transportConfiguration" />

                          <constructor-arg index="0" ref="QUEUE_CF" />

                          <property name="reconnectAttempts" value="5"/>

                          <property name="connectionTTL" value="60000"/>

                          <property name="clientFailureCheckPeriod" value="30000"/>

                      </bean>

                       

                      Do I have to somehow specify all the machines in the provider URL? Do I have to reference the cluster in the hornetq-jms.xml connection factory?

                      • 8. Re: Static cluster stops receiving messages
                        clebert.suconic

                        The hanging on producing:

                         

                         

                              <!--default for catch all-->

                              <address-setting match="#">

                                 <dead-letter-address>jms.queue.DLQ</dead-letter-address>

                                 <expiry-address>jms.queue.ExpiryQueue</expiry-address>

                                 <redelivery-delay>0</redelivery-delay>

                                 <max-size-bytes>10485760</max-size-bytes>     

                                 <message-counter-history-day-limit>10</message-counter-history-day-limit>

                                 <address-full-policy>BLOCK</address-full-policy>

                              </address-setting>

                         

                         

                         

                        You have set address-full-policy to BLOCK, so once the address reaches max-size-bytes (10485760 bytes here) you won't be able to produce any more messages until you consume (and ack) some.

                         

                        You could maybe try paging.

                         

                         

                        I'm not sure how you are connecting to the system, but you could maybe try connecting directly from Spring. I've always been able to do things with Spring when I needed to, but I'm not well versed in it, so I'm not sure about the specifics. Maybe you could try some of the examples and see what's going on.
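
                        A minimal sketch of the paging option, assuming the same catch-all address-setting quoted above: only address-full-policy changes, and page-size-bytes is added. The 2097152 value is an illustrative assumption, not a recommendation.

                        ```xml
                        <address-settings>
                           <!-- catch-all setting, switched from BLOCK to PAGE -->
                           <address-setting match="#">
                              <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                              <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                              <redelivery-delay>0</redelivery-delay>
                              <!-- once the address holds this many bytes, new messages are paged to disk -->
                              <max-size-bytes>10485760</max-size-bytes>
                              <!-- size of each page file; assumed value, must be smaller than max-size-bytes -->
                              <page-size-bytes>2097152</page-size-bytes>
                              <message-counter-history-day-limit>10</message-counter-history-day-limit>
                              <address-full-policy>PAGE</address-full-policy>
                           </address-setting>
                        </address-settings>
                        ```

                        With PAGE the producer should keep sending once the address fills up, but this only hides the symptom if the consumer is still not receiving everything.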

                        • 9. Re: Static cluster stops receiving messages
                          alexc099

                          The producer being blocked makes sense but what I'm still not understanding is why the consumer isn't consuming. Remember, the consumer only gets half the messages that the producer sends. (This is on a two instance cluster. I'd be willing to bet that it'd drop to a third if I added another instance.) It's like my consumer is only receiving messages from one instance on the cluster. (I did a kill -3 on my consumer and the thread parks after it stops receiving messages.)

                           

                          Also, keep in mind that if I use broadcast discovery this issue doesn't happen at all. Same code, different settings in hornetq-jms and hornetq-config. That's ultimately why I'm thinking it's got something to do with how I'm getting the connection factory. I'm just at a loss to see what it is I'm doing wrong.

                          • 10. Re: Static cluster stops receiving messages
                            alexc099

                            That reminds me of a question I've had about TransportConfiguration since the beginning: is there a way to specify more than one host/port? Should I be including all the machines in the static cluster in the TransportConfiguration? If so, how? All the examples and docs I've found show only one, which made me think that was how it works.

                            • 11. Re: Static cluster stops receiving messages
                              clebert.suconic

                              The server should push the topology back to the client.

                               

                              You can specify multiple initial connectors, but once connected the server will push the topology down to the clients.
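
                              On the hornetq-jms.xml side, listing more than one initial connector might look like the sketch below. It assumes the connector names netty and server1-connector from the server0 config at the top of the thread; the other connection-factory elements are left out for brevity.

                              ```xml
                              <connection-factory name="NettyConnectionFactory">
                                 <xa>false</xa>
                                 <ha>true</ha>
                                 <connectors>
                                    <!-- initial connectors, both defined in hornetq-configuration.xml -->
                                    <connector-ref connector-name="netty"/>
                                    <connector-ref connector-name="server1-connector"/>
                                 </connectors>
                                 <entries>
                                    <entry name="/ConnectionFactory"/>
                                 </entries>
                              </connection-factory>
                              ```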

                               

                              Can you try to replicate your issue with an example so we can see what's going on?

                              • 12. Re: Static cluster stops receiving messages
                                clebert.suconic

                                Isn't this supposed to be 5445? (a typo?)

                                 

                                      <connector name="server1-connector">

                                         <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

                                         <param key="host"  value="qa30-vm"/>

                                         <param key="port"  value="5446"/> <<<<<<<<< here

                                      </connector>    


                                • 13. Re: Static cluster stops receiving messages
                                  alexc099

                                  Amazingly, that's not a typo. I originally did everything on one machine and got used to thinking of different instances on different ports.

                                   

                                  The example works, but I'm not entirely convinced that it's doing what you want it to. It creates four sessions to the cluster, creates a producer on one of the sessions, then creates four consumers across all the sessions and has the consumers receive messages in round-robin order.

                                   

                                  My problem was that one consumer would only get half the messages off the queue. To see if I could reproduce this issue in the example, I commented out three of the four consumers and changed the for loop on line 144 so it would iterate through all the messages. (The modified example code is attached.) In this case what I would expect, and correct me if I'm wrong, is that the single consumer would still receive all twenty messages. Instead it only received five. Am I not understanding something here?

                                  • 14. Re: Static cluster stops receiving messages
                                    clebert.suconic

                                    I bet there's some confusion on these ports..

                                     

                                    You need to enable redistribution-delay so that messages are redistributed when there's no consumer for them on a node.
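
                                    As a rough sketch, the setting goes into the address-settings of hornetq-configuration.xml on each node. The match and the 0 ms value below are assumptions for illustration; by default the delay is -1, which means messages are never redistributed.

                                    ```xml
                                    <address-settings>
                                       <address-setting match="#">
                                          <!-- milliseconds to wait after the last consumer on a queue closes
                                               before moving its messages to another node; 0 = immediately -->
                                          <redistribution-delay>0</redistribution-delay>
                                          <!-- other settings (dead-letter-address, max-size-bytes, ...) unchanged -->
                                       </address-setting>
                                    </address-settings>
                                    ```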
