10 Replies Latest reply on Feb 13, 2012 10:35 PM by meenarc

    HornetQ Bridge issues-remote server/backup-HA config

    meenarc

      Hi,

       

      We have the following setup (all nodes run HornetQ 2.2.5.Final):

       

       

      - Node A (JBoss 4.3 EAP) has a local queue; every message that arrives on it is forwarded to a remote queue via a bridge. The bridge is configured with a discovery group.

      - Node B (JBoss 5.0.1.GA) is the main "live" remote HornetQ server, where I have the broadcast/discovery groups and the cluster definition set up. It shares a store with Node C, the backup node.

      - Node C (JBoss 5.0.1.GA) is the backup node. It also has the broadcast/discovery groups and the cluster definition set up, and also accesses the shared store.

       

      The shared store is an NFS (Coraid) mount.

       

      Node A started up and the bridge connected to Node B. Then Node C was started up. I saw the following message continuously written across all 3 logs "

      2012-01-09 11:24:13,578 WARN  [org.hornetq.core.cluster.impl.DiscoveryGroupImpl]   There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=36467e3e-37c5-11e1-866c-000c29b96655".

       

      Also, when I brought down Node B, the bridge did not automatically reconnect to the backup server and forward the messages.

      I have attached the hornetq-configuration.xml from all 3 nodes.
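
      For readers without the attachments: a discovery group of the kind the setup above relies on is declared in hornetq-configuration.xml roughly as follows. This is a generic sketch, not a copy of the attached files. The name dg-group1 is the one the bridge config later in this thread refers to; the multicast address, port and timeout are placeholder values. The broadcast groups on Node B and Node C would need to use the same group address and port for the bridge on Node A to find them.

      ```xml
      <!-- hornetq-configuration.xml: sketch only, values are placeholders -->
      <discovery-groups>
         <discovery-group name="dg-group1">
            <!-- multicast address and port the node listens on for broadcasts -->
            <group-address>231.7.7.7</group-address>
            <group-port>9876</group-port>
            <!-- how long (ms) to wait before dropping a server that stopped broadcasting -->
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
      </discovery-groups>
      ```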

        • 1. Re: HornetQ Bridge issues-remote server/backup-HA config
          ataylor

          Node A started up and the bridge connected to Node B. Then Node C was started up. I saw the following message continuously written across all 3 logs "

          2012-01-09 11:24:13,578 WARN  [org.hornetq.core.cluster.impl.DiscoveryGroupImpl]   There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=36467e3e-37c5-11e1-866c-000c29b96655".

          This is fine.

          First I would check that the backup is announced; there should be something in the logs of the backup. You can also check the JMX console.

          • 2. Re: HornetQ Bridge issues-remote server/backup-HA config
            meenarc

            When Node C (the backup) starts up, I see this in the logs:

            "

            2012-01-09 12:06:54,846 INFO  [org.hornetq.core.server.cluster.impl.ClusterManagerImpl] (Thread-6) announcing backup

            2012-01-09 12:06:54,850 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-6) adding = 36467e3e-37c5-11e1-866c-000c29b96655:Pair[a=null, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-63-20-233]

            2012-01-09 12:06:54,850 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-6) before----------------------------------

            2012-01-09 12:06:54,850 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-6)     nodes=0    members=0

            2012-01-09 12:06:54,851 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-6) Topology updated=true

            2012-01-09 12:06:54,851 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-6)     36467e3e-37c5-11e1-866c-000c29b96655 => TopologyMember[connector=Pair[a=null, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-63-20-233]]

                nodes=1    members=1"

             

            I also see the following in the Node B (live server) logs:

             

            "

            2012-01-09 12:06:55,923 DEBUG [org.hornetq.core.client.impl.Topology] (Old I/O server worker (parentId: 328066535, [id: 0x138de5e7, /10.63.20.219:5445])) adding = 36467e3e-37c5-11e1-866c-000c29b96655:Pair[a=null, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-63-20-233]

            2012-01-09 12:06:55,923 DEBUG [org.hornetq.core.client.impl.Topology] (Old I/O server worker (parentId: 328066535, [id: 0x138de5e7, /10.63.20.219:5445])) before----------------------------------

            2012-01-09 12:06:55,923 DEBUG [org.hornetq.core.client.impl.Topology] (Old I/O server worker (parentId: 328066535, [id: 0x138de5e7, /10.63.20.219:5445]))     36467e3e-37c5-11e1-866c-000c29b96655 => TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-63-20-219, b=null]]

                nodes=1    members=1

            2012-01-09 12:06:55,923 DEBUG [org.hornetq.core.client.impl.Topology] (Old I/O server worker (parentId: 328066535, [id: 0x138de5e7, /10.63.20.219:5445])) Topology updated=true

            2012-01-09 12:06:55,923 DEBUG [org.hornetq.core.client.impl.Topology] (Old I/O server worker (parentId: 328066535, [id: 0x138de5e7, /10.63.20.219:5445]))     36467e3e-37c5-11e1-866c-000c29b96655 => TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-63-"

             

            So does this mean the backup has been announced correctly?

            • 3. Re: HornetQ Bridge issues-remote server/backup-HA config
              ataylor

              No, there should be 2 members. You should also see a message something like "backup announced" on Node C. If this isn't happening, it could be a UDP issue, since you're using discovery. Is UDP enabled on your network, and are any of your servers on the same machine (i.e. loopback isn't enabled)?

              • 4. Re: HornetQ Bridge issues-remote server/backup-HA config
                meenarc

                I do see this on Node C (the backup):

                 

                "2012-01-09 12:06:54,846 INFO  [org.hornetq.core.server.cluster.impl.ClusterManagerImpl] (Thread-6) announcing backup".

                 

                UDP is enabled on our network. We use JBoss clustering, which uses UDP.

                • 5. Re: HornetQ Bridge issues-remote server/backup-HA config
                  meenarc

                  I did get HA to work. What was missing was setting "ha" to "true" on the connection factory in my hornetq-jms.xml.
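
                  A minimal sketch of that change, assuming a connection factory that finds the servers through a discovery group. The factory name, the JNDI entry and the reconnect-attempts value are illustrative placeholders, not taken from the attached configs:

                  ```xml
                  <!-- hornetq-jms.xml: sketch only; names and values are placeholders -->
                  <connection-factory name="ConnectionFactory">
                     <!-- assumption: servers are located via discovery rather than a fixed connector -->
                     <discovery-group-ref discovery-group-name="dg-group1"/>
                     <entries>
                        <entry name="/ConnectionFactory"/>
                     </entries>
                     <!-- the setting that was missing: allows clients to fail over to the backup -->
                     <ha>true</ha>
                     <!-- assumption: retry indefinitely; choose a value that suits the application -->
                     <reconnect-attempts>-1</reconnect-attempts>
                  </connection-factory>
                  ```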

                   

                  I have been trying failback. I set <allow-failback> to true on both my live and backup nodes. However, failback to the live server didn't happen when it came back up. In the jmx-console on the live server the message count stayed the same, whereas the jmx-console on the backup server showed the message count going up as messages were added. Is my bridge config, which forwards messages to the live/backup server, preventing failback? Or is the problem elsewhere?
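
                  The failback-related settings described above would look roughly like this in hornetq-configuration.xml. This is a sketch that shows only the HA elements mentioned in this thread; everything else in each file is omitted:

                  ```xml
                  <!-- Live node (Node B): fragment of hornetq-configuration.xml, sketch only -->
                  <shared-store>true</shared-store>
                  <allow-failback>true</allow-failback>

                  <!-- Backup node (Node C): fragment of hornetq-configuration.xml, sketch only -->
                  <backup>true</backup>
                  <shared-store>true</shared-store>
                  <allow-failback>true</allow-failback>
                  ```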

                   

                   

                   

                  The bridge on Node A uses discovery (since static connectors don't work for HA in 2.2.5). I tried to mark the bridge "ha" (as shown below) and ran into parse exceptions, so I had to remove the ha property. What else needs to be done to get automatic failback working?

                   

                  <bridges>
                     <bridge name="angel-logout-bridge">
                        <queue-name>jms.queue.angel-localLogoutQueue</queue-name>
                        <forwarding-address>jms.queue.angel-logoutQueue</forwarding-address>
                        <retry-interval>60000</retry-interval>
                        <ha>true</ha>
                        <retry-interval-multiplier>1</retry-interval-multiplier>
                        <reconnect-attempts>-1</reconnect-attempts>
                        <failover-on-server-shutdown>true</failover-on-server-shutdown>
                        <use-duplicate-detection>true</use-duplicate-detection>
                        <confirmation-window-size>10000000</confirmation-window-size>
                        <discovery-group-ref discovery-group-name="dg-group1" />
                     </bridge>
                  </bridges>

                  • 6. Re: HornetQ Bridge issues-remote server/backup-HA config
                    meenarc

                    Hi,

                     

                    I noticed in the jmx-console of my backup server that the backup attribute is "true" initially and changes to "false" when I kill the live server.

                    At the same time, the bridge that connects to the live/backup pair starts throwing exceptions and is not able to fail over:

                    Unable to send message, will try again once bridge reconnects

                    HornetQException[errorCode=3 message=Timed out waiting for response when sending packet 71]

                              at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:276)

                              at org.hornetq.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:287)

                              at org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:142)

                     

                    Loopback interfaces are available on all 3 nodes. I verified this via ifconfig and by running the examples that use discovery; the MDB that consumes the messages sent via the bridge also uses discovery, and that works too. On the node that hosts the bridge, I noticed that after killing the live server messages keep accumulating in the local queue and are not forwarded to the remote server.

                     

                    This time I also made sure that all 3 nodes are on 3 different machines with different IPs.

                    Are there other attributes, such as connection-ttl, that need to be set on the bridge?

                    Setting the bridge "ha" attribute to true throws exceptions.

                     

                    What else needs to be done to make the bridge failover?

                     

                     

                    A somewhat related question: when will hornetq-2.2.8, with the option of HA for bridges using static connectors, be available?

                     

                    Thanks

                    • 7. Re: HornetQ Bridge issues-remote server/backup-HA config
                      ataylor

                      If everything is configured correctly it should just work. Do you see 'backup announced' in your backup server's logs when it starts?

                      Post your updated configs and I will take a look.

                      • 8. Re: HornetQ Bridge issues-remote server/backup-HA config
                        meenarc

                        In the backup server.log I see "[org.hornetq.core.server.cluster.impl.ClusterManagerImpl] (Thread-6) announcing backup" and "[org.hornetq.core.server.cluster.impl.ClusterManagerImpl] (Thread-0 (group:HornetQ-server-threads1593265793-364656927)) backup announced". Some topology update logs also showed up when the backup server came up (I have attached these logs for both live and backup).

                         

                        Attached are zips of the HornetQ configs and server logs for the bridge, live and backup servers. I ran a client program that keeps sending messages to a queue on the bridge server.

                        I can see the messages being forwarded to the live server (via the jmx-console of the live server), and at this point the backup server's jmx-console shows backup as "True".

                        Then I kill the live server (from the command line with kill -9 <pid>), and both the client program and the server.log of the live server show:

                        "

                        javax.jms.JMSException: Timed out waiting for response when sending packet 71

                            at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:276)"

                         

                        Also at this point the jmx-console of the backup now shows "False".

                        I now see that the backup server has accessed the shared store (NFS) and shows the number of messages that were added to the live server's queue, but from that point onwards messages only accumulate in the local queue of the server hosting the bridge.

                        I did notice that the JournalType string in the jmx-console was NIO on the live and backup servers and ASYNCIO on the server hosting the bridge (I had enabled AIO a while back). I am assuming that, since the journal is not shared between the bridge server and the live/backup pair, this should not matter.

                        • 9. Re: HornetQ Bridge issues-remote server/backup-HA config
                          meenarc

                          I tried the same bridge and live/backup setup with JBoss [EAP] 5.1.2 and HornetQ Server version 2.2.10.GA, and found the same behaviour, though not the same exception.

                          Discovery works and failover works. I tried a client that talks directly to the live/backup pair, and the client is able to fail over (though there was one exception while the live server was being killed).

                          But if I use a bridge on another server, the bridge does not fail over when I use discovery. At this point messages just collect on the local queue.

                          Do bridges with static-connectors and failover work in HornetQ Server version 2.2.10.GA?

                          • 10. Re: HornetQ Bridge issues-remote server/backup-HA config
                            meenarc

                            I am being told that the combination of NIO and the NFS shared store (used by the live/backup pair) could be the reason the bridge ends up "hung", even though the bridge runs on a separate server that does NOT use the shared store and uses local storage.

                            Apparently, enabling AIO and DIRECT_IO for the NFS mount should help fix this and is recommended. Is that the way to solve the bridge failover?

                            Is the above a recommended approach for a shared store on NFS in production?
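
                            On the HornetQ side, moving the journal from NIO to AIO is a single setting in hornetq-configuration.xml on the live and backup servers, sketched below. It assumes Linux with libaio installed, which the ASYNCIO journal requires. The NFS mount options are a separate, OS-level matter and are not shown. Whether this change fixes the bridge failover is still the open question here.

                            ```xml
                            <!-- hornetq-configuration.xml on live and backup: fragment, sketch only -->
                            <!-- ASYNCIO selects the AIO journal; the servers in this thread currently report NIO -->
                            <journal-type>ASYNCIO</journal-type>
                            ```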

                             

                             

                            For what it's worth, I turned on TRACE logging on the bridge. Do these logs show what's going on and whether the bridge is "hung" or not?