Core Bridge hangs when network disconnected during message send
jessedaniels1 Mar 9, 2010 8:49 PM
Following the bug reporting instructions, I believe this issue is related to these JIRAs:
https://jira.jboss.org/jira/browse/HORNETQ-216
https://jira.jboss.org/jira/browse/HORNETQ-47
Our environment essentially consists of two nodes (A and B) connected by an unreliable, low-bandwidth, high-latency, error-prone network (e.g., wireless or satellite communications). We have established a core bridge on node A as a way to reliably forward messages across this network to node B. I believe we have configured all of the relevant settings so that the bridge should reconnect:
<bridges>
   <bridge name="source-to-dest-bridge">
      <queue-name>jms.queue.BridgeSendQueue</queue-name>
      <forwarding-address>jms.queue.BridgeReceiveQueue</forwarding-address>
      <retry-interval>2000</retry-interval>
      <retry-interval-multiplier>1.0</retry-interval-multiplier>
      <reconnect-attempts>-1</reconnect-attempts>
      <failover-on-server-shutdown>false</failover-on-server-shutdown>
      <use-duplicate-detection>true</use-duplicate-detection>
      <confirmation-window-size>10000000</confirmation-window-size>
      <connector-ref connector-name="netty-destination-connector"/>
   </bridge>
</bridges>
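To make my expectation concrete: as I read the retry settings above, the bridge should wait a constant 2 seconds between reconnect attempts and never give up. Here is a minimal sketch of that schedule, assuming each delay is retry-interval scaled by the multiplier once per prior attempt, and that -1 means unlimited attempts. The class and method names are my own for illustration and are not part of the HornetQ API.

```java
public class RetrySchedule {

    // Delay in milliseconds before the given reconnect attempt (attempt 0 is the
    // first retry), assuming delay = retryInterval * multiplier^attempt.
    static long delayMillis(long retryIntervalMs, double multiplier, int attempt) {
        return Math.round(retryIntervalMs * Math.pow(multiplier, attempt));
    }

    // Assumption: a reconnect-attempts value of -1 means "retry forever".
    static boolean shouldRetry(int reconnectAttempts, int attemptsSoFar) {
        return reconnectAttempts == -1 || attemptsSoFar < reconnectAttempts;
    }

    public static void main(String[] args) {
        // Values taken from the bridge configuration above.
        long retryIntervalMs = 2000;
        double multiplier = 1.0;
        int reconnectAttempts = -1;

        for (int attempt = 0; attempt < 3 && shouldRetry(reconnectAttempts, attempt); attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + delayMillis(retryIntervalMs, multiplier, attempt) + " ms");
        }
    }
}
```

With a multiplier of 1.0 the delay never grows, so under these assumptions a restored network should be picked up within roughly one retry interval.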
If the bridge is idle when the network drops, it reconnects once the network returns. However, if the network is disconnected (for various lengths of time) and then reconnected while a message is being sent between the two nodes, the bridge hangs and does not resume sending messages. Server A does detect the connection failure:
[Thread-0 (group:HornetQ-client-global-threads-2089015486)] 09:38:24,844 WARNING [org.hornetq.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@279977bd[local= /172.16.1.93:41790, remote=/172.16.2.129:5445] [code=3]
The bridge will not resume sending once the network is re-established. If HornetQ on server A is restarted, the bridge will re-connect and resume sending messages. Note that in attempting to shut down server A these messages appear:
[HornetQ Server Shutdown Timer] 10:07:44,571 WARNING [org.hornetq.core.server.cluster.impl.BridgeImpl] Timed out waiting to stop
And
[HornetQ Server Shutdown Timer] 10:08:16,406 WARNING [org.hornetq.core.server.impl.HornetQServerImpl] Timed out waiting for pool to terminate
Based on the documentation, I expected the bridge to detect that the send operation had failed and to attempt to reconnect.