2 Replies Latest reply on May 8, 2012 2:44 PM by bruceas

    HornetQ Cluster Timeout Behavior

    bruceas

      What is the expected behavior when a connection between two (or more) nodes in a cluster times out (due to network problems, etc.)?

       

      I'm running the following system (HornetQ 2.1.2):

      - cluster with multiple nodes

      - no backup servers

      - a publisher on one node constantly sends messages (1 per second) to a consumer on another node via the internal core bridge

       

      Every so often, the connection between the two nodes times out (due to a bad network, etc.).

      At this point, the publisher node detects a connection failure due to the timeout and closes the connection. However, messages continue to be added to the internal core bridge (i.e. the MESSAGE COUNT continues to increase).

       

      This behavior continues even after the network problems are resolved: messages continue to build up in the internal core bridge and are never delivered to the consumer (i.e. the behavior is the same as when the consumer node is killed and never brought back up).

       

      What is *supposed* to happen in this case? Shouldn't the producer node detect the (eventual) pings from the consumer node and either start delivering its backlog of messages to the original consumer or at least re-route them to another node?
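
      To make the expectation concrete, here is a toy model in plain Java of the behavior I would expect. It is only a sketch: BridgeModel and all of its methods are hypothetical names made up for illustration, not HornetQ classes or API, and it ignores re-routing to other nodes. Messages queue up while the link is down and are flushed once it comes back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: hypothetical names, not HornetQ API.
public class BridgeModel {
    // Messages waiting in the (modelled) bridge queue.
    private final Deque<String> backlog = new ArrayDeque<>();
    // Messages that reached the (modelled) consumer node.
    private final List<String> delivered = new ArrayList<>();
    private boolean connected = true;

    // Publisher side: always enqueue, forward immediately if the link is up.
    public void send(String msg) {
        backlog.addLast(msg);
        if (connected) {
            flush();
        }
    }

    // Called when the timeout is detected and the connection is closed.
    public void connectionLost() {
        connected = false;
    }

    // Expected behavior: on reconnect, the backlog is delivered in order.
    public void connectionRestored() {
        connected = true;
        flush();
    }

    private void flush() {
        while (!backlog.isEmpty()) {
            delivered.add(backlog.removeFirst());
        }
    }

    // Corresponds to the MESSAGE COUNT seen on the bridge queue.
    public int messageCount() {
        return backlog.size();
    }

    public List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        BridgeModel bridge = new BridgeModel();
        bridge.send("m1");
        bridge.connectionLost();
        bridge.send("m2");
        bridge.send("m3");
        System.out.println("count while down: " + bridge.messageCount());
        bridge.connectionRestored();
        System.out.println("count after reconnect: " + bridge.messageCount());
    }
}
```

      What I actually observe corresponds to connectionRestored() never being called: the count keeps growing even though the network is healthy again.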

       

      Thanks