Just an overview of how clustered message grouping works and what its semantics are.
First of all, every node in the cluster needs a grouping handler defined. This is done in the main hornetq-configuration.xml file, and the handler will be either LOCAL:

    <grouping-handler name="my-grouping-handler">
       <type>LOCAL</type>
       <address>jms</address>
    </grouping-handler>

or REMOTE:

    <grouping-handler name="my-grouping-handler">
       <type>REMOTE</type>
       <address>jms</address>
       <timeout>5000</timeout>
    </grouping-handler>
Exactly one node should be configured with a LOCAL handler. The REMOTE handlers on all the other nodes communicate with it to decide where messages should be routed.
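Because the whole scheme depends on there being exactly one LOCAL handler, a sanity check over the cluster's configuration could look like the following. This is a minimal sketch only: `HandlerType` and `GroupingTopologyCheck` are hypothetical names, not HornetQ classes, and it assumes you have already collected each node's configured `<type>` into a map.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical enum mirroring the <type> values in hornetq-configuration.xml;
// this is not a HornetQ class.
enum HandlerType { LOCAL, REMOTE }

// Hypothetical helper: enforces the "exactly one LOCAL handler" rule over a
// map of node name -> configured handler type.
final class GroupingTopologyCheck {
    private GroupingTopologyCheck() {}

    /** Returns the node hosting the LOCAL handler, or throws if there is not exactly one. */
    static String localHandlerNode(Map<String, HandlerType> handlerByNode) {
        List<String> locals = handlerByNode.entrySet().stream()
                .filter(e -> e.getValue() == HandlerType.LOCAL)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        if (locals.size() != 1) {
            throw new IllegalStateException(
                    "expected exactly one LOCAL grouping handler, found " + locals.size());
        }
        return locals.get(0);
    }
}
```

With one LOCAL and any number of REMOTE entries the check returns the LOCAL node's name; with zero or two LOCAL entries it throws.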
When a message with a given group id arrives at a node for the first time, the handler is checked to see whether a queue has already been chosen for that group. If not, a proposal is sent and the LOCAL handler replies with a response that either accepts the proposal or suggests an alternative. From then on, all messages with this group id are sent to the same queue, where the normal non-clustered consumer grouping takes place.
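The accept-or-suggest-an-alternative decision on the LOCAL side can be modelled as "first proposal wins". This is a minimal sketch, not the HornetQ implementation: `LocalGroupingHandlerSketch` and `propose` are hypothetical names, and it assumes a proposal is just a (group id, queue name) pair.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the LOCAL handler's decision; not HornetQ's classes.
final class LocalGroupingHandlerSketch {
    // group id -> queue chosen for that group
    private final ConcurrentMap<String, String> queueByGroup = new ConcurrentHashMap<>();

    /**
     * Handles a proposal to route groupId to proposedQueue and returns the queue
     * actually chosen. If it equals proposedQueue the proposal was accepted;
     * otherwise the returned name is the alternative the proposer must use.
     */
    String propose(String groupId, String proposedQueue) {
        // putIfAbsent is atomic, so the first proposal for a group id wins even if
        // several nodes propose concurrently; later proposals get that choice back.
        String existing = queueByGroup.putIfAbsent(groupId, proposedQueue);
        return existing == null ? proposedQueue : existing;
    }
}
```

Using an atomic `putIfAbsent` is what keeps two nodes that propose different queues for the same group at the same moment from both being told "accepted".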
NB: there is a configurable timeout on REMOTE handlers. If a response is not received before the timeout expires, an exception is thrown and the message is not delivered. This ensures that strict ordering is adhered to; it's then up to the client to decide what to do.
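On the REMOTE side, the behaviour described above amounts to waiting for the LOCAL handler's reply with a bounded timeout and refusing to route if none arrives. A minimal sketch, assuming the proposal transport can be represented as a function returning a `CompletableFuture`; `RemoteGroupingHandlerSketch`, `chooseQueue` and `sendProposal` are hypothetical, and treating `<timeout>` as milliseconds is an assumption here, not something stated above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BiFunction;

// Hypothetical REMOTE-side handler; not HornetQ's API.
final class RemoteGroupingHandlerSketch {
    private final long timeoutMillis; // assumption: <timeout> is in milliseconds
    // Sends (groupId, proposedQueue) to the LOCAL handler; completes with the chosen queue.
    private final BiFunction<String, String, CompletableFuture<String>> sendProposal;

    RemoteGroupingHandlerSketch(long timeoutMillis,
            BiFunction<String, String, CompletableFuture<String>> sendProposal) {
        this.timeoutMillis = timeoutMillis;
        this.sendProposal = sendProposal;
    }

    /** Blocks for the LOCAL handler's answer; throws rather than guessing a queue. */
    String chooseQueue(String groupId, String proposedQueue) {
        CompletableFuture<String> reply = sendProposal.apply(groupId, proposedQueue);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // No answer in time: refuse to route rather than risk breaking ordering.
            throw new IllegalStateException("no grouping response for " + groupId
                    + " within " + timeoutMillis + " ms; message not delivered", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for grouping response", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("grouping proposal failed", e.getCause());
        }
    }
}
```

Throwing instead of falling back to a local queue is the design choice that preserves ordering: the client sees the failure and decides whether to retry.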
If the node where the queue resides goes down, the messages will still be forwarded and stored in the store-and-forward queue until the node reconnects.
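The point of buffering rather than re-routing is that the group stays pinned to its original queue. A minimal sketch of that idea, assuming a simple FIFO buffer in front of the remote node; `StoreAndForwardSketch`, `nodeDown` and `nodeReconnected` are hypothetical names and this is not how HornetQ's store-and-forward queue is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of a store-and-forward link to the node that owns the
// group's queue; not HornetQ's actual implementation.
final class StoreAndForwardSketch<M> {
    private final Deque<M> pending = new ArrayDeque<>();
    private final Consumer<M> remoteQueue; // delivers to the queue on the remote node
    private boolean connected = true;

    StoreAndForwardSketch(Consumer<M> remoteQueue) {
        this.remoteQueue = remoteQueue;
    }

    /** Routing still targets the same node; if it is down, the message waits here. */
    synchronized void forward(M message) {
        if (connected) {
            remoteQueue.accept(message);
        } else {
            pending.addLast(message);
        }
    }

    synchronized void nodeDown() {
        connected = false;
    }

    /** On reconnect, drain in FIFO order so messages in a group keep their order. */
    synchronized void nodeReconnected() {
        connected = true;
        while (!pending.isEmpty()) {
            remoteQueue.accept(pending.pollFirst());
        }
    }

    synchronized List<M> pendingSnapshot() {
        return new ArrayList<>(pending);
    }
}
```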
When a binding that has a group id bound to it is removed, the group id is removed from all the handlers, and a new proposal is sent the next time a message with that group id is sent. There is a window here: if the binding is removed by one session while another session is sending messages with the same group id, it's possible that the messages aren't routed correctly. The client should either use the same session or change the group id for the next set of messages.
If a message is bound to a queue whose binding has been removed, an exception is thrown and the message is not delivered. However, this can only happen during the window described in the previous paragraph.
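The binding-removal behaviour and the race window can be illustrated with a small model. This is a minimal sketch only; the real routing code is in BindingsImpl.java and is not reproduced here. `GroupBindingsSketch`, `addBinding`, `removeBinding` and `route` are hypothetical names, and `route` assumes the proposed queue is itself a live binding.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of group-id bookkeeping when bindings come and go;
// not HornetQ's BindingsImpl.
final class GroupBindingsSketch {
    private final Map<String, String> queueByGroup = new ConcurrentHashMap<>();
    private final Set<String> liveBindings = ConcurrentHashMap.newKeySet();

    void addBinding(String queue) {
        liveBindings.add(queue);
    }

    /** Removing a binding also forgets every group id bound to it. */
    void removeBinding(String queue) {
        liveBindings.remove(queue);
        // Window: a concurrent route() running between these two statements can
        // still see the old group -> queue entry and will throw below.
        queueByGroup.values().removeIf(queue::equals);
    }

    /** Returns the queue for groupId, making a fresh choice if none is recorded. */
    String route(String groupId, String proposedQueue) {
        String chosen = queueByGroup.computeIfAbsent(groupId, g -> proposedQueue);
        if (!liveBindings.contains(chosen)) {
            throw new IllegalStateException("group " + groupId
                    + " is bound to removed queue " + chosen + "; message not delivered");
        }
        return chosen;
    }
}
```

Outside the window, a message for a group whose queue was removed simply gets a new choice; inside it, `route` throws, matching the "exception is thrown and the message is not delivered" case.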
All of this is currently in the hornetq_grouping branch. There is an example, clustered-grouping, that demonstrates the functionality, and there is also a test, ClusteredGroupingTest, that you can take a look at. Most of the routing code is in BindingsImpl.java.
If you have any questions, ask, and if you think I have missed anything, please shout.
Just confirming: is <timeout> in this case measured in milliseconds? I've noticed that some parameters in the current docs are in nanoseconds, so I didn't want to assume.