Why does RehashTask.invalidateInvalidHolders block until all responses have been received?
for (Future f : futures) f.get();
My first thought was that this might be for consistency reasons. However, if one or more of these requests results in an exception, the join still completes. In addition, while the node may enter a FAILED state, it still remains in the cluster topology.
Perhaps this line can be removed. Alternatively, a timeout could be added to the get call, with the resulting TimeoutException caught and logged.
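A minimal sketch of the timeout idea, assuming a hypothetical awaitResponses helper (the method and logging are illustrative, not the actual RehashTask code): each Future.get is bounded, a timed-out response is logged and skipped, and the join proceeds instead of blocking indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class InvalidationJoin {
    // Hypothetical helper: wait for invalidation responses with a bounded
    // timeout instead of blocking forever. Returns how many completed.
    static int awaitResponses(List<Future<?>> futures, long timeoutMs) {
        int completed = 0;
        for (Future<?> f : futures) {
            try {
                f.get(timeoutMs, TimeUnit.MILLISECONDS);
                completed++;
            } catch (TimeoutException e) {
                // Log and continue the join rather than failing the node.
                System.err.println("Invalidation response timed out; continuing join");
            } catch (InterruptedException | ExecutionException e) {
                System.err.println("Invalidation request failed: " + e);
            }
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<?>> futures = new ArrayList<>();
        futures.add(pool.submit(() -> {}));                               // fast responder
        futures.add(pool.submit(() -> { Thread.sleep(5_000); return null; })); // slow node
        System.out.println("completed=" + awaitResponses(futures, 200));
        pool.shutdownNow();
    }
}
```

Whether skipping a timed-out invalidation is safe is exactly the consistency question raised above; this only shows the mechanical change.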
In our testing, the invalidation process can take longer than 15s if there are tens of thousands of entries to be invalidated. We're still trying to determine why it takes so long. In any event...
Here is why I am looking at this.
- Nodes 1, 2, and 3 started.
- Node 4 starts and attempts to join.
- Node 4 sends an invalidation request to Node 1.
- Node 4 throws a timeout exception after 15s.
- Node 4 finishes the join.
- Node 4 enters a FAILED state.
There appear to be 2 problems here.
One is obviously that the timeout is causing this node to enter a FAILED state. The other is that even though this node is in a FAILED state, it is still a part of the cluster topology.
Any node added in the future that includes Node 4 in a rehash will subsequently fail.
Just to add to what Shane has indicated.
I think the cache entries are migrated to the new (failed) node; it really depends on the point of failure (timeout). If the timeout occurs during invalidation request processing, then the new node has already applied the state, and the node performing the invalidation appears to keep going and finish.
As a result, we may be in a state where we have a failed Cache node with valid entries not accessible to the rest of the cluster.
Good call Kapil.
This leads me to believe that perhaps the join process should be as follows.
- Send State Request
- Send Update Topology Request
- Send Invalidation Request
If the state request (rehashing) fails, then the topology should not be updated and the cluster should continue to function properly.
If the topology request fails, the cluster should continue to function properly.
If the invalidation request fails, there may be issues. I'm wondering if an invalidation request is actually necessary. Could it instead be part of the update topology request, such that rather than sending a list of entries to invalidate, each node determines which entries to invalidate based on the topology change?
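The topology-driven alternative could be sketched like this, assuming a toy hash-based ownership function (invalidateLocally, the node names, and the modulo ownership rule are all illustrative, not the real consistent-hash logic): on receiving the new topology, each node scans its own store and drops the entries it no longer owns, so no explicit invalidation list needs to be sent.

```java
import java.util.*;

public class TopologyInvalidation {
    // Hypothetical sketch: each node recomputes ownership from the new
    // topology and invalidates the local entries it no longer owns.
    static Set<String> invalidateLocally(Map<String, String> localStore,
                                         List<String> newTopology,
                                         String self) {
        Set<String> dropped = new HashSet<>();
        Iterator<Map.Entry<String, String>> it = localStore.entrySet().iterator();
        while (it.hasNext()) {
            String key = it.next().getKey();
            // Toy ownership function: key hashes to exactly one topology member.
            String owner = newTopology.get(Math.floorMod(key.hashCode(), newTopology.size()));
            if (!owner.equals(self)) {
                dropped.add(key);
                it.remove(); // entry now owned elsewhere; invalidate it here
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>(Map.of("a", "1", "b", "2", "d", "4"));
        // Node 4 joins: ownership of some keys moves off node1.
        Set<String> dropped = invalidateLocally(
                store, List.of("node1", "node2", "node3", "node4"), "node1");
        System.out.println("dropped=" + dropped.size() + " kept=" + store.size());
    }
}
```

A nice property of this approach is that it removes the request that is timing out: invalidation becomes a local side effect of applying the topology update.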
That will be helpful. Do you have any thoughts regarding the join process as a whole? We actually have two issues with the join process: invalidation and state transfer. Either one can time out and fail. This will certainly eliminate the timeout problem with the invalidation. However, both issues are a result of updating the topology before completing the rehash. If either of these fails, the node enters a FAILED state, yet the join still completes 'successfully' and the node remains in the topology.
Perhaps the simplest thing to do would be to remove the node from the topology if either of these two tasks fail?