
Synchronous RPCs and view changes


When cluster-wide RPCs are invoked synchronously (i.e., the caller is blocked until every member has replied), view changes can cause problems.


As a simple example, consider a cluster whose current view is V2={A,B,C,D}.


Let's say D crashes and, at the same time, A invokes a cluster RPC.


If A received V3={A,B,C} before invoking the RPC, it would wait for replies only from itself, B and C. However, if A invoked the RPC first and received V3 only afterwards, it would wait for D's reply forever, because D is no longer alive to send one.
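To make the race concrete, here is a minimal Java simulation (not any library's actual API). It assumes, as the scenario above does, that the set of members a synchronous call waits for is fixed from the view at invocation time. `ReplyCollector` and `ViewRace` are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a real library class): tracks which members of the
// view captured at invocation time still owe a reply to a synchronous RPC.
class ReplyCollector {
    private final Set<String> pending;

    // Assumption: the destination set is fixed when the RPC is invoked.
    ReplyCollector(List<String> viewAtInvocation) {
        this.pending = new HashSet<>(viewAtInvocation);
    }

    // Called whenever a reply arrives from a member.
    void replyFrom(String member) {
        pending.remove(member);
    }

    // A blocking caller can only return once nobody is pending.
    boolean isDone() {
        return pending.isEmpty();
    }

    Set<String> pending() {
        return pending;
    }
}

public class ViewRace {
    public static void main(String[] args) {
        // Case 1: A installs V3={A,B,C} before invoking the RPC.
        ReplyCollector afterV3 = new ReplyCollector(List.of("A", "B", "C"));
        // Case 2: A invokes the RPC while still in V2={A,B,C,D}; D then crashes.
        ReplyCollector beforeV3 = new ReplyCollector(List.of("A", "B", "C", "D"));

        // Only the live members reply; the crashed D never does.
        for (String live : List.of("A", "B", "C")) {
            afterV3.replyFrom(live);
            beforeV3.replyFrom(live);
        }
        System.out.println(afterV3.isDone());   // true
        System.out.println(beforeV3.isDone());  // false: still waiting for D
        System.out.println(beforeV3.pending()); // [D]
    }
}
```

In the second case nothing will ever remove D from the pending set, so a caller that blocks until `isDone()` is true would block indefinitely.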


There are a few ways to remedy this:

  • Bound the RPC with a timeout, say 5000 ms. In this case, if the RPC is invoked first and only then V3 is received, the RPC returns after 5000 ms with valid return values for A, B and C, and a null return value for D, which is marked as 'not received'.

  • Use asynchronous RPCs if you can. Sometimes, though, especially for data collection tasks where the caller needs the results, this is not an option.
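The timeout remedy can be sketched with `java.util.concurrent`. This is a simulation under stated assumptions, not a real RPC framework: each member's reply is modeled as a `CompletableFuture`, and the crashed D as a future that never completes. `TimedClusterCall`, `awaitAll` and `Reply` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedClusterCall {

    // Hypothetical per-member result: a value plus a 'received' flag.
    static final class Reply {
        final String value;      // null if no reply arrived
        final boolean received;  // false means 'not received'

        Reply(String value, boolean received) {
            this.value = value;
            this.received = received;
        }

        @Override
        public String toString() {
            return received ? value : "null (not received)";
        }
    }

    // Waits for all replies, but never longer than timeoutMs in total.
    static Map<String, Reply> awaitAll(Map<String, CompletableFuture<String>> calls, long timeoutMs) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(calls.values().toArray(new CompletableFuture<?>[0]));
        try {
            all.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Some member (e.g. a crashed one) did not reply in time.
        } catch (ExecutionException e) {
            // A member's call failed; it is reported as not received below.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        Map<String, Reply> results = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> e : calls.entrySet()) {
            CompletableFuture<String> f = e.getValue();
            boolean ok = f.isDone() && !f.isCompletedExceptionally();
            results.put(e.getKey(), new Reply(ok ? f.join() : null, ok));
        }
        return results;
    }

    public static void main(String[] args) {
        // View captured at invocation time: V2={A,B,C,D}; D has crashed.
        Map<String, CompletableFuture<String>> calls = new LinkedHashMap<>();
        calls.put("A", CompletableFuture.completedFuture("result-A"));
        calls.put("B", CompletableFuture.supplyAsync(() -> "result-B"));
        calls.put("C", CompletableFuture.supplyAsync(() -> "result-C"));
        calls.put("D", new CompletableFuture<>()); // never completes

        // 200 ms keeps the demo short; the text above suggests 5000 ms.
        Map<String, Reply> rsps = awaitAll(calls, 200);
        rsps.forEach((member, r) -> System.out.println(member + " -> " + r));
    }
}
```

After roughly the timeout, A, B and C come back with their values and D with a null value marked as not received, which is the behavior the first bullet describes. The caller still gets a bounded wait, at the cost of having to check the 'received' flag for every member.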