JGroupsFC

Version 12

Created by belaban on Jan 17, 2005 2:20 PM. Last modified by brian.stansberry on Oct 28, 2010 3:39 PM.

Definition

Credit based flow control protocol. When senders send faster than receivers can process the messages, over time the receivers will run out of memory. Therefore the senders have to be throttled to the fastest rate at which all receivers can process messages.

FC uses credits to do so: a sender has N credits for all members (including itself). When it sends a message, it decrements the number of bytes sent from the receiver (if sent to the group: from all receivers). When one receiver's credits are smaller than what is required to send a message, the sender blocks and waits until the receiver(s) send more credits.

The receiver also decrements the number of bytes received, and send credits back to the sender when the credits for that sender fall below a certain threshold, defined by min_credits (or min_threshold).

Configuration Example

<FC max_credits="2000000" down_thread="false" min_threshold="0.10"></FC>

Configuration Parameters

Name	Description
id	Give the protocol a different ID if needed so we can have multiple instances of it in the same stack
level	Sets the logger level (see javadocs)
max_block_time	Max time (in milliseconds) to block. Default is 5000 msec
max_credits	Max number of bytes to send per receiver until an ack must be received to proceed. Default is 2000000 bytes
name	Give the protocol a different name if needed so we can have multiple instances of it in the same stack
stats	Determines whether to collect statistics (and expose them via JMX). Default is true

Why is FC needed on top of TCP ? TCP has its own flow control !

The reason is group communication, where we essentially have to send group messages at the highest speed the slowest receiver can keep up with.

Let's say we have a cluster {A,B,C,D}. D is slow (maybe overloaded), the rest is fast. When A sends a group message, it establishes TCP connections A-A (conceptually), A-B, A-C and A-D (if they don't yet exist). So let's say A sends 100 million messages to the cluster. Because TCP's flow control only applies to A-B, A-C and A-D, but not to A-{B,C,D}, where {B,C,D} is the group, it is possible that A, B and C receive the 100M, but D only received 1M messages. (BTW: this is also the reason why we need NAKACK, although TCP does its own retransmission).

Now JGroups has to buffer all messages in memory for the case when the original sender S dies and a node asks for retransmission of a message of S. Because all members buffer all messages they received, they need to purge stable messages (= messages seen by everyone) every now and then. This is done by the STABLE protocol, which can be configured to run the stability protocol round time based (e.g. every 50s) or size based (whenever 400K data has been received).

In the above case, the slow node D will prevent the group from purging messages above 1M, so every member will buffer 99M messages This in most cases leads to OOM exceptions. Note that - although the sliding window protocol in TCP will cause writes to block if the window is full - we assume in the above case that this is still much faster for A-B and A-C than for A-D.

So, in summary, we need to send messages at a rate the slowest receiver (D) can handle.

So do I always need FC?

This depends on how the application uses the JGroups channel. Referring to the example above, if there was something about the application that would naturally cause A to slow down its rate of sending because D wasn't keeping up, then FC would not be needed.

A good example of such an application is one that makes synchronous group RPC calls. By synchronous, we mean the thread that makes the call blocks waiting for responses from all the members of the group. In that kind of application, the threads on A that are making calls would block waiting for responses from D, thus naturally slowing the overall rate of calls. As long as the maximum number of sending threads the application allows multiplied by the amount of data each thread will send in one message fits within the memory budget of each server, use of FC is not required.

Note that the "maximum number of sending threads the application allows" mentioned in the previous paragraph is a cluster-wide figure. For example, if servers A, B, C and D in the example above each had an application thread pool of 100 threads that could handle user requests and send JGroups messages, then each server would need to have enough memory to store 400 messages -- 100 of its own and 100 from each of its 3 peers.

A JBoss Cache cluster configured for REPL_SYNC is a good example of an application that makes synchronous group RPC calls. If a channel is only used for a cache configured for REPL_SYNC, we recommend you consider removing FC from its protocol stack.

And, of course, if your cluster only consists of two nodes, including FC in a TCP-based protocol stack is unnecessary. There is no group beyond the single peer-to-peer relationship, and TCP's internal flow control will handle that just fine.

Another case where FC may not be needed is for a channel used by a JBoss Cache configured for buddy replication and a single buddy. Such a channel will in many respects act like a two node cluster, where messages are only exchanged with one other node, the buddy. (There may be other messages related to data gravitation that go to all members, but in a properly engineered buddy replication use case these should be infrequent. But if you remove FC be sure to load test your application.)

Back To JGroups

JBossDeveloper