Bela Ban's Blog

This entry actually started out with a title like "JGroups 2.5 CR-1 released". Hmm, very boring ! Scrapped that, as I really wanted to talk about the 2.5 CR-1 performance, which is GREAT!

I mean, don't get me wrong, 2.5 has some cool features, like the concurrent stack, out-of-band messages, better thread naming and so on. But I'll discuss those when I release 2.5 GA.

Okay, so I ran the perf.Test that ships with JGroups (so anyone can try it out) in our Atlanta lab. The hardware and software setup is described here. I ran the tests with the standard tcp.xml config file from JGroups/conf, with only slight modifications, e.g. a bigger thread pool.

So far, while JGroups has been quite good on 100Mbps Ethernet, reaching ~ 11MB/sec/node, it has never been able to utilize the bandwidth available on 1Gbps switched Ethernet (ca 125MB/sec). But this changed dramatically with 2.5 and the concurrent stack. I ran only selected tests, and will discuss performance in more detail later.

Look at the numbers I got:

  • 6 nodes, 5M 5K messages with tcp.xml, 500 threads/sender: 15'105 msgs/sec/node, throughput=75.52MB/sec/node
  • 6 nodes, 5M 5K messages with tcp.xml: 17'832 msgs/sec/node, throughput=89.16MB/sec/node
  • 8 nodes, 5M 1K messages with tcp.xml: 63'513 msgs/sec/node, throughput=63.51MB/sec/node
  • 8 nodes, 5M 5K messages with tcp.xml: 16'600 msgs/sec/node, throughput=83MB/sec/node
As an example, running 8 nodes where every node sent 5 million 5'000-byte messages, JGroups reached 16'600 messages/sec per node, for a throughput of 83MB/sec per node ! In 2.4, this used to be around 30-40MB/sec/node !

This is very good news, not just for JGroups but also for all the projects which run on top of it. For example, we've been measuring JBossCache and PojoCache performance, and this speedup will have a direct positive effect on them.

Okay, enough said, I will blog more once 2.5 goes GA. Enjoy !

JGroups 2.4 released

Posted by belaban Nov 1, 2006
Finally, after almost 5 months, JGroups 2.4 is here !

There are some cool features that I'll describe in more detail below. Over 80 JIRA issues were resolved in 2.4, mostly bug fixes and new functionality.
The good news is that 2.4 is API-backward compatible with all previous versions down to and including 2.2.7. So, for those folks who are using JBoss 4.0.x, which ships with JGroups 2.2.7 by default, this means that they can simply replace their JGroups JAR file with the one from 2.4 and benefit from the performance enhancements and bug fixes that went into 2.4. For details on the JBoss/JGroups combinations see http://labs.jboss.com/portal/jbosscache/compatibility/index.html.
I'll now describe the new features briefly, check out the documentation (URL below) for a full discussion.

FLUSH
-----
Flush is a feature where, whenever the group membership is about to change or a state is about to be transferred, we tell every node in the cluster to stop sending messages, then perform the join/leave or the state transfer. When done, we tell all members to resume sending messages.
Why is this needed ?
In 2 cases: (1) when we use a Multiplexer (see below) and have multiple services sharing the same channel that require state transfer and (2) when we want virtual synchrony (see below).
So, if you don't use the Multiplexer or don't need virtual synchrony, you don't need the FLUSH protocol in your configuration. FLUSH is quite expensive because it uses multiple rounds of multicasts across the cluster, so remove it if you don't need it. Note that JBoss 5 requires FLUSH because it uses the Multiplexer: all cluster services share one JGroups channel.

Multiplexer
-----------
The Multiplexer was mainly developed to accommodate multiple services running on top of the *same* channel. JGroups channels are quite expensive in their use of resources (mainly threads), and sharing a channel amortizes its cost over multiple services.
This is beneficial in JBoss where we had 5 clustered services (in 4.0.x), each using its own channel. In JBoss 5, we switched to the Multiplexer, and all 5 services use the same shared channel. Startup time in JBoss 5 ('all' configuration) was 43s on my laptop before the change, and 23s afterwards !
If multiple services sharing a channel require state transfer, we run into the problems described in JGroups/doc/design/PartialStateTransfer.txt. FLUSH is required to prevent those problems.
The Multiplexer is described in chapter 6.3 of the documentation (see below).
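
To give a feel for what sharing a channel looks like in code, here is a minimal sketch using the 2.4-era JChannelFactory API as I recall it (double-check the exact signatures against the documentation); the stacks.xml file name, the stack name and the service IDs are just made up for the example:

    import org.jgroups.Channel;
    import org.jgroups.JChannelFactory;

    public class MuxExample {
        public static void main(String[] args) throws Exception {
            JChannelFactory factory = new JChannelFactory();
            // XML file containing one or more named protocol stack configurations (made-up name)
            factory.setMultiplexerConfig("stacks.xml");

            // Both services below are multiplexed onto the *same* underlying JGroups channel
            Channel cacheChannel = factory.createMultiplexerChannel("tcp", "cache-service");
            Channel haChannel    = factory.createMultiplexerChannel("tcp", "ha-service");

            cacheChannel.connect("demo-cluster");
            haChannel.connect("demo-cluster");
        }
    }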

Streaming state transfer
------------------------
So far, state has always been transferred using a byte[] buffer. This forced a user to serialize the entire state into a byte[] buffer at the state provider and deserialize the byte[] buffer into the application state at the state requester. If the state is 2GB or more, this will likely result in memory exhaustion.
Streaming state transfer uses input and output streams, so users can stream their state to the output stream in *chunks* and don't need to have the entire state in memory as required by a byte[] buffer. On the receiving side, an input stream is provided from which the user can read the entire state and set the application state to it.
Streaming state transfer is essential for large states !
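
As a rough sketch of what this enables, the state provider can write its state to the output stream in small chunks instead of building one huge byte[]. The callback names below mirror my recollection of the streaming state transfer callbacks in 2.4 (see the documentation for the real interface), and the map-based "state" is purely illustrative:

    import java.io.*;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class StreamingStateSketch {
        private final Map<String, String> state = new ConcurrentHashMap<String, String>();

        // State provider: stream the state out entry by entry, never holding it all in one byte[]
        public void getState(OutputStream ostream) throws IOException {
            DataOutputStream out = new DataOutputStream(ostream);
            out.writeInt(state.size());
            for (Map.Entry<String, String> e : state.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
        }

        // State requester: read the state back from the input stream and apply it
        public void setState(InputStream istream) throws IOException {
            DataInputStream in = new DataInputStream(istream);
            int size = in.readInt();
            for (int i = 0; i < size; i++)
                state.put(in.readUTF(), in.readUTF());
        }
    }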

Partial state transfer
----------------------
This allows a programmer to transfer a subset of the entire state, identified by an ID. We use this (via JBossCache) in JBoss HTTP session replication/buddy replication, where only the state represented by a buddy is transferred.
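
If I remember the 2.4 API correctly, the requester simply passes the substate ID to getState(). A hedged sketch (the config file, cluster name, state ID and timeout are arbitrary values, and the state_id overload is my recollection of the API, so verify it against the docs):

    import org.jgroups.JChannel;

    public class PartialStateSketch {
        public static void main(String[] args) throws Exception {
            JChannel channel = new JChannel("tcp.xml");   // standard config shipped in JGroups/conf
            channel.connect("demo-cluster");
            // Ask the coordinator (null target) for only the substate identified by this ID
            boolean ok = channel.getState(null, "buddy-session-1", 5000);
            System.out.println("partial state transfer " + (ok ? "succeeded" : "failed"));
        }
    }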

Virtual Synchrony
-----------------
Virtual Synchrony is a model of group communication, developed by Ken Birman at Cornell, which has the following properties:
  1. All non-faulty members see the same set of messages between views
  2. A message M sent by P in view V must be received by P in V, unless P crashes
  3. All non-faulty members receive the same sequence of views

With FLUSH, we re-implemented the old virtual synchrony implementation of JGroups (vsync.xml), which I wrote in 1998/1999, but which had never been tested rigorously. We will phase out the old implementation in 3.0, along with other reorganizations of the protocol stack packages.
Note that we have not yet *fully* implemented virtual synchrony as flushing only flushes out messages *sent* by member P, but not those *received* by P. Therefore, if member A sends a message M to {A,B,C}, and crashes immediately afterwards, and only B received M, then C will *not* receive M (violating rule #1 above). We will fully implement this in JGroups 2.5 (http://jira.jboss.com/jira/browse/JGRP-341).

View bundling
-------------
When a large number of nodes join or leave a cluster at about the same time, we can collect all JOIN/LEAVE requests and install only 1 view. View installation is quite costly, especially if FLUSH is used, since it requires several round trips across the cluster, so we want to minimize the number of view installations.

Failure detection
-----------------
We have 2 new protocols: FD_PING, which runs a script to check the health of a node, and FD_ICMP, which uses ICMP to ping a machine. These can of course be combined with other failure detection protocols, such as FD or FD_SOCK.
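
For what it's worth, the reachability test that FD_ICMP performs essentially boils down to the following plain-Java check (FD_ICMP itself is of course configured in the protocol stack; the address and timeout here are arbitrary):

    import java.net.InetAddress;

    public class PingCheck {
        public static void main(String[] args) throws Exception {
            InetAddress host = InetAddress.getByName("192.168.1.10"); // arbitrary peer address
            // isReachable() sends an ICMP echo request if the JVM has the privileges to do so,
            // otherwise it falls back to a TCP connection attempt on the echo port (7)
            boolean alive = host.isReachable(2000); // 2-second timeout
            System.out.println(host + (alive ? " is alive" : " appears to be dead"));
        }
    }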

Updated documentation
---------------------
The new features have been added to the documentation at http://www.jgroups.org/javagroupsnew/docs/manual/html/index.html

JIRA issues
-----------
The issues that went into 2.4 can be found at http://jira.jboss.com/jira/browse/JGRP.

Acknowledgments
---------------
I'd like to thank Vladimir Blagojevic for writing the FLUSH and streaming/partial state transfer features, and for testing them thoroughly !

Enjoy !
Bela Ban, Red Hat Inc
Kreuzlingen, Oct 31 2006
Scott Marlow and Alex Fu from Novell have re-implemented and benchmarked the NIO-based transport for JGroups. NIO provides the ability to essentially have the equivalent of C's select() in Java. What is the advantage ?

In TCP, we have one thread per connection, whereas in TCP_NIO we have one thread pool handling all connections. This is supposed to scale much better than JGroups' TCP transport: consider a cluster of N nodes. With every node sending to the cluster, we have a mesh, with everyone connecting to everyone else (except itself), i.e. on the order of N*(N-1) connections. Even assuming that idle connections are closed by the connection pool, we still have many connections which need threads to handle them. Scott et al have measured the performance of a 4-node cluster where everyone sends messages to the cluster, and came up with excellent numbers. In a nutshell, they got ~ 13000 1K messages/second on 100Mbps Ethernet, but read up on the details for yourselves at http://www.jgroups.org/javagroupsnew/docs/Perftest.html
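
For readers who haven't used NIO: the point is that one thread (or a small pool) can service many connections by multiplexing over a selector, instead of parking a blocked reader thread on every connection as the plain TCP transport does. Below is a stripped-down sketch of that pattern in plain Java; it is not TCP_NIO's actual code, and the port number is arbitrary:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;

    public class SelectorSketch {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(new InetSocketAddress(7800)); // arbitrary port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(1024);
            while (true) {
                selector.select(); // one thread waits on *all* connections, like C's select()
                for (Iterator<SelectionKey> it = selector.selectedKeys().iterator(); it.hasNext();) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {             // new connection: register it, no new thread
                        SocketChannel ch = server.accept();
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    }
                    else if (key.isReadable()) {          // data on an existing connection
                        buf.clear();
                        if (((SocketChannel) key.channel()).read(buf) < 0)
                            key.cancel();                 // peer closed the connection
                    }
                }
            }
        }
    }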

I like open source. This is a little example of how AUTH was conceived.

JGroups has always been pretty weak on the security front: there wasn't any encryption until about 2 years ago (when ENCRYPT was added), and any member could join a group and wreak havoc. To be quite honest, I never cared much about security (I think it's pretty boring), and in most cases JGroups is used behind a firewall anyway, which is generally considered a safe zone.

However, with more and more users using JGroups in a WAN setting, this meant that (unless you used VPNs or secure IP tunnels), anybody could eavesdrop on JGroups traffic, and even worse: anybody with a matching stack could join the group and start receiving and sending messages !

So about 2 years ago, Steve Woodcock, a smart bloke from London wrote ENCRYPT, or rather rewrote it (the first version was pretty bad). So we had encryption of messages... but still anybody could join the group.

Over the last couple of months I've heard more and more requests for a feature that would allow only authorized members into the group. So naturally that feature was discussed on the JGroups dev list. However, oftentimes a lot of people add their 2 cents, but nobody is willing to actually implement it :-) Then, this summer I did some JGroups consulting at a large insurance company in Switzerland, and we somehow got onto the topic of authentication. The discussion got pretty detailed (I think we had time left at the end of the 2 days), and we came up with a pretty detailed description of AUTH. They, however, didn't have the time or resources to implement it right away.

Enter Chris Mills, who attended my Clustering training in Washington D.C. in October. In the JGroups presentation I mentioned AUTH, and he started hacking away at it. At the end of the training, he had a prototype ready, and last week during my Clustering training in London, he presented a quite complete version of it. I think he started writing it because he wanted to see how a JGroups protocol works, and (maybe unexpectedly?) ended up with a full blown protocol. Great job Chris ! We will see this being integrated into the 2.3 head version of JGroups early next year. This is how it works:

  • AUTH is just below GMS and intercepts JOIN_REQ requests
  • On the client side, AUTH adds some credentials (e.g. an X.509 cert or a password) to the JOIN message
  • On the server side (coordinator), AUTH extracts the credentials and verifies them. If they are valid, it passes the request up to GMS; otherwise it sends back a JOIN_RSP which causes an exception at the client on Channel.connect().
The providing of credentials (client side) and the validation of them (server side) are exposed as policies, so anybody can implement their own. Chris has an X.509 policy and a simple one-way hashed password policy. This is only a small example of how open source works, but it shows that there can be many stakeholders involved, from wishlist creation to requirements gathering to specification and finally implementation...
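
To make the one-way hashed password policy a bit more concrete, here is a purely illustrative sketch of such a check in plain Java. This is not Chris's actual AUTH code, and the class and method names are made up; it just shows the idea of storing a hash and comparing hashes at join time:

    import java.security.MessageDigest;

    public class HashedPasswordCheck {
        // The coordinator keeps only the hash of the shared password, never the cleartext
        private final byte[] expectedHash;

        public HashedPasswordCheck(String password) throws Exception {
            this.expectedHash = hash(password);
        }

        // One-way hash: easy to compute, infeasible to reverse
        private static byte[] hash(String password) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(password.getBytes("UTF-8"));
        }

        // Called when a joiner presents its credential: hash it and compare
        public boolean authenticate(String presentedPassword) throws Exception {
            return MessageDigest.isEqual(expectedHash, hash(presentedPassword));
        }

        public static void main(String[] args) throws Exception {
            HashedPasswordCheck check = new HashedPasswordCheck("s3cret");
            System.out.println(check.authenticate("s3cret")); // true: JOIN_REQ would be passed up to GMS
            System.out.println(check.authenticate("wrong"));  // false: a failure JOIN_RSP would be sent back
        }
    }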

Stressing JGroups

Posted by belaban Dec 13, 2005
My first blog entry... I never thought I'd sink that low, but anyway, here goes... :-)

I recently ran a simple performance test on my laptop (a Windows XP Dell D-810 with 2GB of memory and a single processor). I had 2 processes on the same box, each process multicasting 500 million 1K messages, for a total of 1 billion 1K messages, so 1TB of data exchanged between the 2 processes. The switch used was a simple $20 100Mbps switch. The setup (and how to run the tests yourselves) is described here.

Having the 2 processes on the same machine probably (negatively) affected the test results a bit, but my main goal was really to see whether I had any memory leaks.

Anyway, the results I got were promising: the time taken was roughly 22 hours, with an average message rate of ca 10300 messages per second. The standard deviation was actually very small, and the memory used was roughly 40MB (I used -Xmx100M, which is relatively small, and a conservative, a.k.a. non-aggressive, flow control). Attaching jconsole to it showed that free memory stayed flat during the entire run, so I'm confident that I could run the test for a couple of billion messages and still get stable memory... The one problem with jconsole though was that it ran out of memory after ca. 600 million messages, but simply restarting it helped - and this didn't affect the running test. I suspect jconsole must be storing some information in memory, and accumulating that info triggers an OOM. Guess it wasn't meant to be used for such large-scale tests.

I had actually started a run with 2 members where every member sent 1 billion 1K messages, for a total of 2 billion messages. But then my little daughter closed the lid on my laptop after 800 million messages while I was away... I guess that's one of the perils of working at home :-)

Stay tuned for more detailed performance numbers; we will have 8 new boxes in our Atlanta lab in January, and I should have some numbers by the end of February if everything goes according to plan. Also, the Novell guys working on TCP_NIO are writing an article on performance, measured on a 4-node cluster, and their numbers are excellent (roughly 13000 1K messages/sec). Stay tuned !