I'm new to this forum, so first of all I want to say "Hi" to everyone :-)
I have a question about JBoss AS 5.1: how does a JBoss AS 5.1 cluster perform the merge and state transfer when a node rejoins (the node was disconnected because the network went down, and it rejoins when the network comes back up)?
After searching this forum and Google for days, I have only found the following:
1. In a JBoss 5.1 cluster, JGroups handles cluster partitions when the network goes down and comes back up. JGroups automatically detects the cluster views and merges them once the network is up again.
+ But how does JGroups determine which sub-cluster is the primary partition or which node is the coordinator?
The partition with more nodes? Or the oldest node?
I also read that "the coordinator is the member who has been up the longest." (https://community.jboss.org/wiki/JGroupsMERGE2)
But when I test with a cluster of 2 machines, after the merge view the coordinator is not the machine that has been up longer.
+ Is there any configuration in JBoss AS 5.1 that controls the rules JGroups uses to select the coordinator?
2. After the merge view is installed, state transfer takes place. How does JBoss Cache replicate data between the caches on each node?
Is it true that the caches on the nodes outside the primary partition are flushed and then repopulated from the cache in the primary partition?
I also found some discussions about evicting the cache, evicting cluster state... but how do I configure this in JBoss AS 5.1?
3. When state transfer happens after a merge view, can we detect when the state transfer has finished (so that we can reload data from the cache)?
I use a CacheListener and check for a CacheUnblockedEvent with isPre() == false (this is notified after state transfer has finished), but this event may be notified more than once, and I cannot determine which node must reload (only the node that is not up to date needs to reload).
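One way to cope with the repeated notifications is to remember which view has already been handled, and to reload at most once per view and only on a node that did not supply the state. The sketch below is a plain-Java illustration under stated assumptions: `ReloadGuard` is hypothetical (not a JBoss Cache class), and it assumes the listener can find out the current view id and which node provided the state (for example, by treating the coordinator of the new view as the provider). It does not help in the case where no event arrives at all.

```java
// Hypothetical helper, not part of JBoss Cache: turns repeated
// "unblocked" notifications into at most one reload per cluster view.
class ReloadGuard {
    private final String localAddress;
    private long lastHandledViewId = -1;
    private int reloadCount = 0;

    ReloadGuard(String localAddress) {
        this.localAddress = localAddress;
    }

    /**
     * Call this after state transfer has finished.
     *
     * @param viewId        id of the installed view (assumed to increase with every new view)
     * @param stateProvider address of the node assumed to have supplied the state
     * @return true if this node should reload its application data now
     */
    synchronized boolean onUnblocked(long viewId, String stateProvider) {
        if (viewId <= lastHandledViewId) {
            return false; // duplicate notification for a view already handled
        }
        lastHandledViewId = viewId;
        if (localAddress.equals(stateProvider)) {
            return false; // this node supplied the state, so its data is the reference copy
        }
        reloadCount++;
        return true;
    }

    synchronized int reloadCount() {
        return reloadCount;
    }

    public static void main(String[] args) {
        ReloadGuard guard = new ReloadGuard("A");
        System.out.println(guard.onUnblocked(5, "B")); // true: first event for view 5
        System.out.println(guard.onUnblocked(5, "B")); // false: duplicate for view 5
        System.out.println(guard.onUnblocked(6, "A")); // false: A supplied the state
    }
}
```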
About my test:
- JBoss AS 5.1, Windows 7, 2 nodes: A (192.168.1.101) and B (192.168.1.110).
- Apache with mod_jk on node A as the load balancer
- The cache is created by DefaultCacheFactory with IsolationLevel.READ_COMMITTED and CacheMode.REPL_SYNC
+ Start node A
+ Start node B => a 2-node cluster is formed
+ Call the service through Apache on node A => the cache on all nodes is updated
+ Turn off the network on node A
+ Call the service through Apache on node A => only the cache on node A is updated
+ Turn the network on node A back on => the cluster detects this automatically and merges into a cluster with 2 nodes: A and B.
+ After the new view is received and state transfer has finished, the cache data matches the cache on node B (which was not updated and is not the oldest node)
(log files are attached)
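The outcome of this test can be reproduced with a toy model. In the plain-Java sketch below (all names hypothetical), the merge step simply overwrites node A's data with node B's, which matches what was observed; it makes no claim about how JBoss Cache actually decides whose data is kept.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the test steps: two replicated maps, a partition, and a
// "merge" that copies one node's state over the other. It illustrates the
// observed outcome only; it is not how JBoss Cache is implemented.
class SplitBrainDemo {
    private final Map<String, String> nodeA = new HashMap<>();
    private final Map<String, String> nodeB = new HashMap<>();
    private boolean partitioned = false;

    // A request routed by mod_jk to node A; replicated to B only while connected.
    void putViaNodeA(String key, String value) {
        nodeA.put(key, value);
        if (!partitioned) {
            nodeB.put(key, value); // stands in for synchronous replication
        }
    }

    // The receiver's data is discarded and replaced by the provider's data.
    static void copyState(Map<String, String> provider, Map<String, String> receiver) {
        receiver.clear();
        receiver.putAll(provider);
    }

    // Runs the scenario and returns node A's final value for "counter".
    static String runScenario() {
        SplitBrainDemo cluster = new SplitBrainDemo();
        cluster.putViaNodeA("counter", "1");     // both nodes updated
        cluster.partitioned = true;              // cable unplugged on node A
        cluster.putViaNodeA("counter", "2");     // only node A updated
        cluster.partitioned = false;             // cable plugged back in
        copyState(cluster.nodeB, cluster.nodeA); // node B's data wins
        return cluster.nodeA.get("counter");
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints 1: the update made during the partition is lost
    }
}
```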
Could you give me some suggestions?
To explain my 2-node test cluster (A, B) further: I disable the network by unplugging the network cable of one machine.
In one case, after the network went down and came back up, the cluster received a new view containing both nodes (A, B), but there was no CacheUnblockedEvent (it seems that no state transfer took place) => so after the merge view, the cache data on node A differs from the cache data on node B.
Does anyone have any comments?
Currently I use JBoss 5.1 GA and don't want to upgrade to JBoss 6.0 or later.
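To confirm whether the nodes really diverged after a merge view (for instance in the case above, where no CacheUnblockedEvent arrived), each node could compute a digest of its cache contents and the application could compare the digests. This is a hypothetical, application-level check: `CacheDigest` is not a JBoss API, it assumes the relevant cache data can be exported as a `Map<String, String>`, and how the digests are exchanged between nodes (e.g. via a management call) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: a content digest that two nodes can compare after a
// merge view to find out whether their caches actually diverged.
class CacheDigest {
    // SHA-256 over the entries in key order, so equal contents give equal digests.
    // Assumes non-null keys/values without NUL characters (NUL is the separator).
    static String digest(Map<String, String> contents) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(contents).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 should be available in every JDK", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = Map.of("counter", "2");
        Map<String, String> nodeB = Map.of("counter", "1");
        boolean same = digest(nodeA).equals(digest(nodeB));
        System.out.println(same ? "caches match" : "caches differ"); // caches differ
    }
}
```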
AFAIK the view merge is more or less random: if JGroups detects a merge, one of the partitions' coordinators becomes the new coordinator.
There is no configuration option to control that selection.
The JBoss Cache might be in an inconsistent state. That means if a value is changed in both partitions, it will differ between them until the next update.
Hope that helps a bit.
Hi Wolf-Dieter Fink,
So that means JGroups has no rules for selecting the coordinator, e.g. last updated, longest uptime...?
I found a thread about how JGroups installs views (http://sourceforge.net/projects/javagroups/forums/forum/130427/topic/3915996): when merging, a lexical sort of all involved addresses is used to determine the new coordinator.
And the JBoss 5.1 Clustering Guide (10.1.9 State transfer) says: "The state transfer service requests the application state (serialized as a byte array) from an existing node (i.e., the cluster coordinator) and transfer it to a newly joining node."
So when sub-clusters merge, the JBoss Cache might end up inconsistent. Is this an issue in JBoss 5.1 GA? And how do I work around it, or must I upgrade to a newer version of JBoss (6.0 or later)?
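The state transfer described in the Clustering Guide quote can be pictured as follows: the provider serializes its state into a byte array, and the receiving node replaces its own state with the deserialized copy. That replacement is why anything the receiving node changed on its own is lost. The sketch uses plain Java serialization, assumed purely for illustration; it loosely mirrors the shape of JGroups' `getState()`/`setState(byte[])` callbacks but is not JBoss Cache's actual state-transfer code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Sketch of "serialize the state on the provider, install it on the joiner".
// Method names are illustrative, not the JBoss Cache implementation.
class StateTransferSketch {
    // Provider side: serialize the application state into a byte array.
    static byte[] getState(HashMap<String, String> state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Receiver side: whatever the joining node held is replaced by the received state.
    @SuppressWarnings("unchecked")
    static void setState(byte[] data, Map<String, String> localState) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Map<String, String> received = (Map<String, String>) in.readObject();
            localState.clear();
            localState.putAll(received);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        HashMap<String, String> coordinator = new HashMap<>(Map.of("counter", "1"));
        Map<String, String> rejoiningNode = new HashMap<>(Map.of("counter", "2"));
        setState(getState(coordinator), rejoiningNode);
        System.out.println(rejoiningNode.get("counter")); // prints 1
    }
}
```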
Yes, JGroups uses lexical sorting of eligible candidates for coordinatorship. This is needed to deterministically pick the new coordinator without any election protocol having to be run.
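The point of the lexical sort is that every member can compute the same answer independently, without exchanging election messages. A minimal sketch, assuming addresses are represented as plain strings (JGroups itself compares its own `Address` objects, whose ordering may differ from string order); `CoordinatorPick` is a hypothetical name.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Simplified illustration of deterministic coordinator selection: every node
// sorts the same candidate set and takes the first entry, so no election
// protocol is needed. Real JGroups compares Address objects, not Strings.
class CoordinatorPick {
    static String pickCoordinator(Collection<String> candidates) {
        return new TreeSet<>(candidates).first(); // natural (lexical) String order
    }

    public static void main(String[] args) {
        // Input order does not matter: both nodes compute the same result.
        System.out.println(pickCoordinator(List.of("nodeB:7800", "nodeA:7800"))); // nodeA:7800
        System.out.println(pickCoordinator(List.of("nodeA:7800", "nodeB:7800"))); // nodeA:7800
        // Lexical order is neither numeric nor uptime-based: "10.0.0.10" < "10.0.0.9".
        System.out.println(pickCoordinator(List.of("10.0.0.9:7800", "10.0.0.10:7800"))); // 10.0.0.10:7800
    }
}
```

The last line shows why, at least under this string-based assumption, the chosen coordinator need not be the node that has been up longest.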
In case of a merge, if the same state was modified in different partitions, neither JBossCache nor Infinispan (REPL) merge the substates back into one; this is left to the user.  describes various mechanisms to do this. With Infinispan's DIST mode, state is rebalanced after a merge, and when we have eventual consistency, consistency is established by merging the change history. If there are conflicts, a user has to synthesize the correct state from the change history given to him.
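A minimal sketch of synthesizing the merged state from change history, under loud assumptions: the application itself records a snapshot of the state at the split plus, per partition, the changes made since then (the sketch does not rely on the cache providing either), and deletions are not modeled. `Change`, `Result` and `merge()` are hypothetical names. Non-conflicting changes are applied; keys changed differently on both sides keep their snapshot value and are reported for the application to resolve.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical application-level merge: replay what each partition changed
// since the split, keep non-conflicting changes, and report conflicts.
class PartitionMerge {
    static final class Change {
        final String key;
        final String value;
        Change(String key, String value) { this.key = key; this.value = value; }
    }

    static final class Result {
        final Map<String, String> state;
        final Set<String> conflicts; // keys changed differently on both sides
        Result(Map<String, String> state, Set<String> conflicts) {
            this.state = state;
            this.conflicts = conflicts;
        }
    }

    // Last value written per key within one partition (history is in write order).
    private static Map<String, String> lastValues(List<Change> history) {
        Map<String, String> last = new HashMap<>();
        for (Change c : history) {
            last.put(c.key, c.value);
        }
        return last;
    }

    static Result merge(Map<String, String> snapshot, List<Change> sideA, List<Change> sideB) {
        Map<String, String> lastA = lastValues(sideA);
        Map<String, String> lastB = lastValues(sideB);
        Set<String> conflicts = new TreeSet<>();
        for (Map.Entry<String, String> e : lastA.entrySet()) {
            String other = lastB.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                conflicts.add(e.getKey());
            }
        }
        // Conflicting keys keep their snapshot value until the application resolves them.
        Map<String, String> state = new HashMap<>(snapshot);
        lastA.forEach((k, v) -> { if (!conflicts.contains(k)) state.put(k, v); });
        lastB.forEach((k, v) -> { if (!conflicts.contains(k)) state.put(k, v); });
        return new Result(state, conflicts);
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = Map.of("x", "0", "y", "0", "z", "0");
        Result r = merge(snapshot,
                List.of(new Change("x", "1"), new Change("y", "1")),
                List.of(new Change("y", "2"), new Change("z", "2")));
        System.out.println(r.conflicts);                               // [y]
        System.out.println(r.state.get("x") + " " + r.state.get("z")); // 1 2
    }
}
```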
In my test with a 2-node cluster, while the cluster is split into 2 sub-clusters I call the service to modify the cache in one sub-cluster. After the cluster merges when the network is up again, there are cases where all nodes have the same cache data. Does that mean JBoss Cache performs a state transfer to make all nodes hold the same cache data (e.g. it flushes the caches of the nodes in the non-primary partition and copies the state from the coordinator node to them)?
How can I do substate merging in my application?
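For the counter/timestamp idea, one option is to keep a per-key version number next to each value and, after a merge, keep the entry with the higher version. This is a hypothetical application-level scheme (the `Versioned` wrapper and `merge()` are not JBoss Cache APIs), and it is only a heuristic: if a key was updated twice in one partition and once in the other, the higher count wins even though both updates were concurrent, and equal versions are resolved arbitrarily by writer name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scheme: store a version counter with every value and, after a
// merge, keep the entry with the higher version per key.
class VersionedMerge {
    static final class Versioned {
        final String value;
        final long version;  // incremented on every update of this key
        final String writer; // node that wrote it; used only to break ties

        Versioned(String value, long version, String writer) {
            this.value = value;
            this.version = version;
            this.writer = writer;
        }

        Versioned next(String newValue, String node) {
            return new Versioned(newValue, version + 1, node);
        }
    }

    // Higher version wins; equal versions mean concurrent updates, resolved
    // arbitrarily but deterministically by writer name (one update is lost).
    static Versioned newer(Versioned a, Versioned b) {
        if (a.version != b.version) {
            return a.version > b.version ? a : b;
        }
        return a.writer.compareTo(b.writer) <= 0 ? a : b;
    }

    static Map<String, Versioned> merge(Map<String, Versioned> local, Map<String, Versioned> remote) {
        Map<String, Versioned> merged = new HashMap<>(local);
        remote.forEach((key, entry) -> merged.merge(key, entry, VersionedMerge::newer));
        return merged;
    }

    public static void main(String[] args) {
        Versioned v1 = new Versioned("1", 1, "A");
        Map<String, Versioned> nodeA = new HashMap<>(Map.of("counter", v1.next("2", "A")));
        Map<String, Versioned> nodeB = new HashMap<>(Map.of("counter", v1));
        System.out.println(merge(nodeB, nodeA).get("counter").value); // prints 2
    }
}
```

Applied to the 2-node test, node A's update made during the partition (version 2) survives instead of being overwritten by node B's version 1.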
According to http://www.jgroups.org/manual-3.x/html/user-advanced.html#HandlingNetworkPartitions, I need to create a custom ExtendedReceiverAdapter, set it on the JGroups JChannel that JBoss uses, and handle viewAccepted(). Could you give me some guidance on doing this (e.g. where can I get the JChannel, and how can I add state information such as counters or timestamps to determine which state is selected as the up-to-date state)?
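One approach, in line with the "primary partition" idea from question 2, can be sketched as a pure decision function: given the member lists of the merged subgroups, pick one subgroup as primary and have every node outside it discard its state and re-fetch it. The rule below (largest subgroup, ties broken by the lexically smallest first member, i.e. its coordinator) is an assumption, not something JBoss prescribes, and `PrimaryPartition` is a hypothetical name. The wiring into a `viewAccepted(View)` callback (checking for a `MergeView` and reading its subgroups) is deliberately left out, since obtaining the JChannel in JBoss AS 5.1 is exactly the open question here.

```java
import java.util.List;

// Hypothetical primary-partition rule for a merge: the largest subgroup wins,
// ties go to the subgroup whose first member (its coordinator) sorts first.
// Nodes outside the primary subgroup drop their state and reload it.
// Assumes at least one subgroup and no empty subgroups.
class PrimaryPartition {
    static List<String> choosePrimary(List<List<String>> subgroups) {
        List<String> best = null;
        for (List<String> group : subgroups) {
            if (best == null
                    || group.size() > best.size()
                    || (group.size() == best.size() && group.get(0).compareTo(best.get(0)) < 0)) {
                best = group;
            }
        }
        return best;
    }

    static boolean mustReloadState(List<List<String>> subgroups, String localAddress) {
        return !choosePrimary(subgroups).contains(localAddress);
    }

    public static void main(String[] args) {
        // 3-node cluster split 2 + 1: the lone node must reload.
        List<List<String>> split = List.of(List.of("B", "C"), List.of("A"));
        System.out.println(mustReloadState(split, "A")); // true
        System.out.println(mustReloadState(split, "B")); // false
        // 2-node cluster split 1 + 1: only the tie-break decides.
        List<List<String>> twoNodes = List.of(List.of("B"), List.of("A"));
        System.out.println(choosePrimary(twoNodes)); // [A]
    }
}
```

With only two nodes, each side of a split has one member, so size never decides and the tie-break picks a side regardless of which node holds the newer data.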
Thanks a lot.
As far as I know neither JBossCache nor Infinispan do any sort of merging, so the data you mention should be different in the 2 nodes after a merge.
Regarding view callbacks, there's an annotation in Infinispan that lets you get a View or MergeView. I forgot what it's called, but check the Infinispan manual for details or ask the Infinispan folks.