We have been experiencing an issue in our system when enabling buddy replication. The issue manifests itself in such a way that replication appears to be missing entirely. We can toggle the issue on and off by enabling/disabling buddy replication, so I have focused on isolating the problem in a stand-alone test.
In my test scenario I am now able to reproduce what I believe is a bug. I am using two caches with buddy replication enabled and force data gravitation between them. I then fail the secondary cache and check whether the data is recoverable on the primary cache. This works like a charm. However, when I do this a second time around, i.e. start up a new secondary cache, the same scenario fails.
The second time, the objects gravitated from the primary cache are not removed as they were the first time around. When we inspect the cache for recovered data after failing the second secondary cache, we get the wrong data: the primary cache has the correct data under its _buddy_backup node, but since it prefers its own data, it will read the stale version.
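To make the failure mode concrete, here is a minimal, self-contained model of what I believe is happening. This is not the JBoss Cache API; the `Node` class and all method names are hypothetical, invented only to illustrate the stale-read mechanism (a lookup that prefers the node's own tree over its _buddy_backup region, combined with a gravitation that fails to remove the entry from the old owner):

```java
import java.util.HashMap;
import java.util.Map;

public class BuddyRepSketch {

    // Hypothetical model: each node owns a "main" tree and keeps its
    // buddies' data under a backup region (like _buddy_backup).
    static class Node {
        final Map<String, String> main = new HashMap<>();
        final Map<String, String> buddyBackup = new HashMap<>();

        // Lookup prefers the node's own main tree and only falls back
        // to the backup region when main has no entry.
        String get(String key) {
            String own = main.get(key);
            return own != null ? own : buddyBackup.get(key);
        }
    }

    // Returns what the primary reads after the second fail-over.
    static String readAfterSecondFailover() {
        Node primary = new Node();
        primary.main.put("k", "v1");

        // First secondary gravitates the entry; the first time around
        // the entry IS removed from the primary's main tree (works fine).
        Node secondary1 = new Node();
        secondary1.main.put("k", primary.main.remove("k"));
        // Replication keeps a backup on the primary.
        primary.buddyBackup.put("k", secondary1.main.get("k"));

        // secondary1 fails; the primary recovers from its backup region.
        primary.main.put("k", primary.buddyBackup.remove("k"));

        // Second secondary gravitates the entry again, but this time
        // (the bug) the entry is NOT removed from the primary's main
        // tree, leaving a stale copy behind.
        Node secondary2 = new Node();
        secondary2.main.put("k", primary.main.get("k")); // get, not remove

        // The new owner updates the value; replication keeps the
        // primary's backup region up to date.
        secondary2.main.put("k", "v2");
        primary.buddyBackup.put("k", "v2");

        // secondary2 fails. The primary holds the correct "v2" under
        // its backup region, but prefers the stale "v1" in its own tree.
        return primary.get("k");
    }

    public static void main(String[] args) {
        System.out.println(readAfterSecondFailover()); // prints "v1"
    }
}
```

The point of the sketch is the last step: even though the correct value is present under the backup region, the lookup order means the primary serves the copy that was left behind by the second gravitation.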
This is pretty complex to explain, so I have posted the complete test code here: http://www.cubeia.com/misc/buddyrep/
I've tried to comment the code to make it as self-explanatory as possible. The test was written for 2.1.0.GA.
I do have an understanding of data affinity and what the concept implies; however, I believe the test does not break or abuse data affinity but rather exercises fail-over scenarios when using buddy replication. Finally, why should the behavior change just because the cache had a member in the past?
I hope you can run the test and try it out. Just ask away if the code is not making any sense to you =)
I have been able to recreate this. I will soon add a modified version of your test (adapted to fit our test framework) to our unit test suite.
Just as an FYI, this fails on branch 2.1.X but not on trunk (soon-to-be 2.2.0). I'll add a more formal analysis about this issue soon, but for now, see:
JBCACHE-1320, a feature added in 2.2.0 that probably fixed this issue.
Okay, I have traced this down and it has nothing to do with JBCACHE-1320. It was fixed in trunk as a matter of course during the refactoring of some classes.
It is fixed in Branch 2.1.X as well and will be released with 2.1.1; there is also a workaround for 2.1.0.GA - see JBCACHE-1330.
Do let me know if the workaround solves your problem!