      • 15. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
        manik

        colocatedServerList is also for when you run 2 instances on 1 physical machine, but bind each instance to a different IP. Analysing the contents of a JGroups View, these will seem like 2 different hosts, and we'd have no idea that they in fact reside on the same machine.

        And yes, it can also be used in the scenario you mentioned where you want 2 physical servers connected to the same power source to be treated as colocated.

        And again, yes, you're right, the simplest case does check the IP addresses of the members in a group and 'guesses' if instances are colocated as well.

        • 16. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
          manik

          Re: JGroups ports, I suppose this can be ignored if not specified - but then we would have to deal with some default behaviour if in fact there are 2 instances on the same IP address (even if this is unlikely). Probably just pick the first one and document the behaviour as 'indeterminate' ...

          • 17. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
            brian.stansberry


            "manik.surtani@jboss.com" wrote:
            colocatedServerList is also for when you run 2 instances on 1 physical machine, but bind each instance to a different IP. Analysing the contents of a JGroups View, these will seem like 2 different hosts, and we'd have no idea that they in fact reside on the same machine.


            I'm too lazy to look if there is some kind of gotcha, but it seems like we should be able to be aware of all the IP addresses associated with our machine. I know that JGroups UDP walks through all the interfaces/addresses when it binds the multicast socket to all addresses.

            • 18. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
              brian.stansberry

              And log a WARN (or even an ERROR), which should be enough to alert an admin that they have a faulty configuration.

              • 19. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                manik

                re: colocatedServerList, thanks for the tip, Brian. Using java.net.NetworkInterface.getNetworkInterfaces() I can easily walk through the collection of interfaces on a single host. Probably do this once when instantiating the BuddyLocator impl.
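
                 For illustration, a minimal sketch of that one-time interface walk (the class here is my own illustration, not actual JBoss Cache code):

                 import java.net.InetAddress;
                 import java.net.NetworkInterface;
                 import java.net.SocketException;
                 import java.util.Enumeration;
                 import java.util.HashSet;
                 import java.util.Set;

                 // Illustrative sketch: gather every address bound to this host once, at
                 // BuddyLocator startup, then test candidate members against that set.
                 public class LocalAddresses {
                     private final Set localAddresses = new HashSet();

                     public LocalAddresses() throws SocketException {
                         for (Enumeration ifaces = NetworkInterface.getNetworkInterfaces(); ifaces.hasMoreElements();) {
                             NetworkInterface iface = (NetworkInterface) ifaces.nextElement();
                             for (Enumeration addrs = iface.getInetAddresses(); addrs.hasMoreElements();) {
                                 localAddresses.add(addrs.nextElement());
                             }
                         }
                     }

                     // Another member is colocated with us if its address is bound to this host.
                     public boolean isLocal(InetAddress memberAddress) {
                         return localAddresses.contains(memberAddress);
                     }
                 }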

                So I don't see a need for a colocatedServerList - except for the scenario you mentioned where separate hosts ought to be considered as colocated because they are connected to the same power source, etc.

                There's also the case of running virtualisation software to host multiple OS instances (each with its own virtual NIC), each with a cluster member.

                Are the above two use cases common enough to warrant a colocatedServerList?

                • 20. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                  brian.stansberry

                  Maybe for those kinds of situations they should use SpecificBuddyLocator. This seems to be more the way WebLogic does it:

                  http://edocs.bea.com/wls/docs70/cluster/failover.html#1022145

                  • 21. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                    ghinkle

                    Apologies if this stuff has already been discussed, but I've only just seen the blog entry referencing these designs.

                    First, I think there is a difference between SpecificBuddyLocator and ReplicationGroups. In ReplicationGroups, I just have to give each node a name and where the names match they're considered grouped. I'd really rather not have to go update many configurations every time I add or remove a machine in my cluster. If I use name-based resolution, I have the flexibility to replace failures and add additional hardware when load demands it without a shutdown-reconfigure.


                    Second, I rather like the idea of configuring a machine name for each node as the way it determines colocation. This lets me choose if I want it to work on one host OS or virtualized OSes on one box.

                    In this way, for each node, I just need to configure a clustername, a cluster group, a replication group and a machine name. Plus those things wouldn't need to change as I reconfigure/add/remove nodes.
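
                     For illustration, per-node configuration under that scheme might look something like the following (the element names are hypothetical, not an existing schema). Note that nothing in it references any other node, so it never changes as nodes are added or removed:

                     <attribute name="ClusterConfig">
                      <config>
                       <clusterName>MyCluster</clusterName>
                       <replicationGroup>GroupA</replicationGroup>
                       <machineName>machine1</machineName>
                      </config>
                     </attribute>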


                    The other thing I was thinking is that I'm pretty sure I wouldn't care if cross-session object links were maintained. I wouldn't want them to be there in the first place... so I'm not convinced data slicing (at least per session) is a bad idea. The benefit of having a secondary for each session is that when the primary fails, I can have it so the rest of the machines in the cluster share in the work of getting back to steady-state.

                    In bigger clusters, the move back to steady state is more damaging to the cluster than the outage. I've seen clusters trip domino style due to the replication failover causing extra load / memory usage on the secondary. If I've got 512 mb of session per node and I get a failure like your scenario in the wiki, node B and C now have session memory requirements of 1.5 gb up from 1 gb pre-failure. Of course my experience is more from a time when GC was a bigger problem than it is today, but I'd still worry about quick jumps in memory and high traffic between specific nodes.
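
                     To spell out the arithmetic in that failure scenario (assuming each node owns 512MB of sessions and holds exactly one buddy's backup):

                     pre-failure:    every node = 512MB owned + 512MB backup             = 1GB
                     after A dies:   B = (512MB own + A's 512MB) owned + 512MB backup    = 1.5GB
                                     C = 512MB owned + backup of B's enlarged 1GB        = 1.5GB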

                    • 22. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                       chris.mills
                      With regard to ports...

                      You could use AUTH in JGroups to validate nodes joining the group - the AUTH protocol can do whatever checking you like to decide whether a node should be allowed to join, e.g. against a predefined list of IPs (and ports if needed).
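
                      For illustration, the kind of check such an AUTH implementation might perform against a predefined allow-list (a plain Java sketch; the class and method names are mine, not the JGroups AUTH API):

                      import java.net.InetSocketAddress;
                      import java.util.Iterator;
                      import java.util.Set;

                      // Hypothetical join check: admit a joiner only if its IP (and, where
                      // configured, port) appears in a predefined allow-list.
                      public class IpAllowList {
                          private final Set allowed; // Set of InetSocketAddress entries

                          public IpAllowList(Set allowed) {
                              this.allowed = allowed;
                          }

                          public boolean mayJoin(InetSocketAddress joiner) {
                              if (allowed.contains(joiner)) return true; // exact IP + port match
                              for (Iterator i = allowed.iterator(); i.hasNext();) {
                                  InetSocketAddress entry = (InetSocketAddress) i.next();
                                  // an entry with port 0 means "any port on this IP"
                                  if (entry.getPort() == 0 && entry.getAddress().equals(joiner.getAddress())) {
                                      return true;
                                  }
                              }
                              return false;
                          }
                      }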

                      If BuddyReplication was switched on you could even stop them running two instances on the same IP if you wanted.

                      Thoughts?

                      • 23. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                        manik


                        "chris.mills@jboss.com" wrote:
                        With regard to ports...

                        You could use AUTH in JGroups to validate nodes joining the group - the AUTH protocol can do whatever checking you like to decide whether a node should be allowed to join, e.g. against a predefined list of IPs (and ports if needed).

                        If BuddyReplication was switched on you could even stop them running two instances on the same IP if you wanted.

                        Thoughts?


                        I have no real problems with people running several instances on a single IP - I don't necessarily want to stop them from doing so. I just want to ensure we don't put 2 instances on the same physical server into the same buddy group.

                        • 24. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                          manik


                          "ghinkle" wrote:
                          First, I think there is a difference between SpecificBuddyLocator and ReplicationGroups. In ReplicationGroups, I just have to give each node a name and where the names match they're considered grouped. I'd really rather not have to go update many configurations every time I add or remove a machine in my cluster. If I use name-based resolution, I have the flexibility to replace failures and add additional hardware when load demands it without a shutdown-reconfigure.


                          Would this not involve additional RPC, where a node would have to ask all nodes in the cluster which Replication Group they're in to find its replication group members?

                          "ghinkle" wrote:

                          Second, I rather like the idea of configuring a machine name for each node as the way it determines colocation. This lets me choose if I want it to work on one host OS or virtualized OSes on one box.


                          You can do this without naming nodes - each instance gets a unique JGroups address anyway, which acts as a name.

                          "ghinkle" wrote:

                          The other thing I was thinking is that I'm pretty sure I wouldn't care if cross-session object links were maintained. I wouldn't want them to be there in the first place... so I'm not convinced data slicing (at least per session) is a bad idea. The benefit of having a secondary for each session is that when the primary fails, I can have it so the rest of the machines in the cluster share in the work of getting back to steady-state.

                          In bigger clusters, the move back to steady state is more damaging to the cluster than the outage. I've seen clusters trip domino style due to the replication failover causing extra load / memory usage on the secondary. If I've got 512 mb of session per node and I get a failure like your scenario in the wiki, node B and C now have session memory requirements of 1.5 gb up from 1 gb pre-failure. Of course my experience is more from a time when GC was a bigger problem than it is today, but I'd still worry about quick jumps in memory and high traffic between specific nodes.


                          My problem with slicing data and letting the entire network help with distributing it upfront is that we have no knowledge of what is in the cache to be able to slice/partition it without losing relationships, etc. Especially with TreeCacheAop (FIELD level session replication) we may have shared references to objects which then break down if you try and split up the sessions held.

                          I agree though that the load spike for the buddy (and buddy's buddy) when a failure occurs is quite high - and hence the need for data gravitation and a load balancer that now starts to redirect requests evenly across a cluster. Gradually this extra load should spread out.


                          • 25. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                            ghinkle


                            "manik.surtani@jboss.com" wrote:
                            "ghinkle" wrote:
                            First, I think there is a difference between SpecificBuddyLocator and ReplicationGroups. In ReplicationGroups, I just have to give each node a name and where the names match they're considered grouped. I'd really rather not have to go update many configurations every time I add or remove a machine in my cluster. If I use name-based resolution, I have the flexibility to replace failures and add additional hardware when load demands it without a shutdown-reconfigure.


                            Would this not involve additional RPC, where a node would have to ask all nodes in the cluster which Replication Group they're in to find its replication group members?


                            It would certainly require more info, but if I can do it by having a cluster-wide view of replication group membership it's absolutely worth it. It would be part of the metadata a node pushes out when it joins the cluster. Everyone else is doing this via heartbeats that share this information: each node in the cluster does its best to keep its own view of who's in the cluster and what their information (replication group) is. Let's put it this way: it's a little more code for a lot more maintainability.


                            "manik.surtani@jboss.com" wrote:

                            "ghinkle" wrote:

                            Second, I rather like the idea of configuring a machine name for each node as the way it determines colocation. This lets me choose if I want it to work on one host OS or virtualized OSes on one box.


                            You can do this without naming nodes - each instance gets a unique JGroups address anyway, which acts as a name.


                            No, this doesn't tell me that two nodes are running on different virtualized operating systems on the same hardware. That type of information would have to be configured by the user. A certain other product used to try to figure this out automatically, but now, due to virtualization, just makes you configure it. Once again, the critical difference is a node saying "I'm machine A" rather than "I'm on the same physical hardware as nodes x.x.x.x and x.x.x.y". I don't want to go update the other three nodes on that virtualized hardware when I decide to add a fourth.



                            "manik.surtani@jboss.com" wrote:

                            "ghinkle" wrote:

                            The other thing I was thinking is that I'm pretty sure I wouldn't care if cross-session object links were maintained. I wouldn't want them to be there in the first place... so I'm not convinced data slicing (at least per session) is a bad idea. The benefit of having a secondary for each session is that when the primary fails, I can have it so the rest of the machines in the cluster share in the work of getting back to steady-state.

                            In bigger clusters, the move back to steady state is more damaging to the cluster than the outage. I've seen clusters trip domino style due to the replication failover causing extra load / memory usage on the secondary. If I've got 512 mb of session per node and I get a failure like your scenario in the wiki, node B and C now have session memory requirements of 1.5 gb up from 1 gb pre-failure. Of course my experience is more from a time when GC was a bigger problem than it is today, but I'd still worry about quick jumps in memory and high traffic between specific nodes.


                            My problem with slicing data and letting the entire network help with distributing it upfront is that we have no knowledge of what is in the cache to be able to slice/partition it without losing relationships, etc. Especially with TreeCacheAop (FIELD level session replication) we may have shared references to objects which then break down if you try and split up the sessions held.

                            I agree though that the load spike for the buddy (and buddy's buddy) when a failure occurs is quite high - and hence the need for data gravitation and a load balancer that now starts to redirect requests evenly across a cluster. Gradually this extra load should spread out.


                            I think I disagree with the idea of trying to make this both a generic cache and a proper HTTP session cache. To be honest, I couldn't care less about having a generic buddy cache if it doesn't work well for HTTP session replication, and I believe a lot more people would be interested in using it for HTTP session replication than as a generic cache. We need to support the things customers really want to do... and right now, they're clamoring for more scalable, more easily configured clustering. You can make it much easier to configure and maintain with the above changes, and you can likely make it more scalable by not having one buddy take the full brunt of another node going down. (Though the best way to see these impacts is to have a big cluster working under load and start pulling out network cables.) Just hearing about the mechanisms, though, I worry about triggering the types of chaining failures I've seen before.


                            • 26. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                              manik

                              Ok, the more I think about it the more I tend to agree that using a replication group concept as a top-level construct (above a buddy locator) is probably a good idea. Basically, when you configure buddy replication, you could have the following config XML block:

                              <attribute name="BuddyReplicationConfig">
                                <config>
                                  <buddyLocatorClass>org.jboss.cache.cluster.NextMemberBuddyLocator</buddyLocatorClass>
                                  <buddyLocatorProperties>numBuddies = 3</buddyLocatorProperties>
                                  <replicationGroup>MyGroup</replicationGroup>
                                </config>
                              </attribute>


                              This would warrant an additional 'handshake' message broadcast to all servers in the cluster upon view change, where each node reveals its replication group.

                              If left blank, the replication group used is an internal default that represents the entire cluster.

                              This way, using the NextMemberBuddyLocator is all that is needed - a SpecificBuddyLocator would be redundant.

                              What do people think? I myself was not too happy with the extra message broadcast to the cluster by each node on view change, but Greg's arguments above make sense and I agree that this is probably the cleanest way to define subgroups within a cluster.
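
                              For illustration, the bookkeeping that handshake implies might look roughly like this (all names here are mine; the real design lives on the wiki):

                              import java.util.ArrayList;
                              import java.util.HashMap;
                              import java.util.Iterator;
                              import java.util.List;
                              import java.util.Map;

                              // Illustrative registry: on each view change, members broadcast their
                              // replication group; buddies are then picked only from members that
                              // announced the same group. A null group maps to an internal default
                              // representing the entire cluster.
                              public class ReplicationGroupRegistry {
                                  private static final String DEFAULT_GROUP = "__ENTIRE_CLUSTER__";
                                  private final Map groupByMember = new HashMap(); // member address -> group name

                                  public synchronized void onAnnouncement(Object member, String group) {
                                      groupByMember.put(member, group == null ? DEFAULT_GROUP : group);
                                  }

                                  // Candidate buddies for a member: every other member in its group.
                                  public synchronized List buddyCandidates(Object member) {
                                      String group = (String) groupByMember.get(member);
                                      List candidates = new ArrayList();
                                      for (Iterator i = groupByMember.entrySet().iterator(); i.hasNext();) {
                                          Map.Entry e = (Map.Entry) i.next();
                                          if (!e.getKey().equals(member) && e.getValue().equals(group)) {
                                              candidates.add(e.getKey());
                                          }
                                      }
                                      return candidates;
                                  }
                              }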

                              • 27. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                                manik

                                Ok, if there are no more comments in this, I'll update the designs to reflect the replication group and necessary handshakes.

                                • 28. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                                  manik

                                  I've updated the designs on the wiki to allow for the replicationGroup concept (or buddyPoolName, as I've called it).

                                  Please have a look and let me know what you think - the bits I'm least happy about are:

                                  * 3 new remote methods instead of 2
                                  * Additional 'handshaking' every time a view change occurs

                                  Cheers,
                                  Manik

                                  • 29. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                                    brian.stansberry

                                    During the period after a data owner has died, but before the primary buddy has taken over the data, how is the data accessible? I would expect the ClusteredCacheLoader get() call could only find the data on the DataOwner node, as everywhere else it's stored under a different (backup) location.

                                    Also, is evictOnFind replicated? If not, as data gravitates it will get left behind on the buddy nodes.