      • 30. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
        brian.stansberry

        Also, I think it is good to walk through the normal server shutdown process to see how it impacts things.

        1) Tomcat connector stops (actually this is currently #2, but that's a bug I think is being fixed). Therefore other cluster nodes will start getting failover requests and data gravitation will begin. So, we need to ensure data gravitation works well if it runs concurrently with shutdown and with the primary buddy's takeover of the data.

        2) Webapps start being undeployed. As apps undeploy, they call TreeCache.inactivateRegion(), which evicts the region. Oops! Who's going to respond to the data gravitation requests? Seems like inactivateRegion() should trigger the primary buddy to take over the region's data before evicting (see the sketch after this list).

        3) Cache stops, view change occurs, primary takes over as data owner, servers for whom stopped server was a buddy pick a new buddy.
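
        To make step 2's fix concrete, here is a minimal sketch of an inactivateRegion() that hands the region's data to the primary buddy before evicting. Everything here (BuddyGroup, transferOwnership, Region) is a hypothetical illustration, not the actual TreeCache API:

        // Hypothetical sketch: hand a region's data to the primary buddy
        // before evicting it, so data gravitation requests can still be served.
        public class RegionShutdownSketch {

            interface BuddyGroup {
                // synchronously pushes all data under fqn to the primary buddy,
                // which assumes ownership before this method returns
                void transferOwnership(String fqn);
            }

            interface Region {
                String getFqn();
                void evict(); // local eviction of everything under the region
            }

            private final BuddyGroup buddyGroup;

            public RegionShutdownSketch(BuddyGroup buddyGroup) {
                this.buddyGroup = buddyGroup;
            }

            // proposed inactivateRegion() behaviour: hand off first, evict second
            public void inactivateRegion(Region region) {
                buddyGroup.transferOwnership(region.getFqn()); // primary takes over
                region.evict();                                // now safe to evict
            }
        }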

        • 31. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
          manik

          Yes, there will be a lag between the time when the primary buddy detects that the data owner has died (receives a view change to this effect) and the time it takes ownership of the data owner's data. During this limbo period, clustered get requests will not find this data.

          evictOnFind is not replicated to the cluster. It is a parameter passed into the clustered get call, so when a node responds to a clustered get request, it will evict the data it sends across.
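
          For illustration, a sketch of how a node might handle a clustered get carrying the evictOnFind flag. The type and method names are my assumptions, not the real API:

          import java.util.Map;

          // Hypothetical handler for an incoming clustered get request.
          // evictOnFind travels with the request rather than being a
          // cluster-wide replicated setting.
          public class ClusteredGetHandler {

              interface LocalCache {
                  Map<String, Object> peek(String fqn); // local read, no gravitation
                  void evict(String fqn);               // local eviction only
              }

              private final LocalCache cache;

              public ClusteredGetHandler(LocalCache cache) {
                  this.cache = cache;
              }

              public Map<String, Object> handleClusteredGet(String fqn, boolean evictOnFind) {
                  Map<String, Object> data = cache.peek(fqn);
                  if (data != null && evictOnFind) {
                      // we are shipping the data to the caller, who becomes the new
                      // owner; drop our local copy so it doesn't go stale
                      cache.evict(fqn);
                  }
                  return data; // null means "not here, ask another node"
              }
          }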

          • 32. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
            manik

            Re: shutdown sequence, very good point.

            I think we're looking at another remote call method where a data owner can contact its buddy group and initiate takeover of data - the same process a buddy group would go through when it detects that the data owner has died.

            This is probably something that will need to be tied into the cache shutdown hook.
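
            Something like the following, perhaps - a rough sketch of tying that takeover call into a JVM shutdown hook. The BuddyGroupChannel interface is invented for illustration:

            // Hypothetical wiring: on clean shutdown the data owner tells its
            // buddy group to take over before the channel closes.
            public class GracefulShutdownSketch {

                interface BuddyGroupChannel {
                    // synchronous remote call to the buddy group; returns once the
                    // primary buddy has confirmed it now owns the data
                    void initiateTakeover();
                    void close();
                }

                public static void installShutdownHook(final BuddyGroupChannel channel) {
                    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                        channel.initiateTakeover(); // same path as failure-detected takeover
                        channel.close();
                    }));
                }
            }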

            • 33. Re: Buddy Replication in JBoss Cache (JBCACHE-61)

              Regarding Brian's #2, I think hot-deployment is an exceptional case though. I mean, if you un-deploy your application during peak load, this is asking for trouble. And if it is off-peak, then is a 404 or 500 error acceptable during re-deployment?

              • 34. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                brian.stansberry


                "manik.surtani@jboss.com" wrote:
                Yes, there will be a lag between the time when the primary buddy detects that the data owner has died (receives a view change to this effect) and the time it takes ownership of the data owner's data. During this limbo period, clustered get requests will not find this data.


                IMHO this is a fundamental flaw. The purpose of replicating data is to ensure it is available. Looks like we now have a setup where it's not necessarily available, just recoverable.

                There can be very long time lapses between when a server fails and when JGroups detects it: 15 secs for the default FD config, 2+ hours for some failure scenarios with FD_SOCK, and 2 mins for those cases if FD is combined with FD_SOCK.

                Perhaps a scheme whereby the replication message is applied to the regular tree on the primary buddy, and to the _buddy_backup_ tree on the other buddies. (I imagine that's a bad idea, or the _buddy_backup_ tree wouldn't exist in the first place.) Or perhaps, if a get() is received with the "checkBuddyTree" option, a cache checks its main tree and then checks the buddy tree if the node is not found.

                "manik.surtani@jboss.com" wrote:
                evictOnFind is not replicated to the cluster. It is a parameter passed into the clustered get call, so when a node responds to a clustered get request, it will evict the data it sends across.


                Sorry, I was imprecise in my wording :( Will we have the problem I described, i.e. when data is evicted from the data owner as part of gravitation, the data owner's buddies don't know this and thus end up with stale data?

                • 35. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                  brian.stansberry


                  "manik.surtani@jboss.com" wrote:
                  Re: shutdown sequence, very good point.

                  I think we're looking at another remote call method where a data owner can contact its buddy group and initiate takeover of data - the same process a buddy group would go through when it detects that the data owner has died.

                  This is probably something that will need to be tied into the cache shutdown hook.


                  It should take an Fqn parameter, so it can also be called from inactivateRegion().

                  The key thing with activate/inactivateRegion is their relationship to marshalling using a component's classloader. Once a region is inactivated, the cache can no longer act as a buddy for that region, as it will not have the classloader needed to unmarshal replication messages.

                  Hmm -- this is a can of worms! These buddy groups really need to be managed at the region level.

                  Just opened http://jira.jboss.com/jira/browse/JBCACHE-550, which is for a relatively small detail about this larger issue of handling regions/marshalling.

                  • 36. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                    manik


                    "bstansberry@jboss.com" wrote:
                    Yes, there will be a lag between the time when the primary buddy detects that the data owner has died (receives a view change to this effect) and the time it takes ownership of the data owner's data. During this limbo period, clustered get requests will not find this data.


                    IMHO this is a fundamental flaw. The purpose of replicating data is to ensure it is available. Looks like we now have a setup where it's not necessarily available, just recoverable.

                    There can be very long time lapses between when a server fails and when JGroups detects it: 15 secs for the default FD config, 2+ hours for some failure scenarios with FD_SOCK, and 2 mins for those cases if FD is combined with FD_SOCK.

                    Perhaps a scheme whereby the replication message is applied to the regular tree on the primary buddy, and to the _buddy_backup_ tree on the other buddies. (I imagine that's a bad idea, or the _buddy_backup_ tree wouldn't exist in the first place.) Or perhaps, if a get() is received with the "checkBuddyTree" option, a cache checks its main tree and then checks the buddy tree if the node is not found.


                    Yeah, putting this stuff in the regular tree will mean that it gets mixed up with regular data and gets backed up as regular data as well. All sorts of issues, plus you don't have the clean, "treat all buddies as equal" thing.

                    Perhaps a get() with a checkBuddyTree option. Hmm. Which buddy tree do we check? An instance could be a buddy to more than one data owner. And then if a request makes changes to the backup, how is this propagated to the other backup nodes?

                    "bstansberry@jboss.com" wrote:

                    evictOnFind is not replicated to the cluster. It is a parameter passed into the clustered get call, so when a node responds to a clustered get request, it will evict the data it sends across.


                    Sorry, I was imprecise in my wording :( Will we have the problem I described, i.e. when data is evicted from the data owner as part of gravitation, the data owner's buddies don't know this and thus end up with stale data?


                    Good point. This evict would need to be broadcast to the buddy group as well to keep things 'clean'.

                    • 37. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                      brian.stansberry


                      "manik.surtani@jboss.com" wrote:

                      Yeah, putting this stuff in the regular tree will mean that it gets mixed up with regular data and gets backed up as regular data as well. All sorts of issues, plus you don't have the clean, "treat all buddies as equal" thing.


                      Yes, definitely not clean. When you say "gets backed up as regular data as well", are you referring to backup to a buddy? Or to a cache loader? If the former, there should be no replication to a buddy unless there is local activity that touches those nodes, yes???


                      Perhaps a get() with a checkBuddyTree option. Hmm. Which buddy tree do we check? An instance could be a buddy to more than one data owner.


                      Presumably you'd have to check them all. Again, ugly.
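
                      A sketch of what "check them all" might look like - a lookup that tries the main tree, then every backup subtree this instance holds (purely illustrative names, not the shipped implementation):

                      import java.util.List;
                      import java.util.Map;

                      // Hypothetical lookup order for a get() carrying the "checkBuddyTree"
                      // option: main tree first, then each backup subtree (one per data
                      // owner this instance is a buddy for).
                      public class BuddyTreeLookup {

                          interface Tree {
                              Map<String, Object> get(String fqn); // null if absent
                          }

                          private final Tree mainTree;
                          private final List<Tree> buddyBackupTrees;

                          public BuddyTreeLookup(Tree mainTree, List<Tree> buddyBackupTrees) {
                              this.mainTree = mainTree;
                              this.buddyBackupTrees = buddyBackupTrees;
                          }

                          public Map<String, Object> get(String fqn, boolean checkBuddyTree) {
                              Map<String, Object> data = mainTree.get(fqn);
                              if (data != null || !checkBuddyTree) {
                                  return data;
                              }
                              // there's no way to know which data owner's backup holds the
                              // node, so every backup subtree has to be searched - the ugly part
                              for (Tree backup : buddyBackupTrees) {
                                  data = backup.get(fqn);
                                  if (data != null) {
                                      return data;
                                  }
                              }
                              return null;
                          }
                      }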

                      And then if a request makes changes to the backup, how is this propagated to the other backup nodes?


                      Not sure what you meant here. I envisioned this "get()" only being part of data gravitation and having the evictOnFind option and the checkBuddyTree option (or maybe some differently named option that encapsulates both behaviors). So whichever cache initiated the get would be the new owner of the data.

                      • 38. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                        manik



                        "bstansberry@jboss.com" wrote:
                        When you say "gets backed up as regular data as well", are you referring to backup to a buddy? Or to a cache loader? If the former, there should be no replication to a buddy unless there is local activity that touches those nodes, yes???


                        True. Hmm, let me think about that - perhaps there is a way to mix primary and buddy data in the same tree, without the need for a '_buddy_backup_' subtree, as that would solve all these problems on the spot.



                        • 39. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                          manik

                          Ok, after some thought (and extensive IM discussion with Brian) I think we may be able to do away with the whole _buddy_backup_ subtree where backup data is stored. This would need a boolean owner flag on Node.class.

                          Here's how it could work:

                          1. Replication occurs within your buddy group whenever data is written to (or removed from) a cache instance. The owner flag for this node is set to true.
                          2. Buddies that receive this replication mark the owner flag as false.
                          3. If a cache instance fails, even if the buddy group hasn't realised it, data gravitation will still work, since clustered gets will still find the data in the buddy caches.
                          4. On data gravitation, the evict() is replicated to the entire cluster to ensure the data is flushed from buddies.
                          5. The cache taking ownership of this data put()s it into its own cache, becomes the owner, and this replicates to its own buddy group.
                          6. The owner flag is used when initial state transfers are done, so that new caches joining the cluster can join buddy groups, and data owners will know which nodes to transfer based on this flag.

                          What do people think of this? While this may result in a bit more tree walking (step 6), it will mean a much more efficient (lazy) process when failover occurs, as well as a more robust failover model: at any time after a data owner dies, a request can go anywhere in the cluster and continue being served, even if the buddy group hasn't realised its data owner has died.
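
                          A rough sketch of what the flag might look like on the node - field and method names here are purely illustrative, not actual JBoss Cache API:

                          // Hypothetical Node shape for the flat, no-_buddy_backup_ design: the
                          // same tree holds owned and backup data, distinguished only by a flag.
                          public class FlatNodeSketch {
                              private Object data;
                              private boolean owner; // true on the writer, false on its buddies

                              // local write path (step 1): we own what we write
                              public void localPut(Object value) {
                                  this.data = value;
                                  this.owner = true;
                              }

                              // replication receive path (step 2): buddies hold it as backup
                              public void replicatedPut(Object value) {
                                  this.data = value;
                                  this.owner = false;
                              }

                              // state transfer (step 6): only transfer nodes this cache owns
                              public boolean shouldTransfer() {
                                  return owner;
                              }

                              // clustered gets (step 3) still find backup data here
                              public Object getData() {
                                  return data;
                              }
                          }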

                          Cheers,
                          Manik

                          • 40. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                            brian.stansberry

                            A few things to consider:

                            1) A cache becomes owner of data for which it was previously a backup; i.e. no failed get and therefore no call to the ClusteredCacheLoader. In this case, the cache would have to replicate the data to its buddies, but doesn't necessarily need to do a local put(), as it already has the data.

                            Two ways this could be triggered:

                            a) Cache discovers its buddy has failed; therefore it needs to walk through the tree taking ownership of all data owned by the old owner.
                            b) Cache has no idea its buddy has failed, but receives a get() call targeting a node it doesn't own. E.g. load balancer fails over a webapp request before the JGroups suspicion process is complete. Here the cache should just take ownership of the single node.

                            Perhaps the simplest way to take ownership is just to do a put(Fqn, Map) where the map is the existing map. Such a call will trigger replication and mark the node as owned by the cache. Tricky thing is in case b) above the put would be triggered as part of get() processing.
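
                            A sketch of that "re-put the existing map" idea, with invented Cache methods for illustration:

                            import java.util.Map;

                            // Hypothetical sketch of taking ownership by re-putting the node's
                            // existing data map: the put flips the owner flag locally and
                            // triggers replication to this cache's own buddies.
                            public class TakeOwnershipSketch {

                                interface Cache {
                                    Map<String, Object> peekData(String fqn);       // local read only
                                    void put(String fqn, Map<String, Object> data); // replicates + marks owned
                                }

                                private final Cache cache;

                                public TakeOwnershipSketch(Cache cache) {
                                    this.cache = cache;
                                }

                                // case b): a get() arrives for a node we only hold as backup
                                public Map<String, Object> getAndTakeOwnership(String fqn) {
                                    Map<String, Object> data = cache.peekData(fqn);
                                    if (data != null) {
                                        cache.put(fqn, data); // the tricky part: a put inside get() processing
                                    }
                                    return data;
                                }
                            }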

                            2) If a cache stops being a buddy but doesn't take ownership of data, it needs to walk the tree to remove the data. E.g. 6 node cluster where nodes B and C are buddies to A. C is not B's buddy. C is also a buddy to D. A dies, so that buddy group dissolves, with B taking ownership; C needs to remove the data formerly owned by A.

                            This implies that C needs to know who the owner of "backup" nodes was, not just that somebody else was. It doesn't want to remove D's data. So I don't think a boolean owner flag is sufficient; the owner needs to be identified.

                            3) Disjointed ownership of trees. E.g. /a/b/c structure where a cache owns "a" and "c" but not "b". This could happen if the cache created "c" and then later another cache does a put() on "b". What to do in cases of state transfer? Transfer "b" but with an empty map? And what about if we stop being a buddy to whoever owns "b"? Can't prune "b", as we need to keep "c". Just remove b's data map?

                            • 41. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                              brian.stansberry

                              Question on replication groups:

                              Will these be a hard constraint, or a suggestion? E.g. an 8-node cluster where caches 1-4 are configured with replication group A while caches 5-8 have replication group B. Same channel. They are configured to require 2 buddies, i.e. 3 caches in each buddy group. Now caches 1 and 2 are taken down, leaving 3 and 4 short a buddy -- will 3 and 4 go find a buddy from repl group B?

                              In WL, the replication group is a suggestion; i.e. 3 and 4 would go find a buddy from group B.

                              This impacts on a discussion Manik and I were having re: heterogeneous clusters, where certain components were deployed on some nodes and not others. If PojoCache-style replication was needed by those components, we'd need to have classloaders registered with the cache to do unmarshalling.

                              Manik and I discussed that in that kind of situation, you could use replication groups, where all the members of the group were homogeneous. Idea being that the replication group ensures that all buddies have access to the necessary classloaders.

                              This approach wouldn't work reliably if replication groups were suggestions.

                              • 42. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                                brian.stansberry

                                Related issue is management. If replication groups are inviolable, they become a much more significant construct for management tools like JBoss ON. You can start organizing deployments, etc. around them. I suppose you can do that even if they are just suggestions, but it's less meaningful.

                                • 43. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                                  manik


                                  "bstansberry@jboss.com" wrote:


                                  a) Cache discovers its buddy has failed; therefore it needs to walk through the tree taking ownership of all data owned by the old owner.
                                  b) Cache has no idea its buddy has failed, but receives a get() call targeting a node it doesn't own. E.g. load balancer fails over a webapp request before the JGroups suspicion process is complete. Here the cache should just take ownership of the single node.


                                  Do we even consider a) anymore? Just let the backup data remain as backups - ownerless - until a get() call comes into the cluster for this data. Then let the cache that deals with this get() become the owner of this single node. I.e., let ownership shift on a node-by-node basis. (And let data that never gets requested again gradually get evicted.)

                                  There is little point in a single cache ("primary buddy") taking ownership of all the failed cache's data if the load balancer will be distributing requests for the failed cache across the cluster anyway.

                                  "bstansberry@jboss.com" wrote:

                                  Perhaps the simplest way to take ownership is just to do a put(Fqn, Map) where the map is the existing map. Such a call will trigger replication and mark the node as owned by the cache. Tricky thing is in case b) above the put would be triggered as part of get() processing.


                                  Or perhaps just changing the get() code - if a get is done on a node where the cache is not the owner, set ownership to true and broadcast this change of ownership to the (original and new) buddy group?
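
                                  Roughly like this, perhaps - flip the flag inside get() and broadcast the change, rather than doing a full put (all names hypothetical):

                                  // Hypothetical sketch: ownership shifts lazily, node by node, inside
                                  // the get() path, with the change broadcast to the buddy groups.
                                  public class OwnershipShiftOnGet {

                                      interface Node {
                                          Object getData();
                                          boolean isOwner();
                                          void setOwner(boolean owner);
                                      }

                                      interface Broadcaster {
                                          // tells the original buddy group to drop its backup copy and the
                                          // new owner's buddy group to start backing the node up
                                          void broadcastOwnershipChange(String fqn, String newOwnerAddress);
                                      }

                                      private final Broadcaster broadcaster;
                                      private final String localAddress;

                                      public OwnershipShiftOnGet(Broadcaster broadcaster, String localAddress) {
                                          this.broadcaster = broadcaster;
                                          this.localAddress = localAddress;
                                      }

                                      public Object get(String fqn, Node node) {
                                          if (node == null) {
                                              return null; // not here at all; try a clustered get elsewhere
                                          }
                                          if (!node.isOwner()) {
                                              node.setOwner(true); // lazy, node-by-node ownership shift
                                              broadcaster.broadcastOwnershipChange(fqn, localAddress);
                                          }
                                          return node.getData();
                                      }
                                  }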

                                  "bstansberry@jboss.com" wrote:

                                  2) If a cache stops being a buddy but doesn't take ownership of data, it needs to walk the tree to remove the data. E.g. 6 node cluster where nodes B and C are buddies to A. C is not B's buddy. C is also a buddy to D. A dies, so that buddy group dissolves, with B taking ownership; C needs to remove the data formerly owned by A.

                                  This implies that C needs to know who the owner of "backup" nodes was, not just that somebody else was. It doesn't want to remove D's data. So I don't think a boolean owner flag is sufficient; the owner needs to be identified.


                                  A broadcast. Every time a change of ownership occurs, broadcast it to the cluster so caches that need to remove backup data can do so. And yes, we'd need a further identifier on Node.class - a buddyGroupName?
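
                                  For illustration, a sketch of why the identifier matters: with a buddyGroupName on each backup node, C can remove exactly A's data without touching D's (names invented):

                                  import java.util.Iterator;
                                  import java.util.Map;

                                  // Hypothetical cleanup: each backup node records which buddy group
                                  // (i.e. which data owner) it belongs to, so a cache can remove the
                                  // data of a dissolved group without touching any other group's data.
                                  public class BackupCleanupSketch {

                                      static final class BackupNode {
                                          final String buddyGroupName; // identifies the data owner's group

                                          BackupNode(String buddyGroupName) {
                                              this.buddyGroupName = buddyGroupName;
                                          }
                                      }

                                      // called when an ownership-change broadcast says this cache is no
                                      // longer a buddy for the given group
                                      public static void removeBackupsFor(Map<String, BackupNode> backupsByFqn,
                                                                          String dissolvedGroup) {
                                          Iterator<Map.Entry<String, BackupNode>> it = backupsByFqn.entrySet().iterator();
                                          while (it.hasNext()) {
                                              if (dissolvedGroup.equals(it.next().getValue().buddyGroupName)) {
                                                  it.remove(); // drop only the dissolved owner's data
                                              }
                                          }
                                      }
                                  }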

                                  "bstansberry@jboss.com" wrote:

                                  3) Disjointed ownership of trees. E.g. /a/b/c structure where a cache owns "a" and "c" but not "b". This could happen if the cache created "c" and then later another cache does a put() on "b". What to do in cases of state transfer? Transfer "b" but with an empty map? And what about if we stop being a buddy to whoever owns "b"? Can't prune "b", as we need to keep "c". Just remove b's data map?


                                  Yeah - remove b's data map and mark it as uninitialized? Not too sure about this scenario.

                                  • 44. Re: Buddy Replication in JBoss Cache (JBCACHE-61)
                                    manik


                                    "bstansberry@jboss.com" wrote:

                                    Will these be a hard constraint, or a suggestion?


                                    Perhaps this could be configurable as well. For heterogeneous replication groups to make any sense, this would have to be hard - but to achieve the same levels of backup security, it would need to be flexible enough to utilise servers from other repl groups. A tradeoff we could pass on to the user, I suppose.
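
                                    That tradeoff might look something like this - a buddy-selection sketch with a strictness knob (entirely hypothetical, not a shipped option):

                                    import java.util.ArrayList;
                                    import java.util.List;

                                    // Hypothetical buddy selection: strict groups never borrow buddies from
                                    // other repl groups (needed for heterogeneous clusters, so buddies have
                                    // the right classloaders); loose groups fall back to any member to keep
                                    // the required number of backups.
                                    public class BuddySelectionSketch {

                                        static final class Member {
                                            final String address;
                                            final String replGroup;

                                            Member(String address, String replGroup) {
                                                this.address = address;
                                                this.replGroup = replGroup;
                                            }
                                        }

                                        // 'others' is every cluster member except the local cache
                                        public static List<Member> pickBuddies(List<Member> others, String myGroup,
                                                                               int numBuddies, boolean strictGroups) {
                                            List<Member> buddies = new ArrayList<>();
                                            for (Member m : others) { // prefer members of our own repl group
                                                if (buddies.size() < numBuddies && myGroup.equals(m.replGroup)) {
                                                    buddies.add(m);
                                                }
                                            }
                                            if (!strictGroups) { // loose: top up from any other group
                                                for (Member m : others) {
                                                    if (buddies.size() < numBuddies && !buddies.contains(m)) {
                                                        buddies.add(m);
                                                    }
                                                }
                                            }
                                            return buddies; // may fall short of numBuddies if strict
                                        }
                                    }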