25 Replies. Latest reply on Jun 28, 2011 4:02 AM by ataylor
      • 15. Re: Improvements to HA
        timfox

        Andy Taylor wrote:

         


        Sounds good. Am I right in thinking, though, that all the information on available nodes in a cluster can only be propagated to the client if the cluster is symmetrical? What if the node that the client is connected to only knows about one other node in the cluster, e.g. in a chain cluster?

        I think it should be propagated in the same way other cluster information is propagated, i.e. it obeys the max-hops param.
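For reference, a sketch of how max-hops might be set on a cluster connection in hornetq-configuration.xml (element names recalled from memory, so treat this as illustrative rather than authoritative):

```xml
<cluster-connections>
   <cluster-connection name="my-cluster">
      <address>jms</address>
      <connector-ref>netty</connector-ref>
      <!-- how many hops cluster topology information is forwarded;
           in an asymmetric (e.g. chain) cluster this bounds what a
           node, and hence its clients, can learn -->
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>
```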

        • 16. Re: Improvements to HA
          timfox

          Leos Bitto wrote:

           

          Clebert Suconic wrote:

           

          I think a ping between backup and live would suffice here. (even on shared journal).

           

          If something wrong happens, you just make the server stop answering the ping/pong.

           

          You could have the same sort of bugs with either implementation (i.e. the live server still responding to pings even though it's in some "undesirable" state).

           

          So what do you propose should happen with the live server when it loses its backup server (no ping/pong for some time)? And the same with the backup server? How do you prevent the undesirable situation where both servers are active (serving clients)? This could happen if both cluster nodes are working fine and just the network between them fails.

          Right. This is why I introduced a quorum in the shared nothing case.

           

          This protects against split brain in the case of a network partition.
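As a rough illustration (not the HornetQ implementation), a quorum decision reduces to a majority check: before a backup activates, it asks the other nodes whether they can still see the live server, and only proceeds if a strict majority agrees the live is gone:

```java
// Illustrative sketch only, not HornetQ code: a backup asks the other cluster
// nodes to vote on whether the live server is really down, and only activates
// if a strict majority of the cluster agrees. A backup that is merely cut off
// by a network partition cannot gather a majority, so it stays passive.
class QuorumSketch {

    // votesForFailover: nodes (including this backup) that cannot reach the live
    // clusterSize: total number of nodes eligible to vote
    static boolean shouldActivate(int votesForFailover, int clusterSize) {
        return votesForFailover > clusterSize / 2;
    }

    public static void main(String[] args) {
        // Partition isolating only the backup: 1 vote out of 5 -> stay passive
        assert !shouldActivate(1, 5);
        // Live server really crashed: 4 of 5 nodes agree -> safe to activate
        assert shouldActivate(4, 5);
    }
}
```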

          • 17. Re: Improvements to HA
            timfox

            Clebert Suconic wrote:

             

            I think a ping between backup and live would suffice here. (even on shared journal).

             

             

            No, because of split brain.

             

            An extra level of protection is necessary for shared nothing.

             

            Some more background on quorums is here http://en.wikipedia.org/wiki/Quorum_%28Distributed_Systems%29

            • 18. Re: Improvements to HA
              jmesnil

              Following up on yesterday's IRC conversation, I don't understand what you mean, Tim, when you say that "UDP is just a substitute for the *initial static list*".

              Yes, UDP will be used to retrieve the initial connectors that the node will use to connect to other nodes.

              But UDP will continue to receive packets for the lifetime of the node; if a node joins the cluster later on, that will also be handled by UDP.

              Last week, I updated the discovery code to have a correct view of the cluster (to be notified when the nodes are expired). When a node is stopped and restarted, I am expecting the other nodes to be notified that the node is DOWN when its discovery entry expires, and notified that it is UP again when the node is restarted and starts to broadcast again.

              • 19. Re: Improvements to HA
                timfox

                Jeff, let me try and explain this better.

                 

                In the old HA (before this refactoring), we relied on UDP for clients and node members to know about the cluster topology. This kind of worked if you had UDP available on your network, but if you didn't, your cluster would be completely static, and clients and other nodes would never receive any cluster updates. I.e. it didn't work.

                 

                One of the key things with the new HA is that we *do not rely on UDP for full operation of the cluster*. This means that whether or not the user has UDP available, the full set of functionality will be available: clients, like servers, will know about cluster topology changes. This is actually very important going forward with cloud, since UDP is often not available in cloud environments, so we don't want to be reliant on UDP. This is really important to understand as it's a fundamental part of the new approach.

                 

                So... if we're not relying on UDP, how are we going to get cluster topology changes to other nodes in the cluster?

                 

                The answer is: using a normal connection. Once a client, or another node in the cluster, has made just one normal HornetQ connection to *any* node in the cluster, then, since every cluster connection is effectively connected to every other cluster connection, basically every client and every server node in the cluster is connected via normal HornetQ connections in one or more hops. This means any cluster changes that occur can be propagated down these connections (each connection has listeners) until they reach all nodes.
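To illustrate the mechanism (with invented names; this is not the actual HornetQ API), each connection registers a listener on the node it is connected to, and when the topology changes the node pushes the new view down every registered connection:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch (invented names, not the HornetQ API): a node keeps its
// current view of the topology plus one listener per open connection; any
// change is applied locally, then pushed down every connection so it
// eventually reaches all nodes and clients, hop by hop.
class TopologyBroadcastSketch {
    private final Map<String, String> view = new HashMap<>(); // nodeId -> connector
    private final List<Consumer<Map<String, String>>> connectionListeners = new ArrayList<>();

    void register(Consumer<Map<String, String>> listener) {
        connectionListeners.add(listener);
    }

    void nodeUp(String nodeId, String connector) {
        view.put(nodeId, connector);
        // propagate the updated view over every open connection
        for (Consumer<Map<String, String>> l : connectionListeners) {
            l.accept(new HashMap<>(view));
        }
    }
}
```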

                 

                What does this mean? It means that, as long as any node has made at least one initial connection to the cluster, *it does not need UDP* in order to keep up to date with cluster topology changes. In fact, it is critical that *it does not use UDP* for this.

                 

                Ok, so this is all great after the node has made its first connection, but it doesn't answer the question of how a node makes that first connection. Where does a node, or a client get the connection information so it can make that connection?

                 

                This information about connecting to the first node we call the "initial list". This initial list can either be specified directly in the cluster connection as a list of connectors (called static-connectors), or on the client when creating the ServerLocator.

                 

                If you *do* have UDP on your network, that initial list can also be specified by using a UDP discovery group, which avoids you having to specify a fixed list of connectors on the cluster connection or ServerLocator. Please note the connectors obtained from the discovery group DO NOT express the cluster topology, they are ONLY used to provide the initial list of connectors. Note that the discovery group only provides a list of connectors, it *does not* provide live-backup pairs - this is because it's just an initial list, not the cluster topology!

                 

                So there are two ways to provide the initial list of connectors - 1) via a static list in the config / ServerLocator constructor 2) via UDP discovery.
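Sketched in config terms, the two options on a cluster connection might look like this (element names recalled from the 2.2-era configuration, so illustrative only):

```xml
<!-- option 1: a fixed initial list of connectors -->
<cluster-connection name="my-cluster">
   <address>jms</address>
   <connector-ref>netty</connector-ref>
   <static-connectors>
      <connector-ref>server1-connector</connector-ref>
      <connector-ref>server2-connector</connector-ref>
   </static-connectors>
</cluster-connection>

<!-- option 2: obtain the initial list via UDP discovery instead -->
<cluster-connection name="my-cluster">
   <address>jms</address>
   <connector-ref>netty</connector-ref>
   <discovery-group-ref discovery-group-name="my-discovery-group"/>
</cluster-connection>
```

Either way, the connectors obtained are only used to bootstrap the first connection, as described above.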

                 

                The client then uses the initial list to make the *first connection* to a node in the cluster. At this point the client does not know the cluster topology. As the first connection is made, the server it connects to will send the client the full cluster topology. This is sent down the normal HornetQ connection. After this the initial list *is ignored*; all cluster topology updates are received via the HornetQ connection, there is no reliance on UDP here.

                 

                So to recap, UDP or static list is used to provide an initial list of servers to connect to. That's it. Once connected to one server, the normal HQ connection is used to receive further topology updates and UDP / static list is not used at all.

                 

                Is that making more sense now?

                • 20. Re: Improvements to HA
                  jmesnil

                  Tim Fox wrote:

                   

                  So to recap, UDP or static list is used to provide an initial list of servers to connect to. That's it. Once connected to one server, the normal HQ connection is used to receive further topology updates and UDP / static list is not used at all.

                   

                  Is that making more sense now?

                  That's the part I don't get.

                   

                  If we use UDP, the discovery group will receive node broadcasts for the lifetime of the cluster connection, not only at initialization (unlike the static connector case).

                  This means that we will be notified of new nodes either through the cluster connection (as you described) or still through UDP.

                   

                  Tim Fox wrote:

                   

                  Once connected to one server, the normal HQ connection is used to receive further topology updates and UDP / static list is not used at all.

                   

                  Do you mean that as soon as the server locator is notified by the discovery group that its list of connectors has changed, the server locator must stop its discovery group (otherwise it'll continue to receive UDP packets when other nodes are started)?

                  • 21. Re: Improvements to HA
                    timfox

                    Jeff Mesnil wrote:

                     

                    Tim Fox wrote:

                     

                    So to recap, UDP or static list is used to provide an initial list of servers to connect to. That's it. Once connected to one server, the normal HQ connection is used to receive further topology updates and UDP / static list is not used at all.

                     

                    Is that making more sense now?

                    That's the part I don't get.

                     

                    If we use UDP, the discovery group will receive node broadcast for the lifetime of the cluster connection, not only at initialization (like the static connector case).

                    This means that we will be notified of new nodes either through the cluster connection (as you described) or still through UDP.

                     

                    Of course, the discovery group will continue to receive broadcasts even after you have made your first connection. It's just that the discovery group is *not used* after that point; the broadcasts are ignored as long as we have a connection. We do not rely on its further state changes to inform us about the topology. The topology information comes from the HQ connection, *not* the discovery group.

                     

                    Jeff Mesnil wrote:

                     


                    Do you mean that as soon as the server locator is notified by the discovery group that its list of connector has changed, the server locator must stop its discovery group (otherwise it'll continue to receive UDP packets when other nodes are started)?

                    No, I did not say it must be stopped. I just said that information is *not used* after the initial connection is obtained. Clearly, if the client closes all its connections and then creates another one, it will have to use the initial list again, whether from UDP or a static list, and it makes sense for that list to be up to date.

                    • 22. Re: Improvements to HA
                      timfox

                      If you look at the code, the discovery group listener is simply used to keep the *initial list* up to date, not the cluster topology.

                       

                      The initial list and cluster topology are maintained as separate structures in ServerLocatorImpl, IIRC.
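In other words (with hypothetical names, not the actual ServerLocatorImpl fields), the locator holds two independent structures: one fed by discovery or static config and only consulted to bootstrap, the other fed by topology messages arriving over the live connection:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the separation described above (names invented):
// the initial list and the topology are distinct, independently updated views.
class LocatorSketch {
    // Fed by the discovery group or static config; used ONLY to bootstrap.
    final List<String> initialConnectors = new ArrayList<>();
    // Fed by topology messages arriving over the normal connection.
    final Map<String, String> topology = new HashMap<>(); // nodeId -> connector

    String firstConnection() {
        // First connection: pick from the initial list...
        return initialConnectors.get(0);
        // ...after which topology updates arrive over that connection and the
        // initial list is no longer consulted while the connection is alive.
    }

    void onTopologyUpdate(String nodeId, String connector) {
        topology.put(nodeId, connector); // does not touch initialConnectors
    }
}
```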

                      • 23. Re: Improvements to HA
                        bsebi

                        Hello Tim,

                         

                        Although a frequent reader, I'm a new member in this community so first of all I want to congratulate the entire team and tell you how much I appreciate the efforts in making HornetQ such a successful product.

                         

                        I am interested to know when you think the improvements to HA that you specify in this discussion thread will become available to the general public. I see a JIRA related to this problem as being scheduled for 2.2.0.CR1 (HORNETQ-402), however I have a hard time finding any dates associated with this release.

                         

                        Thank you,

                        Sebastian

                        • 24. Re: Improvements to HA
                          clebert.suconic

                          We are working on it... We are doing a lot of tests (I mean, a lot) and we got delayed because of that.

 

                          We are almost done; it shouldn't be more than 1 month now.

                          • 25. Re: Improvements to HA
                            ataylor

                            As we are now working on re-enabling replication, I've been looking at how the initial connection is made from a backup server to a live server. This will be done via a configuration on the backup server, "live-connector-ref" or something similar, which allows the backup to initially connect to the live server. This is fine up to a point, however it makes an assumption that the original live server is always available. If a backup server is brought up after a failure, it has no way of finding out what the new live server is. Also, if it came up before, then it would have to be informed of the current backup so it knows who to replicate from on failover, meaning we would have to hold the state of all of a live server's possible backups and their current state.

                             

                            What we should be able to do is allow a backup server to connect to any node in the cluster and find out which live server it belongs to, more on this later. However, in the short term I propose the following: instead of having a single live connector ref, we allow a list, something like:

                             

                            <live-server-connectors>

                               <ref name="netty"/>

                               <ref name="nettybackup2"/>

                            </live-server-connectors>

                             

                            The list is basically the live server and any backup servers in the pool. Since only the live server will be accepting connections, the ServerLocator will always connect to the current live server.

                             

                            In the future it would be good to just use the normal cluster connection config to connect to any node in the cluster, however this means that we would have to be able to identify which backup servers belong to which live servers. Here are a couple of possibilities for how we could do this.

                             

                            1 Replication groups

                             

                            Give each live server and its pool of backup servers a replication group id so a backup can locate its live server. I'm not sure about this, since it's basically the same as using the node id and making it configurable. Doing it this way would also help us if we want to start having multiple replicators on a live server (we already have a use case for this, in case anyone is wondering).

                             

                            2 Allow any backup server to replicate any live server.

                             

                            Basically, when a backup server connects to the cluster it receives a list of available live servers. It tries each of these until it finds one that currently isn't being replicated; if it can't find one, it sits passive until it receives news of a topology change and then tries again.
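That selection logic might look roughly like this (an illustrative sketch only; the set of already-replicated live servers would come from the topology the backup receives on joining):

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch of option 2: a backup walks the list of live servers it
// learned from the cluster and claims the first one nobody is replicating yet.
// If every live server already has a replicating backup, it returns empty and
// the backup sits passive until a topology change triggers another attempt.
class BackupPairingSketch {

    // liveServers: live-server ids in the order received from the cluster
    // replicated: ids of live servers that already have a replicating backup
    static Optional<String> pickLiveToReplicate(List<String> liveServers,
                                                Set<String> replicated) {
        for (String live : liveServers) {
            if (!replicated.contains(live)) {
                return Optional.of(live); // claim this one
            }
        }
        return Optional.empty(); // all taken: wait for a topology change
    }
}
```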
