5 Replies Latest reply on Jul 1, 2011 5:47 AM by ronsen

    Failover performance

    ronsen

      Since EJB failover relies on the client side (if I'm not mistaken), failover after a node crashes shouldn't cause a measurable delay, right?

      The client downloads the DP object and handles it itself, so will there be a measurable delay if a server crashes and the client fails over to another node?

       

       

      My test: I deployed an EJB on 3 nodes and sent consecutive numbers to the cluster. Whenever a node receives a number, it writes a record to the database, and the database adds the timestamp, so the times don't depend on the individual server clocks.

      So whenever a failover occurs, any delay it causes should show up as a gap between the timestamps of No. X and No. Y.
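The analysis step of this test can be sketched as follows: given the database timestamps of consecutive numbers, find the pair with the widest gap. This is a minimal illustration, assuming the timestamps have been read back as nanoseconds in send order; the class name, method name and sample values are hypothetical and not taken from the original test.

```java
import java.util.List;

// Hypothetical analysis step for the failover test; names and sample data are illustrative.
public class FailoverGapAnalysis {

    // Returns the index i for which timestamps[i + 1] - timestamps[i] is largest,
    // i.e. the pair of consecutive numbers "No. i" and "No. i + 1" with the widest gap.
    static int indexOfLargestGap(List<Long> timestampsNanos) {
        if (timestampsNanos.size() < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
        int best = 0;
        long bestGap = Long.MIN_VALUE;
        for (int i = 0; i + 1 < timestampsNanos.size(); i++) {
            long gap = timestampsNanos.get(i + 1) - timestampsNanos.get(i);
            if (gap > bestGap) {
                bestGap = gap;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample: one number every 50 ms, with an extra 30 ms between No. 2 and No. 3.
        List<Long> ts = List.of(0L, 50_000_000L, 100_000_000L, 180_000_000L, 230_000_000L);
        int i = indexOfLargestGap(ts);
        System.out.println("largest gap between No. " + i + " and No. " + (i + 1));
        // prints: largest gap between No. 2 and No. 3
    }
}
```

If the largest gap is no wider than the normal interval between sends, the failover added nothing measurable at that resolution.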

       

      The result: even at nanosecond resolution, there is none.

      Did I misunderstand something, or does a measurable delay only occur when something has to be restored from replicated data?

       

      Thanks in advance,

        • 1. Re: Failover performance
          wdfink

          EJB load balancing and failover work through a client-side proxy together with communication between the servers.

           

          The servers communicate a shutdown, or detect a full stop (e.g. a complete hang or a very long full GC).

          A crash (e.g. the JVM dumping core) is also detected.

          How long detection takes depends on the situation.

           

          The facts are:

          - transactions that were running on the crashed server are not committed

          - the client proxy might hang for a few milliseconds and then try the next server in its list

          - the next server provides the new cluster view, without the crashed server.

           

          So such a failover takes no measurable time in the best case, and only a few milliseconds in the worst case.
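The client-side behaviour in the list above can be sketched roughly like this: the proxy tries the servers in its list in order, skips one that does not answer, and adopts the cluster view returned by the server that does. This is a minimal sketch only; `FailoverProxy`, `Transport` and `Reply` are hypothetical names, not the JBoss clustering API. In the simulation the crashed node fails instantly; with a real crash the time spent on the failed attempt depends on how the failure surfaces (e.g. a refused connection versus a hang until a timeout).

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of client-side failover; all names are hypothetical, not the JBoss API.
public class FailoverProxy {

    // Hypothetical reply: the call's result plus the answering server's cluster view.
    record Reply(String value, List<String> clusterView) {}

    // Hypothetical transport: throws a RuntimeException when the server is unreachable.
    interface Transport {
        Reply call(String server, String request);
    }

    private List<String> servers;
    private final Transport transport;

    FailoverProxy(List<String> initialServers, Transport transport) {
        this.servers = new ArrayList<>(initialServers);
        this.transport = transport;
    }

    String invoke(String request) {
        RuntimeException last = null;
        // Iterate over a snapshot so the list can be replaced inside the loop.
        for (String server : List.copyOf(servers)) {
            try {
                Reply reply = transport.call(server, request);
                // The server that answered supplies the new cluster view,
                // which no longer contains the crashed node.
                servers = new ArrayList<>(reply.clusterView());
                return reply.value();
            } catch (RuntimeException e) {
                last = e; // no answer: try the next server in the list
            }
        }
        throw new IllegalStateException("no server in the cluster view answered", last);
    }

    List<String> servers() {
        return List.copyOf(servers);
    }

    public static void main(String[] args) {
        // Simulate a crashed node1: every call to it fails immediately.
        Transport t = (server, request) -> {
            if (server.equals("node1")) {
                throw new RuntimeException("connection refused");
            }
            return new Reply(server + " handled " + request, List.of("node2", "node3"));
        };
        FailoverProxy proxy = new FailoverProxy(List.of("node1", "node2", "node3"), t);
        System.out.println(proxy.invoke("42")); // node2 handled 42
        System.out.println(proxy.servers());    // [node2, node3]
    }
}
```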

          • 2. Re: Failover performance
            ronsen

            That means detection is done on the server side and the client is informed?

            Because there must at least be a connection attempt to the crashed server; when nothing comes back, the DP tries the next server in its list. Shouldn't that take a measurable amount of time?

             

            Still, good to know; thanks for clarifying. What about failover with session replication: would that produce a measurable delay?

            • 3. Re: Failover performance
              wdfink

              For the internal (server-to-server) communication, have a look at:

              http://community.jboss.org/wiki/Shunning

              http://community.jboss.org/wiki/FDVersusFDSOCK

              http://community.jboss.org/wiki/JGroupsPbcastGMS

              http://community.jboss.org/wiki/JGroupsFD

              They contain a lot of information about how it works internally.
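The pages above describe how JGroups detects failed members. As a rough illustration of the general heartbeat idea only (not JGroups code), a member that has not been heard from within a time window is suspected, so with a purely heartbeat-based scheme the worst-case detection time is on the order of that window. `HeartbeatDetector`, `timeoutMillis` and `maxTries` are hypothetical names, not JGroups classes or configuration attributes; see the linked pages for how the real protocols behave.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical heartbeat-based failure detector; names are illustrative, not JGroups API/config.
public class HeartbeatDetector {
    private final long timeoutMillis;
    private final int maxTries;
    private final Map<String, Long> lastHeard = new HashMap<>();

    HeartbeatDetector(long timeoutMillis, int maxTries) {
        this.timeoutMillis = timeoutMillis;
        this.maxTries = maxTries;
    }

    // Record a heartbeat from a peer received at time nowMillis.
    void heartbeat(String peer, long nowMillis) {
        lastHeard.put(peer, nowMillis);
    }

    // Peers silent for longer than timeoutMillis * maxTries are suspected.
    Set<String> suspects(long nowMillis) {
        Set<String> result = new HashSet<>();
        long limit = timeoutMillis * maxTries;
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > limit) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HeartbeatDetector d = new HeartbeatDetector(1000, 3);
        d.heartbeat("node1", 0);
        d.heartbeat("node2", 0);
        d.heartbeat("node2", 2500); // node1 has stopped sending heartbeats
        System.out.println(d.suspects(2000)); // []
        System.out.println(d.suspects(3500)); // [node1]
    }
}
```

Time is passed in explicitly rather than read from a clock so the behaviour is deterministic and easy to reason about.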

               

              I am not working with HTTP session replication at the moment.

              As far as I know, the most common approach is buddy replication, where only two nodes keep the state of a session.

              If the node the session is attached to fails, another server processes the next call. If that server is not the 'buddy', the session must be copied to it, which takes time depending on the size of the session data.
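The buddy-replication failover described above can be modelled in a few lines: a session is stored on its owner and one buddy; after the owner fails, a call landing on the buddy finds the state locally, while a call landing on any other node first has to copy the state. This is a minimal sketch assuming session state is held as a serialized byte array; all names are hypothetical and this is not the JBoss Cache or Infinispan API. The returned byte count stands in for the copy cost, which grows with session size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of buddy replication; not the JBoss Cache / Infinispan API.
public class BuddyReplicationSketch {
    // node name -> (session id -> serialized session state)
    private final Map<String, Map<String, byte[]>> store = new HashMap<>();

    // Store the session on its owner and on exactly one buddy.
    void put(String owner, String buddy, String sessionId, byte[] state) {
        store.computeIfAbsent(owner, n -> new HashMap<>()).put(sessionId, state);
        store.computeIfAbsent(buddy, n -> new HashMap<>()).put(sessionId, state);
    }

    // The owner crashed and the next call lands on `target`.
    // Returns how many bytes had to be copied to target (0 if target is the buddy).
    int failover(String failedOwner, String target, String sessionId) {
        store.remove(failedOwner); // simulate the crash: the owner's copy is gone
        Map<String, byte[]> local = store.computeIfAbsent(target, n -> new HashMap<>());
        if (local.containsKey(sessionId)) {
            return 0; // target is the buddy: state is already local
        }
        for (Map<String, byte[]> sessions : store.values()) {
            byte[] state = sessions.get(sessionId);
            if (state != null) {
                local.put(sessionId, state.clone()); // copy cost grows with session size
                return state.length;
            }
        }
        throw new IllegalStateException("session lost: " + sessionId);
    }

    public static void main(String[] args) {
        BuddyReplicationSketch cluster = new BuddyReplicationSketch();
        cluster.put("node1", "node2", "session-A", new byte[4096]);
        System.out.println(cluster.failover("node1", "node3", "session-A")); // 4096
    }
}
```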

              • 4. Re: Failover performance
                ronsen

                Great, thanks. I'm going to take a look at this and then try it with replication.

                • 5. Re: Failover performance
                  ronsen

                  Hey, can somebody explain (please only if you are sure) why, with the RandomRobin/RoundRobin load-balancing policy, only the first (number of nodes - 1) requests are slow and everything afterwards is much faster? Is something cached, and is there a timeout? When are these values invalidated?

                   

                  For example, send consecutive numbers to a cluster using a round-robin policy, with a 50 ms pause between calls, and measure how long it takes to print the first (cluster size - 1) values. I found that this time was higher by a factor of about 4.
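The measurement described above can be reproduced with a small harness: choose targets in round-robin order, time each call, and compare the mean of the first (cluster size - 1) calls with the mean of the rest. This is only a sketch of the experiment and does not explain the slowdown. `RoundRobinProbe` and the `remoteCall` passed to `measure` are hypothetical stand-ins; in a real test the clustered proxy, not the client, picks the node, so the call would simply go through the EJB proxy.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Hypothetical harness for the round-robin timing experiment; not the JBoss RoundRobin class.
public class RoundRobinProbe {
    private final List<String> nodes;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinProbe(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Next target in round-robin order: a, b, c, a, b, c, ...
    String next() {
        return nodes.get(Math.floorMod(cursor.getAndIncrement(), nodes.size()));
    }

    // Sends numbers 0..calls-1, pausing between calls, and returns each call's duration in ns.
    long[] measure(BiConsumer<String, Integer> remoteCall, int calls, long pauseMillis)
            throws InterruptedException {
        long[] durations = new long[calls];
        for (int i = 0; i < calls; i++) {
            String node = next();
            long start = System.nanoTime();
            remoteCall.accept(node, i); // hypothetical: send number i to that node
            durations[i] = System.nanoTime() - start;
            Thread.sleep(pauseMillis);
        }
        return durations;
    }

    // Mean duration of the first `warm` calls divided by the mean of the remaining calls.
    static double warmupRatio(long[] durationsNanos, int warm) {
        if (warm <= 0 || warm >= durationsNanos.length) {
            throw new IllegalArgumentException("need 0 < warm < number of calls");
        }
        double first = 0;
        double rest = 0;
        for (int i = 0; i < durationsNanos.length; i++) {
            if (i < warm) {
                first += durationsNanos[i];
            } else {
                rest += durationsNanos[i];
            }
        }
        return (first / warm) / (rest / (durationsNanos.length - warm));
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> cluster = List.of("node1", "node2", "node3");
        RoundRobinProbe probe = new RoundRobinProbe(cluster);
        // Stand-in for the real EJB call; replace with the actual remote invocation.
        long[] d = probe.measure((node, n) -> { }, 9, 50);
        System.out.println("first/rest ratio: " + warmupRatio(d, cluster.size() - 1));
    }
}
```

A ratio around 4 would correspond to the observation in the post; the harness says nothing about which cache or timeout, if any, is responsible.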

                   

                  Thanks a lot,