13 Replies Latest reply on Jan 3, 2005 3:12 PM by adrian.brock

    Finding XAResources in Recovery step

    bill.burke

      Over the holidays, I took a look at how one would implement recovery. I think you're right... It is basically easy to implement (at least without handling heuristic errors), but hard to make it perform well.

      One thing that gets me though. How the hell can you get all the XAResources so that you can do the recovery?

      Does ResourceAdapter.getXAResources() come in? If so, how do you get the ActivationSpecs?

      Another thought that came to mind was to query the MBeanServer for all ManagedConnectionFactorys and get a ManagedConnection from each factory. The problem is that you need a Subject.

      Thoughts?

        • 1. Re: Finding XAResources in Recovery step

          ResourceAdapter.getXAResources is for message inflow recovery,
          see JCA1.5 Section 12.5.2
          The activation specs must be persisted for recovery purposes.

          For outbound it needs to do ManagedConnection.getXAResource().
          For the Subject problem see the special case mentioned in JCA1.5 Section 6.5.3.5
          This looks insecure to me (and probably not supported by some XAResources)
          so we may need to add configuration for a recovery user/password where
          this cannot be determined from the JCA config.

          The fundamental problem is that the transaction manager needs to get the list
          of ResourceAdapters/ManagedConnectionFactorys to perform recovery.
          But this cannot be done using a <depends-list> because the ResourceAdapter
          needs a reference to the transaction manager before it can start (chicken and egg).
          1) For the XATerminator - RARs could try to use this during start()
          2) For the tx-connection-factory/tx datasources - not strictly necessary since we
          don't need the connection manager or pool for recovery

          So what is required is that there be a separate recovery MBean that has a
          <depends-list> of the ResourceAdapters.
          This will allow the following startup ordering:
          1) Transaction manager
          2) Resource Adapters
          3) Recovery manager

          Care must be taken that the recovery manager only tries to
          recover transactions that were from a previous instance.
          Perhaps by using either a mark in the log of when it was restarted or by
          recording the JVM id in the log record?
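
One way to read the "JVM id in the log record" idea, as a minimal sketch: every log record carries the id of the server instance that wrote it, and recovery only looks at records written by some other (earlier) instance. `LogRecord`, `InstanceFilter` and their members are hypothetical names for illustration, not JBoss classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical log record: which server instance wrote it, and for which transaction.
final class LogRecord {
    final String instanceId;
    final String txId;

    LogRecord(String instanceId, String txId) {
        this.instanceId = instanceId;
        this.txId = txId;
    }
}

final class InstanceFilter {
    // Assumption: generated once per JVM start, so a restarted server gets a different id.
    static String newInstanceId() {
        return UUID.randomUUID().toString();
    }

    // Keep only the records that were written by some other (previous) instance.
    static List<LogRecord> recoverable(List<LogRecord> log, String currentInstanceId) {
        List<LogRecord> result = new ArrayList<>();
        for (LogRecord record : log) {
            if (!record.instanceId.equals(currentInstanceId)) {
                result.add(record);
            }
        }
        return result;
    }
}
```

Unlike a timestamp, such an id does not depend on the clock of the machine that reads the log.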

          We also want to fix this tight coupling of the TM and RARs to make this simpler.
          We can then delay some processes to make the recovery occur before
          services become available
          1) Transaction manager
          2) Resource Adapters
          3) Recovery manager
          4) Start WorkManager
          5) Bind ConnectionFactorys to JNDI
          6) Activate MDB activations

          Also bear in mind that JBossMQ and other services(?) use a DataSource
          to persist data. i.e. one XAResource uses another one as a delegate.
          Although in this case, the delegate XAResource should never be taking
          part in two phase commit!?

          • 2. Re: Finding XAResources in Recovery step

            On heuristics, I think the best thing to do is what CORBA allows,
            i.e. we report all heuristics to a special log/notification mechanism
            with some policy on whether they should be automatically rolled back or committed.

            Only the administrator can work out what has gone wrong or whether it is
            consistent with how he tried to resolve a problem.

            • 3. Re: Finding XAResources in Recovery step
              bill.burke

               

              "adrian@jboss.org" wrote:
              ResourceAdapter.getXAResources is for message inflow recovery,
              see JCA1.5 Section 12.5.2
              The activation specs must be persisted for recovery purposes.


              We really don't have any component that uses inflow yet do we? I'll do this one on the second iteration of the recovery mechanism.


              For outbound it needs to do ManagedConnection.getXAResource().
              For the Subject problem see the special case mentioned in JCA1.5 Section 6.5.3.5
              This looks insecure to me (and probably not supported by some XAResources)
              so we may need to add configuration for a recovery user/password where
              this cannot be determined from the JCA config.


              Yes, I already read that section... What Subject are you supposed to pass in? It is the ConnectionRequestInfo that is supposed to be null.


              The fundamental problem is that the transaction manager needs to get the list
              of ResourceAdapters/ManagedConnectionFactorys to perform recovery.
              But this cannot be done using a <depends-list> because the ResourceAdapter
              needs a reference to the transaction manager before it can start (chicken and egg).
              1) For the XATerminator - RARs could try to use this during start()
              2) For the tx-connection-factory/tx datasources - not strictly necessary since we
              don't need the connection manager or pool for recovery

              So what is required is that there be a separate recovery MBean that has a
              <depends-list> of the ResourceAdapters.
              This will allow the following startup ordering:
              1) Transaction manager
              2) Resource Adapters
              3) Recovery manager



              I don't think this needs to be that complicated. If we force/require all ResourceAdapters and ManagedConnectionFactory's to have a specific ObjectName attribute then we can do an MBean query on the MBeanServer to find these MBeans.

              From what you're saying, I think we need to require each XA resource to provide an MBean whose sole purpose is to hand out a reference to an XAResource interface.

              So, here's what I had in mind:

              1. JBoss boots up. All MBeans are created and started.
              2. JBoss ServerImpl broadcasts an MBean startup notification. (This is already coded.)
              3. RecoveryManager MBean receives the "JBoss Started" notification and begins recovery.
              4. RecoveryManager queries the MBeanServer for all "Recovery" MBeans. These "Recovery" MBeans will provide references to their XAResources.
              5. RecoveryManager performs recovery.
              6. RecoveryManager tells all "Recovery" MBeans that it is finished with the XAResources.
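
Step 4 of the list above could look roughly like this. It is only a sketch: the `jboss.recovery:type=Recoverable,*` name pattern and the `RecoverableMBean` interface are assumptions made for the example, and a real "Recovery" MBean would hand out XAResources rather than just an id. Only standard `javax.management` calls are used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class RecoveryQuery {
    // Hypothetical management interface (standard MBean naming: <Class>MBean).
    public interface RecoverableMBean {
        String getId();
    }

    public static final class Recoverable implements RecoverableMBean {
        private final String id;

        public Recoverable(String id) {
            this.id = id;
        }

        @Override
        public String getId() {
            return id;
        }
    }

    // Assumed convention: every recovery MBean carries the key type=Recoverable.
    static final String PATTERN = "jboss.recovery:type=Recoverable,*";

    static void register(MBeanServer server, String name, String id) {
        try {
            ObjectName objectName = new ObjectName("jboss.recovery:type=Recoverable,name=" + name);
            server.registerMBean(new Recoverable(id), objectName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Query the MBeanServer by name pattern and read each MBean's "Id" attribute.
    static List<String> findRecoverableIds(MBeanServer server) {
        List<String> ids = new ArrayList<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(PATTERN), null)) {
                ids.add((String) server.getAttribute(name, "Id"));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Collections.sort(ids);
        return ids;
    }
}
```

The query only finds MBeans that are registered at that moment, so an MBean that failed to deploy is silently missing from the result.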


              Care must be taken that the recovery manager only tries to
              recover transactions that were from a previous instance.
              Perhaps by using either a mark in the log of when it was restarted or by
              recording the JVM id in the log record?


              The RecoveryManager MBean can do this at start(). Just rename existing log files, or use a timestamp in the logfile name. Anyways...I think it would be better to have a specific logger than a DB. That way, we have ultimate speed, and we can have/control as many log files as we want.


              We also want to fix this tight coupling of the TM and RARs to make this simpler.


              Based on what I've said above, do we really need to fix the coupling? Problem is...is it ok to do recovery while a live system is running? Seems it would be ok as long as XAResource.recover does not return Xids of existing running transactions.



              Also bear in mind that JBossMQ and other services(?) use a DataSource
              to persist data. i.e. one XAResource uses another one as a delegate.
              Although in this case, the delegate XAResource should never be taking
              part in two phase commit!?


              This will take more thought when recovery is added to JMS. Isn't the TM required to identify duplicate XAResources?

              Bill

              • 4. Re: Finding XAResources in Recovery step

                 


                Yes, I already read that section... What Subject are you supposed to pass in? It is the ConnectionRequestInfo that is supposed to be null.


                Correct, my memory wasn't working correctly.
                Like I said, you can either use the configured user/password or, if there
                isn't one (our JCA login modules have mechanisms to provide a default),
                have a separate config for the recovery user/password.

                It should be a case of refactoring the getSubject() code in
                BaseConnectionManager2. I never liked the way this worked in any case :-)


                • 5. Re: Finding XAResources in Recovery step

                   


                  I don't think this needs to be that complicated. If we force/require all ResourceAdapters and ManagedConnectionFactory's to have a specific ObjectName attribute then we can do an MBean query on the MBeanServer to find these MBeans.

                  From what you're saying, I think we need to require each XA resource to provide an MBean whose sole purpose is to hand out a reference to an XAResource interface.

                  So, here's what I had in mind:

                  1. JBoss boots up. All MBeans are created and started.
                  2. JBoss ServerImpl broadcasts an MBean startup notification. (This is already coded.)
                  3. RecoveryManager MBean receives the "JBoss Started" notification and begins recovery.
                  4. RecoveryManager queries the MBeanServer for all "Recovery" MBeans. These "Recovery" MBeans will provide references to their XAResources.
                  5. RecoveryManager performs recovery.
                  6. RecoveryManager tells all "Recovery" MBeans that it is finished with the XAResources.

                  The RecoveryManager MBean can do this at start(). Just rename existing log files, or use a timestamp in the logfile name. Anyways...I think it would be better to have a specific logger than a DB. That way, we have ultimate speed, and we can have/control as many log files as we want.


                  I like the idea of a RecoverableMBean. That is similar to what OTS provides.

                  That has a couple of problems:

                  1) One of the MBeans might have failed to start or is no longer deployed.
                  The recovery could be incomplete/inaccurate unless there is a predefined list
                  of expected resources.
                  2) Renaming log files is bad. It is not a repeatable process. What happens if it fails again
                  during recovery?
                  3) Timestamps are bad, especially if you want to move the log to a different
                  server with an uncorrelated clock.

                  The reason for using the DB is as follows:
                  a) It is usually a part of the transaction already
                  b) It is easy to implement
                  c) It fixes the final problem for a local db
                  i.e. when using the last resource gambit, there is no way to know whether
                  the db commit worked or failed if the AS fails during the DB commit invocation.

                  • 6. Re: Finding XAResources in Recovery step

                     


                    Based on what I've said above, do we really need to fix the coupling? Problem is...is it ok to do recovery while a live system is running? Seems it would be ok as long as XAResource.recover does not return Xids of existing running transactions.


                    In principle there should be no problem. The TM knows which XIDs are
                    currently active (they are in the hashmap of active transactions).

                    In practice, you will probably find Oracle and MSSQL have issues :-)
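
A sketch of that check, under stated assumptions: the `recovered` array stands for what `XAResource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN)` returned, and the TM's set of active transactions is represented as a set of string keys built from format id and global transaction id. `SimpleXid`, `InDoubtFilter` and the key format are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import javax.transaction.xa.Xid;

// Hypothetical minimal Xid implementation, only for illustration.
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        this.formatId = formatId;
        this.gtrid = gtrid.clone();
        this.bqual = bqual.clone();
    }

    @Override
    public int getFormatId() {
        return formatId;
    }

    @Override
    public byte[] getGlobalTransactionId() {
        return gtrid.clone();
    }

    @Override
    public byte[] getBranchQualifier() {
        return bqual.clone();
    }
}

final class InDoubtFilter {
    // Xid implementations need not define equals(), so build a comparable key
    // from the format id and the global transaction id.
    static String globalKey(Xid xid) {
        return xid.getFormatId() + ":" + Arrays.toString(xid.getGlobalTransactionId());
    }

    // Drop every recovered Xid whose global transaction is still active in this TM.
    static List<Xid> inDoubt(Xid[] recovered, Set<String> activeGlobalKeys) {
        List<Xid> result = new ArrayList<>();
        for (Xid xid : recovered) {
            if (!activeGlobalKeys.contains(globalKey(xid))) {
                result.add(xid);
            }
        }
        return result;
    }
}
```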

                    • 7. Re: Finding XAResources in Recovery step

                       


                      This will take more thought when recovery is added to JMS. Isn't the TM required to identify duplicate XAResources?


                      It has to do the isSameRM() check. It doesn't need to go down to XID level.
                      Except of course for Oracle where isSameRM() doesn't work correctly,
                      although this might only be on the suspend/start(resume)?

                      • 8. Re: Finding XAResources in Recovery step
                        • 9. Re: Finding XAResources in Recovery step
                          bill.burke

                           

                          "adrian@jboss.org" wrote:

                          I like the idea of a RecoverableMBean. That is similar to what OTS provides.

                          That has a couple of problems:

                          1) One of the MBeans might have failed to start or is no longer deployed.
                          The recovery could be incomplete/inaccurate unless there is a predefined list
                          of expected resources.
                          2) Renaming log files is bad. It is not a repeatable process. What happens if it fails again
                          during recovery?
                          3) Timestamps are bad, especially if you want to move the log to a different
                          server with an uncorrelated clock.



                          Ok, then it would be implemented as follows:

                          1. TM depends on RecoveryManager.
                          2. RecoveryManager records preexisting log files.
                          3. RecoveryManager creates new log files.
                          4. Everybody starts up.
                          5. RecoveryManager receives START notification. Starts recovering from the recorded list of preexisting files.


                          The reason for using the DB is as follows:
                          a) It is usually a part of the transaction already
                          b) It is easy to implement
                          c) It fixes the final problem for a local db
                          i.e. when using the last resource gambit, there is no way to know whether
                          the db commit worked or failed if the AS fails during the DB commit invocation.


                          Seems this would only work if the logger was the same DataSource as the Gambitted resource.

                          Bill

                          • 10. Re: Finding XAResources in Recovery step
                            bill.burke

                             

                            "bill.burke@jboss.com" wrote:
                            "adrian@jboss.org" wrote:

                            I like the idea of a RecoverableMBean. That is similar to what OTS provides.

                            That has a couple of problems:

                            1) One of the MBeans might have failed to start or is no longer deployed.
                            The recovery could be incomplete/inaccurate unless there is a predefined list



                            Forgot to answer this one...

                            RecoverableMBean should provide a getId. The RecoveryManager will store this at the beginning of the logfile and not allow recovery unless all RecoverableMBeans are deployed.

                            Bill
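
The getId check could be as simple as a set difference. A minimal sketch, assuming the ids stored at the beginning of the logfile and the ids of the currently deployed RecoverableMBeans are both available as sets of strings; `RecoveryGate` and its methods are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

final class RecoveryGate {
    // Ids recorded in the log header that have no deployed RecoverableMBean.
    static Set<String> missing(Set<String> expectedFromLog, Set<String> deployed) {
        Set<String> result = new TreeSet<>(expectedFromLog);
        result.removeAll(deployed);
        return result;
    }

    // Recovery is only allowed when every expected resource is deployed.
    static boolean canRecover(Set<String> expectedFromLog, Set<String> deployed) {
        return missing(expectedFromLog, deployed).isEmpty();
    }
}
```

Reporting the missing ids, rather than only refusing to recover, tells the administrator which resource has to be redeployed.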


                            • 11. Re: Finding XAResources in Recovery step

                               

                              "bill.burke@jboss.com" wrote:


                              Ok, then it would be implemented as follows:

                              1. TM depends on RecoveryManager.
                              2. RecoveryManager records preexisting log files.
                              3. RecoveryManager creates new log files.
                              4. Everybody starts up.
                              5. RecoveryManager receives START notification. Starts recovering from the recorded list of preexisting files.


                              You still have an unrepeatable operation. It is not a good idea
                              to dynamically create logs (it might fail - no disk space - at just the wrong time).
                              Logs should use preallocated space and be reused: you rewrite the log(s) from memory
                              at checkpoints.
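
To make the preallocated, reused log concrete, here is a minimal sketch. It assumes a single fixed-size file whose content is a length prefix followed by the serialized in-memory state; `PreallocatedLog` and its methods are hypothetical, and a real log would also need to guard against a torn write in the middle of a checkpoint.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PreallocatedLog {
    private final File file;
    private final int capacity;

    // Reserve the space up front by writing zeros, so later checkpoints never grow the file.
    PreallocatedLog(File file, int capacity) {
        this.file = file;
        this.capacity = capacity;
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.write(new byte[capacity]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PreallocatedLog inTempFile(int capacity) {
        try {
            File f = File.createTempFile("txlog", ".bin");
            f.deleteOnExit();
            return new PreallocatedLog(f, capacity);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Checkpoint: overwrite the log in place with the current in-memory state.
    void checkpoint(String state) {
        byte[] data = state.getBytes(StandardCharsets.UTF_8);
        if (data.length + Integer.BYTES > capacity) {
            throw new IllegalStateException("state does not fit in the preallocated log");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "rwd")) {
            raf.writeInt(data.length);
            raf.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String read() {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] data = new byte[raf.readInt()];
            raf.readFully(data);
            return new String(data, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long size() {
        return file.length();
    }
}
```

The point of the design is that running out of space shows up when the log is created at startup, not in the middle of a commit.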

                              Like I said, the TM knows its currently active transactions. They are in its
                              memory state.


                              The reason for using the DB is as follows:
                              a) It is usually a part of the transaction already
                              b) It is easy to implement
                              c) It fixes the final problem for a local db
                              i.e. when using the last resource gambit, there is no way to know whether
                              the db commit worked or failed if the AS fails during the DB commit invocation.

                              Seems this would only work if the logger was the same DataSource as the Gambitted resource.
                              Bill


                              Correct. You are only allowed one local resource and it must be the one
                              recording the transactions. In fact this is required even if it is not the TM log.
                              You have to have a mechanism to discover whether the transaction committed
                              for the unlikely occurrence that the AS fails during the localdb.commit();
                              see the JIRA link.

                              • 12. Re: Finding XAResources in Recovery step
                                bill.burke

                                 

                                "adrian@jboss.org" wrote:
                                "bill.burke@jboss.com" wrote:


                                Ok, then it would be implemented as follows:

                                1. TM depends on RecoveryManager.
                                2. RecoveryManager records preexisting log files.
                                3. RecoveryManager creates new log files.
                                4. Everybody starts up.
                                5. RecoveryManager receives START notification. Starts recovering from the recorded list of preexisting files.


                                You still have an unrepeatable operation.



                                No you don't. The files are not renamed. If recovery fails at any point, then the next server reboot will just retry those log files.


                                It is not a good idea
                                to dynamically create logs (it might fail - no disk space - at just the wrong time).


                                The disk might fail, but this does not create a scenario of inconsistent state. Prepared resources will just get rolled back during recovery.


                                Logs should use preallocated space and be reused: you rewrite the log(s) from memory
                                at checkpoints.


                                Personally, I prefer a simpler design than a rolling, preallocated log file. I think requiring enough disk space is a reasonable requirement.

                                Bill

                                • 13. Re: Finding XAResources in Recovery step

                                   


                                  The disk might fail, but this does not create a scenario of inconsistent state. Prepared resources will just get rolled back during recovery.


                                  resource1.prepare(); // vote ok
                                  resource2.prepare(); // vote ok
                                  log.record(); // ooops disk full

                                  Now you have to roll back when it could have been committed.
                                  Worse, you can't do any more work because all transactions fail from now on.
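
The three lines above, expanded into a small runnable sketch of the coordinator's side. `Participant`, `DecisionLog`, `RecordingParticipant` and `Coordinator` are hypothetical types for illustration only; real code would go through XAResource and deal with heuristic outcomes.

```java
import java.io.IOException;
import java.util.List;

interface Participant {
    boolean prepare();
    void commit();
    void rollback();
}

interface DecisionLog {
    void recordCommit(String txId) throws IOException;
}

// Test double that remembers the last thing that happened to it.
final class RecordingParticipant implements Participant {
    String state = "ACTIVE";

    @Override
    public boolean prepare() {
        state = "PREPARED";
        return true;
    }

    @Override
    public void commit() {
        state = "COMMITTED";
    }

    @Override
    public void rollback() {
        state = "ROLLED_BACK";
    }
}

final class Coordinator {
    static String complete(String txId, List<? extends Participant> participants, DecisionLog log) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                rollbackAll(participants);
                return "ROLLED_BACK_ON_VOTE";
            }
        }
        try {
            // The commit decision must be durable before anybody is told to commit.
            log.recordCommit(txId);
        } catch (IOException e) {
            // Everyone voted ok, but the decision could not be logged (e.g. disk full).
            rollbackAll(participants);
            return "ROLLED_BACK_ON_LOG_FAILURE";
        }
        for (Participant p : participants) {
            p.commit();
        }
        return "COMMITTED";
    }

    private static void rollbackAll(List<? extends Participant> participants) {
        for (Participant p : participants) {
            p.rollback();
        }
    }
}
```

With a log that keeps throwing, every later transaction takes the same rollback path, which is the "all transactions fail from now on" case.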