13 Replies Latest reply: Jan 3, 2005 3:12 PM by Adrian Brock

Finding XAResources in Recovery step

Bill Burke Master

Over the holidays, I took a look at how one would implement recovery. I think you're right: it is basically easy to implement (at least without handling heuristic errors), but hard to make perform well.

One thing that gets me though. How the hell can you get all the XAResources so that you can do the recovery?

Is this where ResourceAdapter.getXAResources() comes in? If so, how do you get the ActivationSpecs?

Another thought that came to my mind was to query the MBeanServer for all ManagedConnectionFactorys and get a ManagedConnection from each factory. The problem is that you need a Subject.

Thoughts?

  • 1. Re: Finding XAResources in Recovery step
    Adrian Brock Master

    ResourceAdapter.getXAResources is for message inflow recovery,
    see JCA1.5 Section 12.5.2
    The activation specs must be persisted for recovery purposes.

    For outbound it needs to do ManagedConnection.getXAResource().
    For the Subject problem see the special case mentioned in JCA1.5 Section 6.5.3.5
    This looks insecure to me (and probably not supported by some XAResources)
    so we may need to add configuration for a recovery user/password where
    this cannot be determined from the JCA config.

    The fundamental problem is that the transaction manager needs to get the list
    of ResourceAdapters/ManagedConnectionFactorys to perform recovery.
    But this cannot be done using a <depends-list> because the ResourceAdapter
    needs a reference to the transaction manager before it can start (chicken and egg).
    1) For the XATerminator - RARs could try to use this during start()
    2) For the tx-connection-factory/tx datasources - not strictly necessary since we
    don't need the connection manager or pool for recovery

    So what is required is that there be a separate recovery MBean that has a
    <depends-list> of the ResourceAdapters.
    This will allow the following startup ordering:
    1) Transaction manager
    2) Resource Adapters
    3) Recovery manager

    Care must be taken that the recovery manager only tries to
    recover transactions that came from a previous instance.
    Perhaps by using either a mark in the log of when it was restarted or by
    recording the JVM id in the log record?

    We also want to fix this tight coupling of the TM and RARs to make this simpler.
    We can then delay some steps so that recovery occurs before
    services become available:
    1) Transaction manager
    2) Resource Adapters
    3) Recovery manager
    4) Start WorkManager
    5) Bind ConnectionFactorys to JNDI
    6) Activate MDB activations

    Also bear in mind that JBossMQ and other services(?) use a DataSource
    to persist data. i.e. one XAResource uses another one as a delegate.
    Although in this case, the delegate XAResource should never be taking
    part in two phase commit!?

  • 2. Re: Finding XAResources in Recovery step
    Adrian Brock Master

    On heuristics, I think the best thing to do is what CORBA allows,
    i.e. we report all heuristics to a special log/notification mechanism
    with some policy on whether the transaction should be automatically rolled back or committed.

    Only the administrator can work out what has gone wrong or whether it is
    consistent with how he tried to resolve a problem.

  • 3. Re: Finding XAResources in Recovery step
    Bill Burke Master

     

    "adrian@jboss.org" wrote:
    ResourceAdapter.getXAResources is for message inflow recovery,
    see JCA1.5 Section 12.5.2
    The activation specs must be persisted for recovery purposes.


    We really don't have any component that uses inflow yet do we? I'll do this one on the second iteration of the recovery mechanism.


    For outbound it needs to do ManagedConnection.getXAResource().
    For the Subject problem see the special case mentioned in JCA1.5 Section 6.5.3.5
    This looks insecure to me (and probably not supported by some XAResources)
    so we may need to add configuration for a recovery user/password where
    this cannot be determined from the JCA config.


    Yes, I already read that section. What Subject are you supposed to pass in? It is the ConnectionRequestInfo that is supposed to be null.


    The fundamental problem is that the transaction manager needs to get the list
    of ResourceAdapters/ManagedConnectionFactorys to perform recovery.
    But this cannot be done using a <depends-list> because the ResourceAdapter
    needs a reference to the transaction manager before it can start (chicken and egg).
    1) For the XATerminator - RARs could try to use this during start()
    2) For the tx-connection-factory/tx datasources - not strictly necessary since we
    don't need the connection manager or pool for recovery

    So what is required is that there be a separate recovery MBean that has a
    <depends-list> of the ResourceAdapters.
    This will allow the following startup ordering:
    1) Transaction manager
    2) Resource Adapters
    3) Recovery manager



    I don't think this needs to be that complicated. If we require all ResourceAdapters and ManagedConnectionFactorys to have a specific ObjectName attribute, then we can do an MBean query on the MBeanServer to find these MBeans.

    From what you're saying, I think we need to require each XA resource to provide an MBean whose sole purpose is to hand out a reference to an XAResource interface.

    So, here's what I had in mind:

    1. JBoss boots up. All MBeans are created and started.
    2. JBoss ServerImpl broadcasts an MBean startup notification. (This is already coded.)
    3. RecoveryManager MBean receives the "JBoss Started" notification and begins recovery.
    4. RecoveryManager queries the MBeanServer for all "Recovery" MBeans. These "Recovery" MBeans will provide references to their XAResources.
    5. RecoveryManager performs recovery.
    6. RecoveryManager tells all "Recovery" MBeans that it is finished with the XAResources.
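The discovery part of these steps (step 4) could look roughly like the sketch below. The `jboss.recovery` domain, the `type=Recoverable` key, and the `RecoverableMBean` interface are hypothetical names picked for illustration; nothing in this thread fixes them. Only standard `javax.management` calls are used.

```java
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

// Hypothetical holder class; none of these names come from JBoss itself.
final class RecoveryDiscovery {

    // Hypothetical management interface a resource would expose for recovery.
    // Declared public because standard MBean interfaces must be public.
    public interface RecoverableMBean {
        String getId();
    }

    // Standard MBean naming: class "Recoverable" implements "RecoverableMBean".
    public static final class Recoverable implements RecoverableMBean {
        private final String id;
        public Recoverable(String id) { this.id = id; }
        @Override public String getId() { return id; }
    }

    // Assumed convention: every recovery MBean is registered under this pattern.
    static final String PATTERN = "jboss.recovery:type=Recoverable,*";

    // Returns the names of all registered recovery MBeans, sorted for stable output.
    static Set<String> findRecoverables(MBeanServer server) throws Exception {
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(new ObjectName(PATTERN), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    // Demo: two recovery MBeans plus one unrelated MBean that must not match.
    static Set<String> demo() {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            server.registerMBean(new Recoverable("ds1"),
                    new ObjectName("jboss.recovery:type=Recoverable,name=ds1"));
            server.registerMBean(new Recoverable("jms"),
                    new ObjectName("jboss.recovery:type=Recoverable,name=jms"));
            server.registerMBean(new Recoverable("other"),
                    new ObjectName("jboss.jca:service=Other"));
            return findRecoverables(server);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A query like this only finds MBeans that actually registered, which is the weakness raised later in the thread: a resource that failed to deploy is simply absent from the result.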


    Care must be taken that the recovery manager only tries to
    recover transactions that came from a previous instance.
    Perhaps by using either a mark in the log of when it was restarted or by
    recording the JVM id in the log record?


    The RecoveryManager MBean can do this at start(). Just rename existing log files, or use a timestamp in the log file name. Anyway, I think it would be better to have a dedicated logger than a DB. That way we get maximum speed, and we can have and control as many log files as we want.


    We also want to fix this tight coupling of the TM and RARs to make this simpler.


    Based on what I've said above, do we really need to fix the coupling? The question is whether it is OK to do recovery while a live system is running. It seems it would be OK as long as XAResource.recover does not return Xids of transactions that are still running.



    Also bear in mind that JBossMQ and other services(?) use a DataSource
    to persist data. i.e. one XAResource uses another one as a delegate.
    Although in this case, the delegate XAResource should never be taking
    part in two phase commit!?


    This will take more thought when recovery is added to JMS. Isn't the TM required to identify duplicate XAResources?

    Bill

  • 4. Re: Finding XAResources in Recovery step
    Adrian Brock Master

     


    Yes, I already read that section. What Subject are you supposed to pass in? It is the ConnectionRequestInfo that is supposed to be null.


    Correct, my memory wasn't working correctly.
    Like I said, you can either use the configured user/password or, if there
    isn't one (our JCA login modules have mechanisms to provide a default),
    have a separate config for the recovery user/password.

    It should be a case of refactoring the getSubject() code in
    BaseConnectionManager2. I never liked the way this worked in any case :-)


  • 5. Re: Finding XAResources in Recovery step
    Adrian Brock Master

     


    I don't think this needs to be that complicated. If we require all ResourceAdapters and ManagedConnectionFactorys to have a specific ObjectName attribute, then we can do an MBean query on the MBeanServer to find these MBeans.

    From what you're saying, I think we need to require each XA resource to provide an MBean whose sole purpose is to hand out a reference to an XAResource interface.

    So, here's what I had in mind:

    1. JBoss boots up. All MBeans are created and started.
    2. JBoss ServerImpl broadcasts an MBean startup notification. (This is already coded.)
    3. RecoveryManager MBean receives the "JBoss Started" notification and begins recovery.
    4. RecoveryManager queries the MBeanServer for all "Recovery" MBeans. These "Recovery" MBeans will provide references to their XAResources.
    5. RecoveryManager performs recovery.
    6. RecoveryManager tells all "Recovery" MBeans that it is finished with the XAResources.

    The RecoveryManager MBean can do this at start(). Just rename existing log files, or use a timestamp in the log file name. Anyway, I think it would be better to have a dedicated logger than a DB. That way we get maximum speed, and we can have and control as many log files as we want.


    I like the idea of a RecoverableMBean. That is similar to what OTS provides.

    That has a couple of problems:

    1) One of the MBeans might have failed to start or is no longer deployed.
    The recovery could be incomplete/inaccurate unless there is a predefined list
    of expected resources.
    2) Renaming log files is bad. It is not a repeatable process. What happens if it fails again
    during recovery?
    3) Timestamps are bad, especially if you want to move the log to a different
    server with an uncorrelated clock.

    The reason for using the DB is as follows:
    a) It is usually a part of the transaction already
    b) It is easy to implement
    c) It fixes the final problem for a local db
    i.e. when using the last resource gambit, there is no way to know whether
    the db commit worked or failed if the AS fails during the DB commit invocation.

  • 6. Re: Finding XAResources in Recovery step
    Adrian Brock Master

     


    Based on what I've said above, do we really need to fix the coupling? The question is whether it is OK to do recovery while a live system is running. It seems it would be OK as long as XAResource.recover does not return Xids of transactions that are still running.


    In principle there should be no problem. The TM knows which XIDs are
    currently active (they are in the hashmap of active transactions).
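A minimal sketch of that filtering step, assuming the TM can expose its active transactions as a set of keys. `InDoubtFilter`, `SimpleXid`, and the key format are hypothetical; the `Xid[]` argument stands for whatever `XAResource.recover()` returned. Matching on format id plus global transaction id (ignoring the branch qualifier) is also an assumption made here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import javax.transaction.xa.Xid;

// Hypothetical helper; not part of any JBoss API.
final class InDoubtFilter {

    // Minimal Xid implementation for the demo.
    static final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(int formatId, String gtrid, String bqual) {
            this.formatId = formatId;
            this.gtrid = gtrid.getBytes(StandardCharsets.UTF_8);
            this.bqual = bqual.getBytes(StandardCharsets.UTF_8);
        }
        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }
    }

    // Xid implementations need not define equals(), so compare by content.
    // Assumption: the TM tracks transactions by format id + global transaction id.
    static String key(Xid xid) {
        return xid.getFormatId() + ":" + Arrays.toString(xid.getGlobalTransactionId());
    }

    // Keeps only the recovered Xids that the running TM does not know as active.
    static List<Xid> inDoubt(Xid[] recovered, Set<String> activeKeys) {
        List<Xid> result = new ArrayList<>();
        for (Xid xid : recovered) {
            if (!activeKeys.contains(key(xid))) {
                result.add(xid);
            }
        }
        return result;
    }
}
```

Anything left after the filter is a candidate for recovery; anything filtered out belongs to a transaction the live TM is still driving.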

    In practice, you will probably find Oracle and MSSQL have issues :-)

  • 7. Re: Finding XAResources in Recovery step
    Adrian Brock Master

     


    This will take more thought when recovery is added to JMS. Isn't the TM required to identify duplicate XAResources?


    It has to do the isSameRM() check. It doesn't need to go down to XID level.
    Except of course for Oracle where isSameRM() doesn't work correctly,
    although this might only be on the suspend/start(resume)?
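For illustration, the isSameRM() check could be applied to a list of candidate resources as below. `SameRmDedup` and `StubResource` are hypothetical names; the stub compares a made-up `rmName` where a real driver would compare its own notion of resource manager identity. As noted above, some drivers may not answer isSameRM() reliably, so this is a sketch rather than a safe general solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical helper; not part of any JBoss API.
final class SameRmDedup {

    // Keeps one XAResource per resource manager, as judged by isSameRM().
    static List<XAResource> distinct(List<XAResource> candidates) throws XAException {
        List<XAResource> unique = new ArrayList<>();
        for (XAResource candidate : candidates) {
            boolean seen = false;
            for (XAResource kept : unique) {
                if (kept.isSameRM(candidate)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                unique.add(candidate);
            }
        }
        return unique;
    }

    // Stub resource for the demo; only isSameRM() has meaningful behaviour.
    static final class StubResource implements XAResource {
        final String rmName;
        StubResource(String rmName) { this.rmName = rmName; }

        @Override public boolean isSameRM(XAResource other) {
            return other instanceof StubResource
                    && ((StubResource) other).rmName.equals(rmName);
        }
        @Override public Xid[] recover(int flag) { return new Xid[0]; }
        @Override public int prepare(Xid xid) { return XA_OK; }
        @Override public void commit(Xid xid, boolean onePhase) { }
        @Override public void rollback(Xid xid) { }
        @Override public void start(Xid xid, int flags) { }
        @Override public void end(Xid xid, int flags) { }
        @Override public void forget(Xid xid) { }
        @Override public int getTransactionTimeout() { return 0; }
        @Override public boolean setTransactionTimeout(int seconds) { return false; }
    }

    // Demo: two handles on the same "db" resource manager plus one "jms" handle.
    static int demo() {
        try {
            List<XAResource> all = Arrays.<XAResource>asList(
                    new StubResource("db"), new StubResource("db"), new StubResource("jms"));
            return distinct(all).size();
        } catch (XAException e) {
            throw new IllegalStateException(e);
        }
    }
}
```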

  • 9. Re: Finding XAResources in Recovery step
    Bill Burke Master

     

    "adrian@jboss.org" wrote:

    I like the idea of a RecoverableMBean. That is similar to what OTS provides.

    That has a couple of problems:

    1) One of the MBeans might have failed to start or is no longer deployed.
    The recovery could be incomplete/inaccurate unless there is a predefined list
    of expected resources.
    2) Renaming log files is bad. It is not a repeatable process. What happens if it fails again
    during recovery?
    3) Timestamps are bad, especially if you want to move the log to a different
    server with an uncorrelated clock.



    Ok, then it would be implemented as follows:

    1. TM depends on RecoveryManager.
    2. RecoveryManager records preexisting log files.
    3. RecoveryManager creates new log files.
    4. Everybody starts up.
    5. RecoveryManager receives START notification and starts recovering from the recorded list of preexisting files.


    The reason for using the DB is as follows:
    a) It is usually a part of the transaction already
    b) It is easy to implement
    c) It fixes the final problem for a local db
    i.e. when using the last resource gambit, there is no way to know whether
    the db commit worked or failed if the AS fails during the DB commit invocation.


    Seems this would only work if the logger was the same DataSource as the Gambitted resource.

    Bill

  • 10. Re: Finding XAResources in Recovery step
    Bill Burke Master

     

    "bill.burke@jboss.com" wrote:
    "adrian@jboss.org" wrote:

    I like the idea of a RecoverableMBean. That is similar to what OTS provides.

    That has a couple of problems:

    1) One of the MBeans might have failed to start or is no longer deployed.
    The recovery could be incomplete/inaccurate unless there is a predefined list



    Forgot to answer this one...

    RecoverableMBean should provide a getId(). The RecoveryManager will store these ids at the beginning of the log file and not allow recovery unless all RecoverableMBeans are deployed.
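The check described here reduces to a set difference. In the sketch below, `RecoveryGate` and its method names are hypothetical; the ids are assumed to be plain strings read from the log header and collected from the deployed RecoverableMBeans.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of any JBoss API.
final class RecoveryGate {

    // Ids recorded in the log header that have no deployed RecoverableMBean.
    static Set<String> missing(Set<String> loggedIds, Set<String> deployedIds) {
        Set<String> result = new TreeSet<>(loggedIds);
        result.removeAll(deployedIds);
        return result;
    }

    // Recovery is allowed only when every logged id is currently deployed.
    static boolean canRecover(Set<String> loggedIds, Set<String> deployedIds) {
        return missing(loggedIds, deployedIds).isEmpty();
    }
}
```

Extra deployed resources that were not in the log do not block recovery in this sketch; only logged ids that are now absent do.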

    Bill


  • 11. Re: Finding XAResources in Recovery step
    Adrian Brock Master

     

    "bill.burke@jboss.com" wrote:


    Ok, then it would be implemented as follows:

    1. TM depends on RecoveryManager.
    2. RecoveryManager records preexisting log files.
    3. RecoveryManager creates new log files.
    4. Everybody starts up.
    5. RecoveryManager receives START notification and starts recovering from the recorded list of preexisting files.


    You still have an unrepeatable operation. It is not a good idea
    to dynamically create logs (it might fail - no disk space - at just the wrong time).
    Logs should have preallocated space and be reused - you rewrite the log(s) from memory
    at checkpoints.
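A rough sketch of the preallocate-and-reuse idea, using only `java.io.RandomAccessFile`. `PreallocatedLog`, the length-prefixed record layout, and the 4096-byte capacity in the demo are all assumptions for illustration. The sketch also rewrites the checkpoint in place, which is not crash-safe on its own; a real log would need checksums or alternating areas.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fixed-size log; not part of any JBoss API.
final class PreallocatedLog implements AutoCloseable {
    private final RandomAccessFile file;
    private final int capacity;

    // Reserves the space up front by writing zeros, so a full disk is
    // detected at startup rather than in the middle of a commit.
    PreallocatedLog(Path path, int capacity) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rwd");
        this.capacity = capacity;
        long missing = capacity - file.length();
        if (missing > 0) {
            file.seek(file.length());
            file.write(new byte[(int) missing]);
        }
    }

    // Rewrites the log from the in-memory state; the file never grows.
    void checkpoint(String activeTransactions) throws IOException {
        byte[] data = activeTransactions.getBytes(StandardCharsets.UTF_8);
        if (data.length + 4 > capacity) {
            throw new IOException("checkpoint exceeds preallocated log size");
        }
        file.seek(0);
        file.writeInt(data.length); // length prefix, then the payload
        file.write(data);
    }

    // Reads back the most recent checkpoint.
    String read() throws IOException {
        file.seek(0);
        byte[] data = new byte[file.readInt()];
        file.readFully(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    long size() throws IOException { return file.length(); }

    @Override public void close() throws IOException { file.close(); }

    // Demo: two checkpoints into a temporary file; the size stays at capacity.
    static String demo() {
        try {
            Path path = Files.createTempFile("txlog", ".bin");
            try (PreallocatedLog log = new PreallocatedLog(path, 4096)) {
                log.checkpoint("tx1,tx2");
                log.checkpoint("tx2");
                return log.read() + "@" + log.size();
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```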

    Like I said, the TM knows its currently active transactions. They are in its
    memory state.


    The reason for using the DB is as follows:
    a) It is usually a part of the transaction already
    b) It is easy to implement
    c) It fixes the final problem for a local db
    i.e. when using the last resource gambit, there is no way to know whether
    the db commit worked or failed if the AS fails during the DB commit invocation.

    Seems this would only work if the logger was the same DataSource as the Gambitted resource.
    Bill


    Correct. You are only allowed one local resource and it must be the one
    recording the transactions. In fact this is required even if it is not the TM log.
    You have to have a mechanism to discover whether the transaction committed
    for that unlikely occurrence where the AS fails during the localdb.commit().
    See the JIRA link.

  • 12. Re: Finding XAResources in Recovery step
    Bill Burke Master

     

    "adrian@jboss.org" wrote:
    "bill.burke@jboss.com" wrote:


    Ok, then it would be implemented as follows:

    1. TM depends on RecoveryManager.
    2. RecoveryManager records preexisting log files.
    3. RecoveryManager creates new log files.
    4. Everybody starts up.
    5. RecoveryManager receives START notification and starts recovering from the recorded list of preexisting files.


    You still have an unrepeatable operation.



    No, you don't. The files are not renamed. If recovery fails at any point, then the next server reboot will just retry those log files.


    It is not a good idea
    to dynamically create logs (it might fail - no disk space - at just the wrong time).


    The disk might fail, but this does not create a scenario of inconsistent state. Prepared resources will just get rolled back during recovery.


    Logs should have preallocated space and be reused - you rewrite the log(s) from memory
    at checkpoints.


    Personally, I prefer a simpler design than a rolling, preallocated log file. I think requiring enough disk space is a reasonable requirement.

    Bill

  • 13. Re: Finding XAResources in Recovery step
    Adrian Brock Master

     


    The disk might fail, but this does not create a scenario of inconsistent state. Prepared resources will just get rolled back during recovery.


    resource1.prepare(); // vote ok
    resource2.prepare(); // vote ok
    log.record(); // ooops disk full

    Now you have to roll back when it could have been committed.
    Worse, you can't do any more work because all transactions fail from now on.
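The scenario above can be written out as a small simulation. `CommitDecision`, `DecisionLog`, and `Outcome` are hypothetical names, and the votes are plain booleans standing in for `prepare()` results; the point is only the ordering: the commit decision has to reach the log before phase two, so a failed log write forces a rollback even though every resource voted OK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical simulation of the commit decision; not a JBoss class.
final class CommitDecision {

    enum Outcome { COMMITTED, ROLLED_BACK }

    // Stand-in for the TM's log; append() throws when the disk is full.
    interface DecisionLog {
        void append(String record) throws IOException;
    }

    // votes: one supplier per resource, true meaning "prepared OK".
    static Outcome complete(String txId, List<BooleanSupplier> votes, DecisionLog log) {
        for (BooleanSupplier vote : votes) {
            if (!vote.getAsBoolean()) {
                return Outcome.ROLLED_BACK; // ordinary rollback: a resource voted no
            }
        }
        try {
            // The decision must be durable before any resource is told to commit.
            log.append("COMMIT " + txId);
        } catch (IOException e) {
            // Every resource voted OK, yet the transaction cannot be committed.
            return Outcome.ROLLED_BACK;
        }
        return Outcome.COMMITTED;
    }

    // Demo: two resources that both vote OK, with or without a full disk.
    static Outcome demo(boolean diskFull) {
        List<BooleanSupplier> votes = new ArrayList<>();
        votes.add(() -> true);
        votes.add(() -> true);
        DecisionLog log = record -> {
            if (diskFull) {
                throw new IOException("disk full");
            }
        };
        return complete("tx-1", votes, log);
    }
}
```

With `diskFull` set, every later transaction hits the same branch, which is the "all transactions fail from now on" case.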