6 Replies Latest reply: Mar 26, 2012 6:29 AM by Adam Ringel

Problem with XARecoveryModule.

Sławomir Wojtasiak Newbie

After losing all connections to my database I found a problem with the JTA Recovery Manager. The following stack trace shows that the connection used by the recovery manager has already been closed, so recovery (for PostgreSQL) has not worked since then.

 

 

15:25:00,641 WARN  ARJUNA-16027 Local XARecoveryModule.xaRecovery got XA exception XAException.XAER_RMERR: org.postgresql.xa.PGXAException: Error during recover

        at org.postgresql.xa.PGXAConnection.recover(PGXAConnection.java:358)

        at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.recover(XAManagedConnection.java:294)

        at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecovery(XARecoveryModule.java:468)

        at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.resourceInitiatedRecoveryForRecoveryHelpers(XARecoveryModule.java:436)

        at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:155)

        at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789)

        at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371)

Caused by: org.postgresql.util.PSQLException: This connection has been closed.

        at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:714)

        at org.postgresql.jdbc3.AbstractJdbc3Connection.createStatement(AbstractJdbc3Connection.java:230)

        at org.postgresql.jdbc2.AbstractJdbc2Connection.createStatement(AbstractJdbc2Connection.java:191)

        at org.postgresql.xa.PGXAConnection.recover(PGXAConnection.java:331)

        ... 6 more

 

 

 

I did some investigation and found that this connection is cached and reused for recovery until the server (or maybe only the application) is restarted or redeployed.

 

XARecoveryModule uses xaResourceRecoveryHelper to obtain the XAResource on which it then calls "recover". The problem is that the returned XAManagedConnection is a cached one and the connection is never sanity-checked. ManagedConnectionFactoryDeployment is responsible for caching it and does nothing to ensure the connection is still valid:

 

ManagedConnectionFactoryDeployment

/**
 * Open a managed connection
 * @param s The subject
 * @return The managed connection
 * @exception ResourceException Thrown in case of an error
 */
private ManagedConnection open(Subject s) throws ResourceException
{
  if (recoverMC == null)
  {
     recoverMC = createManagedConnection(s, null);
  }

  return recoverMC;
}

 

 

Such a cached XAManagedConnection is returned to XARecoveryModule.

 

To be fair, there is a check in ManagedConnectionFactoryDeployment that was probably meant to reconnect invalid connections, but it can never trigger in this situation because XAManagedConnection does not perform any sanity check in getXAResource() and therefore never throws ResourceException.

 

ManagedConnectionFactoryDeployment

try
{
  xaResource = mc.getXAResource();
}
catch (ResourceException reconnect)
{
  close(mc);
  mc = open(subject);
  xaResource = mc.getXAResource();
}

 

 

XAManagedConnection

public XAResource getXAResource() throws ResourceException
{
  return this;
}

 

 

So XARecoveryModule gets an invalid XAResource instance and tries to run recovery by invoking recover(XAResource.TMSTARTRSCAN).

 

XARecoveryModule

try
{
    trans = xares.recover(XAResource.TMSTARTRSCAN);

    if (jtaLogger.logger.isDebugEnabled()) {
        jtaLogger.logger.debug("Found "
                + ((trans != null) ? trans.length : 0)
                + " xids in doubt");
    }
}
catch (XAException e)
{
    jtaLogger.i18NLogger.warn_recovery_xarecovery1(_logName+".xaRecovery", XAHelper.printXAErrorCode(e), e);

    try
    {
        xares.recover(XAResource.TMENDRSCAN);
    }
    catch (Exception e1)
    {
    }

    return false;
}

 

 

This invocation ends up inside PGXAConnection, which does perform a sanity check and throws the appropriate exception. That exception is caught by the "catch" block above, and the scan for prepared transaction branches simply ends.

 

As you can see, the invalid XAResource is not invalidated in any way, so the next periodic recovery pass will use it again and fail in the same way.
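
The failure mode described above can be modelled in a few lines. This is only a sketch with hypothetical stand-in classes (FakeManagedConnection, FakeDeployment); none of them exist in JBoss or in the transaction manager. It assumes the behaviour described in this post: the deployment caches one connection, getXAResource() performs no check, and the recovery loop swallows the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these classes exist in JBoss or the JTA implementation.
class StaleRecoveryConnectionDemo {

    /** Stand-in for a managed connection whose physical link can be closed. */
    static class FakeManagedConnection {
        private boolean closed;

        void closePhysicalLink() {
            closed = true;
        }

        /** Mirrors the getXAResource() shown above: no validity check at all. */
        FakeManagedConnection getXAResource() {
            return this;
        }

        /** Stand-in for recover(): fails once the link is closed. */
        int recover() {
            if (closed) {
                throw new IllegalStateException("This connection has been closed.");
            }
            return 0; // no in-doubt transactions in this model
        }
    }

    /** Stand-in for the deployment that lazily creates and caches the connection. */
    static class FakeDeployment {
        private FakeManagedConnection recoverMC;
        int created;

        FakeManagedConnection open() {
            if (recoverMC == null) {
                recoverMC = new FakeManagedConnection();
                created++;
            }
            return recoverMC;
        }
    }

    /** Runs the given number of recovery passes and records whether each one succeeded. */
    static List<Boolean> runPasses(FakeDeployment deployment, int passes) {
        List<Boolean> results = new ArrayList<>();
        for (int i = 0; i < passes; i++) {
            try {
                deployment.open().getXAResource().recover();
                results.add(true);
            } catch (IllegalStateException e) {
                results.add(false); // logged and swallowed, like the catch block above
            }
        }
        return results;
    }

    public static void main(String[] args) {
        FakeDeployment deployment = new FakeDeployment();
        System.out.println(runPasses(deployment, 1));   // prints "[true]"
        deployment.open().closePhysicalLink();          // simulate the database outage
        System.out.println(runPasses(deployment, 3));   // prints "[false, false, false]"
        System.out.println(deployment.created);         // prints "1"
    }
}
```

In this model only one connection is ever created, so every pass after the outage fails, which matches the repeated warning in the log.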

 

Has anyone resolved this? It looks like a bug, but maybe there is a way to work around it through configuration alone.

 

I'm using JBoss AS 6.0.0.

 

Thanks in advance,

Slawek

  • 1. Re: Problem with XARecoveryModule.
    Tom Jenkinson Master

    Slawek,

     

    My apologies for not responding to this earlier! It appears that the issue you have reported is related to the JCA code rather than the JTA. Thank you for making your report so clear!

     

    I would suggest that you raise this issue on the ironjacamar forums here: https://community.jboss.org/community/ironjacamar?view=discussions

     

    Thank you once again for raising such a clear report, it is greatly appreciated!

     

    Tom

  • 2. Re: Problem with XARecoveryModule.
    Tom Jenkinson Master

    Wow, I think I managed to move it for you. Hi IronJacamar guys, Slawek thinks he has spotted an issue with the managed connection.

  • 3. Re: Problem with XARecoveryModule.
    Jesper Pedersen Master

    AS 6.x isn't maintained anymore - use JBoss EAP 5.1 or AS 7.1 where the recovery logic is different.

  • 4. Re: Problem with XARecoveryModule.
    Sławomir Wojtasiak Newbie

    Thanks for your answers. I was already planning to move to the current version, but I also wanted to point out a potential problem by writing this post.

  • 5. Re: Problem with XARecoveryModule.
    Adam Ringel Newbie

    JBoss 7 does not appear ready to us for our production code, and we also don't have the resources for such a large migration just to fix this one problem.

    We analyzed the code further and noticed that not only is the getXAResource not doing any sanity checking, but the cached instance variable:

    private ManagedConnection recoverMC = null;
    

     

     

    is never reinitialized. You can see that the open code used during recovery will never recreate it:

     

       private ManagedConnection open(Subject s) throws ResourceException
       {
          if (recoverMC == null)
          {
             recoverMC = createManagedConnection(s, null);
          }

          return recoverMC;
       }
    

     

     

    You can't even reinitialize it using the stopService lifecycle method from JMX. The close method nulls the reference that was passed in, which is only a local copy, so the instance variable itself is never nulled!

     

       protected void stopService()
       {
          if (recoveryRegistered)
          {
             if (getXAResourceRecoveryRegistry() != null)
             {
                close(recoverMC);
    
    
                getXAResourceRecoveryRegistry().removeXAResourceRecovery(this);
                recoveryRegistered = false;
    
    
                if (log.isDebugEnabled())
                   log.debug("Unregistered for XA Resource Recovery: " + dmd.getJndiName());
             }
          }
    
    
          mcf = null;
          mcfClass = null;
       }
    
       private void close(ManagedConnection mc)
       {
          if (mc != null)
          {
             try
             {
                mc.cleanup();
             }
             catch (ResourceException ire)
             {
                if (log.isDebugEnabled())
                   log.debug("Error during recovery cleanup", ire);
             }
          }
    
    
          if (mc != null)
          {
             try
             {
                mc.destroy();
             }
             catch (ResourceException ire)
             {
                if (log.isDebugEnabled())
                   log.debug("Error during recovery destroy", ire);
             }
          }
    
    
          mc = null;
       }
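
    The reason the final "mc = null" has no effect on the field is that Java passes object references by value. A minimal sketch of just that behaviour, using hypothetical names (ParameterNullingDemo, cached) rather than the real JBoss classes:

```java
// Hypothetical names; a minimal model of the close(recoverMC) call above.
class ParameterNullingDemo {
    private Object cached = new Object();

    /** Nulls only the local parameter, as the close(mc) method above does. */
    private void close(Object mc) {
        mc = null; // affects the local copy of the reference only
    }

    /** Calls close(cached) and reports whether the field is still set afterwards. */
    boolean fieldSurvivesClose() {
        close(cached);
        return cached != null;
    }

    /** Same call, but the caller also clears the field itself. */
    boolean fieldSurvivesCloseAndReset() {
        close(cached);
        cached = null;
        return cached != null;
    }

    public static void main(String[] args) {
        System.out.println(new ParameterNullingDemo().fieldSurvivesClose());         // prints "true"
        System.out.println(new ParameterNullingDemo().fieldSurvivesCloseAndReset()); // prints "false"
    }
}
```

    So the field has to be cleared by the caller, or inside close() by assigning to the field directly rather than to the parameter.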
    

     

     

    After a DB crash that cached ManagedConnection is no longer usable, but it never gets reinitialized. We would like to add a check to the open method that verifies the cached MC is healthy and reinitializes it if it isn't.

    Any ideas on how to check the health at that stage?

  • 6. Re: Problem with XARecoveryModule.
    Adam Ringel Newbie

    If anyone is still interested, we modified ManagedConnectionFactoryDeployment like so and the issue was resolved:

     

       private synchronized ManagedConnection open(Subject s) throws ResourceException {
          // First use: create and cache the recovery connection
          if (recoverMC == null) {
             recoverMC = createManagedConnection(s, null);
             return recoverMC;
          }

          // Validate the cached connection where the implementation supports it
          boolean valid = true;
          if (recoverMC instanceof BaseWrapperManagedConnection) {
             valid = ((BaseWrapperManagedConnection) recoverMC).checkValid();
          }

          // Replace a dead connection so the next recovery pass can succeed
          if (!valid) {
             log.info("open - recoverMC is not valid, reopening for deployment: " + toString());
             close(recoverMC);
             recoverMC = createManagedConnection(s, null);
          }

          return recoverMC;
       }
    

     

     

    We also nulled out recoverMC in the stopService method:

     

       protected void stopService()
       {
          if (recoveryRegistered)
          {
             if (getXAResourceRecoveryRegistry() != null)
             {
                close(recoverMC);
    
    
                getXAResourceRecoveryRegistry().removeXAResourceRecovery(this);
                recoveryRegistered = false;
    
    
                if (log.isDebugEnabled())
                   log.debug("Unregistered for XA Resource Recovery: " + dmd.getJndiName());
             }
          }
    
    
          recoverMC = null;
          mcf = null;
          mcfClass = null;
       }