2 replies. Latest reply on Dec 3, 2008 2:01 PM by mazz

    failed tx never expires

    mazz

      I think there is a problem with tx recovery expiration.

      I have a JBossAS 4.2.1 server that has been running for about a week. Early on I had a database failure, but the database is fine now and has been for days. This server does NOT have XA recovery fully configured (I know, I know; don't get me started on that topic :-), so it got the infamous "Could not find new XAResource to use for recovering non-serializable XAResource" error. But! It has been getting this error for days; it never expires.

      I looked in the logs: this error message appears about every 2 minutes, for days and days. Below I copied the first log message and the last one I got (as of when I started writing this forum post). Look at the timestamps, and notice that the tx UID is the same in both:

      The first one:

      2008-11-25 13:39:26,686 WARN [com.arjuna.ats.jta.logging.loggerI18N] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa]
      Could not find new XAResource to use for recovering non-serializable
      XAResource < 131075, 29, 27, 1-a1058dc:d7e4:492c2eb9:18a68a1058dc:d7e4:492c2eb9:18cce^@...>


      The latest one (and it's still repeating as I type):
      2008-12-01 23:36:17,100 WARN [com.arjuna.ats.jta.logging.loggerI18N] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa]
      [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa]
      Could not find new XAResource to use for recovering non-serializable
      XAResource < 131075, 29, 27, 1-a1058dc:d7e4:492c2eb9:18a68a1058dc:d7e4:492c2eb9:18cce^@...>


      As I said, this message appears tons of times; these are just the first and the latest occurrences (and it's still going). That's 6 days straight and counting.

      Now, I thought JBossTM would expire tx's that cannot be recovered after 12 hours, as controlled by this configuration (taken directly from my jbossjta-properties.xml on this server):

      <!--
        Interval, in hours, between running the expiry scanners.
        This can be quite long. The absolute value determines the interval -
        if the value is negative, the scan will NOT be run until after one
        interval has elapsed. If positive the first scan will be immediately
        after startup. Zero will prevent any scanning.
        Default = 12 = run immediately, then every 12 hours.
      -->
      <property name="com.arjuna.ats.arjuna.recovery.expiryScanInterval" value="12"/>
      <!--
        Age, in hours, for removal of transaction status manager item.
        This should be longer than any ts-using process will remain running.
        Zero = Never removed. Default is 12.
      -->
      <property name="com.arjuna.ats.arjuna.recovery.transactionStatusManagerExpiryTime" value="12"/>
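The two comments in that config define the values' semantics precisely enough to encode. Below is a minimal sketch of a hypothetical helper (`ExpiryScanSchedule` is an illustrative name, not a JBossTM class) that only restates those comments: the sign of expiryScanInterval decides whether the first scan runs at startup or after one interval, zero disables scanning, and a transactionStatusManagerExpiryTime of zero means items are never removed. It does not model how JBossTM actually applies these settings, and treating an item exactly at the age limit as expired is an assumption.

```java
/**
 * Hypothetical helper, NOT part of JBossTM: it only restates the semantics
 * documented in the jbossjta-properties.xml comments.
 */
public final class ExpiryScanSchedule {

    private ExpiryScanSchedule() {}

    /** Describes when expiry scans run for a given expiryScanInterval (hours). */
    public static String describe(long expiryScanInterval) {
        if (expiryScanInterval == 0) {
            return "never scan";                      // zero prevents any scanning
        }
        long period = Math.abs(expiryScanInterval);   // absolute value is the interval
        long firstDelay = expiryScanInterval > 0
                ? 0                                   // positive: scan right after startup
                : period;                             // negative: wait one interval first
        return "first scan after " + firstDelay + " h, then every " + period + " h";
    }

    /**
     * Whether a transaction status manager item of the given age (hours) is due
     * for removal. Zero means "never removed". Treating age == limit as expired
     * is an assumption; the config comment does not say.
     */
    public static boolean isDueForRemoval(double ageHours, long expiryTimeHours) {
        return expiryTimeHours != 0 && ageHours >= expiryTimeHours;
    }

    public static void main(String[] args) {
        System.out.println(describe(12));   // the default used in this server's config
        System.out.println(describe(-12));
        System.out.println(describe(0));
    }
}
```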


      My question is: why doesn't this expire? As it stands, I don't think this server will ever come out of its funk. Even a restart won't help, because I assume the tx-object-store has this tx persisted, so the server will just start back up trying (and failing) to recover it. I would have to kill the server and manually delete the tx-object-store directory.

      Of course, I'm expecting the answer to be, "well, configure XA recovery properly and you won't get this error". Ignore that for now :) This still shouldn't leave my server in a funk forever. After a while the system should realize it's never going to recover this tx and expire it after SOME amount of time (and that time should be something less than 6 days :)
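To make that expectation concrete, here is a minimal sketch of a hypothetical give-up policy in Java. `RecoveryGiveUpPolicy` and its methods are illustrative names, not JBossTM API, and the sketch is not meant to describe any actual JBossTM fix: it remembers when recovery of a given tx UID first failed and reports that the caller should give up once failures have persisted longer than a configured maximum age. The caller passes in the current time so the policy is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical give-up policy (not JBossTM code): remember when recovery of a
 * transaction first failed, and stop retrying once it has kept failing for
 * longer than maxAge.
 */
public final class RecoveryGiveUpPolicy {
    private final Duration maxAge;
    private final Map<String, Instant> firstFailure = new HashMap<>();

    public RecoveryGiveUpPolicy(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /** Records a failed attempt; returns true if the caller should give up on this tx. */
    public synchronized boolean recordFailure(String txUid, Instant now) {
        Instant first = firstFailure.putIfAbsent(txUid, now); // keep the earliest failure
        if (first == null) {
            first = now;                                       // this is the first failure
        }
        return Duration.between(first, now).compareTo(maxAge) > 0;
    }

    /** Forgets a tx once it has been recovered or deliberately abandoned. */
    public synchronized void clear(String txUid) {
        firstFailure.remove(txUid);
    }
}
```

With a 12-hour limit, feeding it the first and latest log timestamps quoted earlier (read as UTC, purely for illustration) reports a give-up on the latest one, since they are more than 6 days apart.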


        • 1. Re: failed tx never expires
          mazz

          This fails even if I configure XA recovery properly. I have a Postgres DB and I've set up XA recovery, but for some reason recovery fails (I don't know why, but the point is that for SOME reason recovery cannot complete due to a Postgres exception). The database is up and working fine right now.

          In the logs I see these two messages. Note the timestamps: the first is from the first recovery attempt and the second is from the latest attempt, over 2 hours later:

          12:23:39,046 WARN [com.arjuna.ats.jta.logging.loggerI18N] [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery got XA exception org.postgresql.xa.PGXAException: Error during recover, XAException.XAER_RMERR

          14:31:29,109 WARN [loggerI18N] [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery got XA exception org.postgresql.xa.PGXAException: Error during recover, XAException.XAER_RMERR


          Now, I have set my expiration settings to 1 hour, so it should have stopped trying to recover at some point, but it never did:

          <property name="com.arjuna.ats.arjuna.recovery.expiryScanInterval" value="1"/>
          <property name="com.arjuna.ats.arjuna.recovery.transactionStatusManagerExpiryTime" value="1"/>



          • 2. Re: failed tx never expires
            mazz

            I found two reasons for this problem (thanks to Jonathan of the JBossTM team for helping me figure this out).

            First, I just confirmed that if I modify my AppServerJDBCXARecovery recovery object to check the validity of its connection first (and get a new connection if it is not valid), the error stops repeating. The problem was not in recovery per se; the recovery object was failing to provide a valid connection with which to even attempt recovery.
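The connection check described there boils down to a "validate before use, reconnect if stale" wrapper. The Java sketch below is hypothetical: `RevalidatingHolder` is an illustrative name, and this is neither the JBTM-441 patch nor the AppServerJDBCXARecovery API. The caller supplies a factory that opens a fresh connection, a validity test, and a hook that releases a stale connection. For a JDBC connection, `java.sql.Connection.isValid(int)` (JDBC 4, Java 6+) is one candidate check; it throws `SQLException`, so a real predicate would have to catch that and treat it as "not valid".

```java
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not JBossTM code): cache a connection-like object, but
 * check it before every use and replace it if it has gone stale, e.g. after
 * the database was restarted.
 */
public final class RevalidatingHolder<C> {
    private final Supplier<C> factory;   // opens a fresh connection
    private final Predicate<C> isValid;  // true if the cached connection is still usable
    private final Consumer<C> discard;   // releases a stale connection (e.g. closes it)
    private C cached;

    public RevalidatingHolder(Supplier<C> factory, Predicate<C> isValid, Consumer<C> discard) {
        this.factory = factory;
        this.isValid = isValid;
        this.discard = discard;
    }

    /** Returns a connection that either passed the validity check or was just opened. */
    public synchronized C get() {
        if (cached != null && isValid.test(cached)) {
            return cached;               // still usable: reuse it
        }
        if (cached != null) {
            discard.accept(cached);      // stale: release it before replacing
        }
        cached = factory.get();          // open a new connection for this recovery pass
        return cached;
    }
}
```

Each recovery pass would then call `get()` instead of reusing a connection cached at startup; the three constructor arguments are helpers you would write for your own datasource.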

            See: https://jira.jboss.org/jira/browse/JBTM-441

            Second, even when the recovery object works, a tx whose recovery never succeeds (for whatever reason) won't expire, due to https://jira.jboss.org/jira/browse/JBTM-418. That is fixed in JBossTM trunk, but not in my JBossAS 4.2.1.