80 Replies. Latest reply on Jan 27, 2010 4:51 PM by marklittle
      • 15. Re: Jboss transaction recovery issue
        jhalliday

        You need to keep the other TS .jars in sync too or you'll hit compatibility issues. At minimum you'll need the -common and -integration jars upgraded to match.

         

        "Also note that, whilst some JBossTS components are packaged individually, it is not supported to mix and match versions."

        (http://www.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/5.0.0/html/Administration_And_Configuration_Guide/ch07s12.html)

        • 16. Re: Jboss transaction recovery issue
          scarceller

          I'm willing to retest the scenario

           

          I found all the released TS versions here: http://www.jboss.org/jbosstm/downloads/

           

          This has failed in 2 environments: JBossAS_V5.1 and JBossEAP_V5.0.

           

          What version of TS should I upgrade to and try?

          Will the JBossTS 4.7.0.GA be a good choice?

           

          I'd prefer to get the new TS with the JBTM-602 fix from a stable release.

           

          What do you recommend?

           

          Thanks.


          • 17. Re: Jboss transaction recovery issue
            scarceller

            I downloaded the following from http://repository.jboss.com/maven2-brew/jboss/jbossts/

             

            V4.6.1.GA_CP03 files I downloaded:

            jbossjts.jar

            jbossjts-integration.jar

            jbossjts-jacorb.jar

            jbossjts-common.jar

             

            I then backed up the original ones and put these new V4.6.1.GA_CP03 jar files in place of the 4 original ones.

             

            I deleted the tran logs, prepared the DBs and re-ran the testcase. Same issue: the TM can't do the commit on the 2nd DB during recovery. Something else is wrong here. The V4.6.1.GA_CP03 jar files have not helped this test case.

             

            ---------------------------------

            Let me give you a few more details about another test that does work:

            - d1.prepare

            - d2.prepare

            - Disconnect the d2 database right at this point. If you do this then neither d1 nor d2 has committed; the 2 DBs have only voted in phase 1.

            - Stop the AS, just to keep things simple and be sure we have no in-flight trans during recovery.

            - Re-connect the network to d2.

            - Check the DBs: they match just fine, but I have an in-doubt sitting in d2 that needs to be rolled back. Rollback would be correct for this scenario since neither DB was committed when the failure occurred. Furthermore, d1 was rolled back by the current tran since it could still be communicated with.

            - Re-start the AS.

            - The TM sees the in-doubt in d2 and correctly does a rollback.

             

            The testcase that does not work is the one where d1 committed but d2 went down right before the d2.commit. As already explained, in this case the only correct outcome during recovery would be a commit, and this simply never happens. Also, the TM log has this tran identified and the TM keeps trying to resolve the issue over and over but never does. The entry remains in the log and the in-doubt remains in the DB.

            • 18. Re: Jboss transaction recovery issue
              scarceller

              I figured I'd take a look at the source code of com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.java

               

              Here I looked at the method private final boolean xaRecovery(XAResource xares)

              My log file keeps printing "Will recover this Xid (b)" so I found where this is logged in the code.

              It gets logged in the xaRecovery() method here:

               

              -----------------------------------------------

              if (jtaLogger.loggerI18N.isDebugEnabled())
              {
                  jtaLogger.logger.debug(
                          DebugLevel.FUNCTIONS,
                          VisibilityLevel.VIS_PUBLIC,
                          FacilityCode.FAC_CRASH_RECOVERY,
                          "Will recover this Xid (b)");
              }

              doRecovery = true;

              -----------------------------------------------

               

              This seems OK; it simply marks this Xid as needing recovery by setting doRecovery = true.

               

              Then a little further down in the code I find this section:

               

              -------------------------------------------------

              try
              {
                  if (doRecovery)
                  {
                      if (jtaLogger.loggerI18N.isInfoEnabled())
                      {
                          jtaLogger.loggerI18N.info(
                                  "com.arjuna.ats.internal.jta.recovery.info.rollingback",
                                  new Object[] { XAHelper.xidToString((Xid) xids[j]) });
                      }

                      if (!transactionLog((Xid) xids[j]))
                          xares.rollback((Xid) xids[j]);
                      else
                      {
                          /*
                           * Ignore it as the transaction system
                           * will recovery it eventually.
                           */
                      }

              ------------------------------------------

               

              So it marks this Xid as needing recovery, but in the recovery code section I see only 2 possible outcomes:

              1 - xares.rollback((Xid) xids[j]);

              2 - /*
                   * Ignore it as the transaction system
                   * will recovery it eventually.
                   */

               

              I suspect my test case is taking the do-nothing branch of the else. But either case would be wrong: it's wrong to roll back and also wrong to do nothing.

               

              What I'm really confused about is where in the XARecoveryModule there is any case for committing an in-doubt.

               

              The fact that I have the "Will recover this Xid (b)" in the log file lets me know where I am in the code. I can tell by looking at this location that a commit of the in-doubt will never occur with the code as is.

               

              Thought this might help.

              • 19. Re: Jboss transaction recovery issue
                marklittle
                If you read the Failure Recovery Guide it explains this in a lot of detail, but essentially the "Ignore it as the transaction system will recovery it eventually." relies on the resource initiated recovery: every resource that has gotten past prepare will have its own log entry and that will be recovered by one of the other recovery modules, which will call commit on it. That's in the code too ;-)
                • 20. Re: Jboss transaction recovery issue
                  jhalliday

                  That code should not be considered in isolation - it's only one part of the recovery mechanism. It's not the XARecoveryModule's job to do commit calls:

                   

                    // Ignore it as the transaction system will recovery it eventually.


                  In this instance it is AtomicActionRecoveryModule.doRecoverTransaction which will call replayPhase2 in cases where a commit rather than a rollback is needed.
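A toy model may make the division of labour clearer. The sketch below is not the JBossTS code: `RecoverySketch`, `Action` and both method names are invented for illustration, and the object store and the resource manager are reduced to in-memory collections. It only mirrors the two decisions described above: the XA scan rolls back an in-doubt Xid only when no transaction log entry refers to it, and a separate log-driven pass is what drives commit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only; none of these names exist in JBossTS.
final class RecoverySketch {
    enum Action { ROLLBACK, COMMIT }

    // Models the XA scan: roll back an in-doubt Xid only if no log record
    // mentions it; otherwise leave it alone for the log-driven pass.
    static Map<String, Action> xaScan(List<String> inDoubtXids, Set<String> loggedXids) {
        Map<String, Action> actions = new HashMap<>();
        for (String xid : inDoubtXids) {
            if (!loggedXids.contains(xid)) {
                actions.put(xid, Action.ROLLBACK);
            }
        }
        return actions;
    }

    // Models the log-driven pass: in this model a log record means the
    // commit decision was taken, so the outcome to replay is commit.
    static Map<String, Action> logScan(List<String> inDoubtXids, Set<String> loggedXids) {
        Map<String, Action> actions = new HashMap<>();
        for (String xid : inDoubtXids) {
            if (loggedXids.contains(xid)) {
                actions.put(xid, Action.COMMIT);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<String> inDoubt = Arrays.asList("xid-orphan", "xid-logged");
        Set<String> logged = new HashSet<>(Collections.singletonList("xid-logged"));
        System.out.println("XA scan:  " + xaScan(inDoubt, logged));
        System.out.println("Log scan: " + logScan(inDoubt, logged));
    }
}
```

In this model an Xid is handled by exactly one of the two passes, which is why the XA scan doing nothing for a logged Xid is not by itself a fault.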

                  • 21. Re: Jboss transaction recovery issue
                    marklittle
                    BTW, clean out your ObjectStore before running the test and let me know what's in there after your test fails. ls -lR should be sufficient for now.
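For anyone without a Unix shell on the AS host (the paths quoted in this thread are Windows-style), a rough Java stand-in for `ls -lR` is sketched below. `ObjectStoreLister` and its `list` method are made up for illustration and are not part of JBossTS; the directory to pass in is whatever your tx-object-store location is.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of JBossTS: lists every regular file under a directory.
final class ObjectStoreLister {

    // Returns one line per regular file: relative path, size in bytes, last-modified time.
    static List<String> list(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path p : files) {
            lines.add(root.relativize(p) + "\t" + Files.size(p) + "\t" + Files.getLastModifiedTime(p));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The default is only a placeholder; pass the real object store directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "tx-object-store");
        list(root).forEach(System.out::println);
    }
}
```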
                    • 22. Re: Jboss transaction recovery issue
                      scarceller

                      I'll keep an eye out to see whether AtomicActionRecoveryModule.doRecoverTransaction is being invoked.

                       

                      This test case is after a server cold re-start, so I know for sure the AS has no active transactions.

                       

                      Thanks

                      • 23. Re: Jboss transaction recovery issue
                        scarceller

                        Mark,

                         

                        Just to be sure, you want me to clean the ObjectStore? This test case is after an AS cold re-start, so I can't have any active transactions in this case.

                         

                        I don't mind trying a test if you have something in mind but please be more specific.

                         

                        I'm simply black box testing the JBoss AS and its TM under high load (50tps and 100% CPU, 25 clients driving load) and many of the test cases are failing because it can't recover in-doubts that MUST be committed in the 2nd database (they have already been committed in the 1st database). Every variation of pulling network cables or powering off DBs fails this testing for the same reason. Some transactions are left in-doubt and must be committed (this never works). Others are left in-doubt and must be rolled back (this does work).

                         

                        I just want to be sure I'm not doing something wrong.

                         

                        I usually don't dig this deep into the failures since I black box test. But since I have the source code for JBoss I don't mind doing some digging. Most importantly, I simply wish to be sure my configuration is correct. I'm 100% certain that the recovery connections are being built; this is evident from the TRACEs, where in 2nd pass recovery I clearly see that Oracle has 1 in-doubt in need of attention, but it never gets resolved and the TM just spins in a loop finding this same in-doubt but never commits it.

                         

                        I have the basic defaults for the default server and I simply added the -ds.xml data sources and the lines needed in the jbossts-properties.xml for setting up AppServerJDBCXARecovery.

                         

                        BTW - I have your book "Java Transaction Processing"; very helpful.

                         

                        Thanks.

                        • 24. Re: Jboss transaction recovery issue
                          scarceller

                          Mark,

                           

                          I'm attaching the section of the log file that simply keeps repeating, I have TRACE turned fully on for arjuna.

                           

                          I do see plenty of lines with AtomicAction in the log saying it sees a transaction but I have no idea what they all mean.

                          • 25. Re: Jboss transaction recovery issue
                            adinn

                            scarceller wrote:

                             

                            Mark,

                             

                            Just to be sure, you want me to clean the ObjectStore? This test case is after an AS cold re-start, so I can't have any active transactions in this case.

                             

                            That sort of depends what you mean by "active". There may be log records in the object store for in-doubt transactions which were in the middle of committing when the AS previously exited. The coordinator writes a log record when all participants have prepared and only deletes it when they have all committed -- that's the "D" part in ACID.

                             

                            When the AS restarts, the recovery system checks for these records and uses them to reload transaction state and recreate transactions, which it then tries to roll forward (this is the only appropriate response since the record is only present if all participants prepared ok).

                             

                            So, a complete cold start requires deleting all records found in the object store. You can do this by deleting the object store directory.
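The log lifecycle described in this reply can be illustrated with a small in-memory model. This is a sketch under assumptions, not how the JBossTS coordinator is implemented: `CoordinatorSketch`, its fields and the `crashBeforeCommit` parameter are all hypothetical, and participants are reduced to names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; not a JBossTS class.
final class CoordinatorSketch {
    // Stand-in for the object store: transaction id -> participants still awaiting commit.
    final Map<String, List<String>> log = new HashMap<>();
    // Participants whose commit has completed, in order.
    final List<String> committed = new ArrayList<>();

    // Runs phase two for a transaction whose participants have all prepared.
    // crashBeforeCommit is the index of the commit at which the process "dies";
    // pass participants.size() or more for a run without a crash.
    void commit(String txId, List<String> participants, int crashBeforeCommit) {
        log.put(txId, new ArrayList<>(participants));   // record written once all have prepared
        int done = 0;
        for (String p : participants) {
            if (done == crashBeforeCommit) {
                return;                                  // simulated crash: record stays in the log
            }
            committed.add(p);
            log.get(txId).remove(p);
            done++;
        }
        log.remove(txId);                                // all committed: record deleted
    }

    // Recovery: roll forward every transaction that still has a record.
    void recover() {
        for (Map.Entry<String, List<String>> e : new HashMap<>(log).entrySet()) {
            committed.addAll(e.getValue());
            log.remove(e.getKey());
        }
    }

    public static void main(String[] args) {
        CoordinatorSketch tm = new CoordinatorSketch();
        tm.commit("tx1", Arrays.asList("d1", "d2"), 1);  // dies between d1.commit and d2.commit
        System.out.println("after crash:    committed=" + tm.committed + " log=" + tm.log);
        tm.recover();
        System.out.println("after recovery: committed=" + tm.committed + " log=" + tm.log);
    }
}
```

Emptying `log` before calling `recover()` is the model's equivalent of deleting the object store directory: nothing is left to say that d2 still needs a commit.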

                            • 26. Re: Jboss transaction recovery issue
                              scarceller

                              Andrew,

                               

                              Then I really don't want to delete the ObjectStore, because the condition I have is an in-doubt in need of commit on the 2nd database; the 1st database has already been committed prior to the network failure.

                               

                              If I delete the ObjectStore then the TM would not have any record of this tran needing to be committed, correct? And if the ObjectStore is cleaned the TM does a rollback on the in-doubt. I know this since I've already tried deleting the following 2 directories: jboss-as\server\default\data\tx-object-store\HashedActionStore and jboss-as\server\default\data\tx-object-store\ShadowNoFileLockStore. If I empty/delete these 2 dirs then the TM simply does a rollback on the in-doubt, and this is the wrong action; the only correct action is a commit because the work was already committed to the 1st DB prior to the network failure.

                               

                              When you say clean the ObjectStore, does this simply mean delete both those directories? Just want to be clear on what is meant by clear the ObjectStore. If it means delete both dirs then I've already tested this and can confirm it does a rollback on the in-doubt. What I have NEVER seen the TM do is a commit on an in-doubt that MUST be committed.

                               

                              EDIT:

                              So what I mean is that if you cold re-start the AS then the ORIGINAL transaction and the thread that started it are long gone and no longer in play. The recovery of any in-doubts is clearly in the hands of the TM recovery, agreed?

                               

                              Thanks for the tips.

                              • 27. Re: Jboss transaction recovery issue
                                adinn

                                scarceller wrote:

                                 

                                Then I really don't want to delete the ObjectStore, because the condition I have is an in-doubt in need of commit on the 2nd database; the 1st database has already been committed prior to the network failure.

                                 

                                If I delete the ObjectStore then the TM would not have any record of this tran needing to be committed, correct? And if the ObjectStore is cleaned the TM does a rollback on the in-doubt. I know this since I've already tried deleting the following 2 directories: jboss-as\server\default\data\tx-object-store\HashedActionStore and jboss-as\server\default\data\tx-object-store\ShadowNoFileLockStore. If I empty/delete these 2 dirs then the TM simply does a rollback on the in-doubt, and this is the wrong action; the only correct action is a commit because the work was already committed to the 1st DB prior to the network failure.

                                When you say clean the ObjectStore, does this simply mean delete both those directories? Just want to be clear on what is meant by clear the ObjectStore. If it means delete both dirs then I've already tested this and can confirm it does a rollback on the in-doubt. What I have NEVER seen the TM do is a commit on an in-doubt that MUST be committed.

                                 

                                What Mark meant was you should be sure to delete the object store contents before the AS run in which you break the commit. That way you can be sure that any records in the object store belong to your test TX which was in mid-flight. He wanted to rule out the possibility that there were records left from a TX created in a previous run of the AS.

                                 

                                You can just delete <jboss-as>\server\default\data\tx-object-store before trying your test. It will have the same effect as if you delete the two sub-directories.

                                • 28. Re: Jboss transaction recovery issue
                                  scarceller

                                  Yes, I already always delete those directories before every test. Every test starts with no in-doubts in the DBs and nothing in the TM ObjectStore.

                                   

                                  To be very clear on my issue: I am in a state where the 1st DB has committed and the network cable to the 2nd DB was disconnected right at this point (between the commit of the 1st DB and the commit of the 2nd DB). This leaves the tran committed in the 1st DB and in-doubt in the 2nd DB. Later, when the network is re-connected and the AS TM can finally communicate with the 2nd DB, it can't perform the commit of the in-doubt. The TM simply spins on trying to resolve the in-doubt but never does.

                                   

                                  This is a fairly common test case for testing 2PC XA failures.

                                   

                                  Thanks.

                                  • 29. Re: Jboss transaction recovery issue
                                    scarceller

                                    Mark,

                                     

                                    I have looked through this manual:

                                    http://www.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/5.0.0/pdf/Transactions_Failure_Recovery_Guide/JBoss_Transactions_Failure_Recovery_Guide.pdf

                                     

                                    I see this

                                    ------------------------------------------------------------------------

                                    AtomicAction pseudo code
                                    First Pass:

                                    < create a transaction vector for transaction Uids. >
                                    < read in all transactions for a transaction type AtomicAction. >
                                    while < there are transactions in the vector of transactions. >
                                    do
                                         < add the transaction to the vector of transactions. >
                                    end while.


                                    Second Pass:
                                    while < there are transactions in the transaction vector >
                                    do
                                         if < the intention list for the transaction still exists >
                                         then
                                              < create new transaction cached item >
                                              < obtain the status of the transaction >
                                              if < the transaction is not in progress >     (This is the case for my test)
                                              then
                                                   < replay phase two of the commit protocol >     (I think this is what is not taking place correctly)
                                              endif.
                                         endif.
                                    end while.

                                    ----------------------------------------------------------------
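Read as Java, the two passes quoted above might look roughly like the following. This is a loose rendering of the guide's pseudo code, not the real AtomicActionRecoveryModule: `TwoPassSketch`, the `TxStore` interface and every method on it are hypothetical stand-ins for the object store and the transaction status lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not a JBossTS class.
final class TwoPassSketch {
    // Hypothetical view of the object store; not a JBossTS interface.
    interface TxStore {
        List<String> allUids();            // Uids of every logged AtomicAction
        boolean stillExists(String uid);   // is the intention list still in the store?
        boolean inProgress(String uid);    // is the transaction still active?
        void replayPhase2(String uid);     // re-drive commit on the logged participants
    }

    private List<String> candidates = new ArrayList<>();

    // First pass: remember which transactions are in the store right now.
    void firstPass(TxStore store) {
        candidates = new ArrayList<>(store.allUids());
    }

    // Second pass: replay phase two for candidates that still exist and are
    // not in progress. Returns the Uids that were replayed.
    List<String> secondPass(TxStore store) {
        List<String> replayed = new ArrayList<>();
        for (String uid : candidates) {
            if (store.stillExists(uid) && !store.inProgress(uid)) {
                store.replayPhase2(uid);
                replayed.add(uid);
            }
        }
        return replayed;
    }

    public static void main(String[] args) {
        final List<String> logged = Arrays.asList("tx-1", "tx-2");
        TxStore store = new TxStore() {
            public List<String> allUids() { return logged; }
            public boolean stillExists(String uid) { return true; }
            public boolean inProgress(String uid) { return uid.equals("tx-2"); }
            public void replayPhase2(String uid) { System.out.println("replaying phase two for " + uid); }
        };
        TwoPassSketch module = new TwoPassSketch();
        module.firstPass(store);
        module.secondPass(store);
    }
}
```

After a cold restart nothing is in progress, so in this reading every record that survives to the second pass should reach `replayPhase2`.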

                                     

                                    I assume that in my case the commit of the in-doubt should be handled by this line

                                    < replay phase two of the commit protocol >

                                    Am I correct?

                                     

                                    In the log (I sent it a few posts above) you will see "replayPhase2" messages with "ActionStatus.COMMITTED" but the in-doubt simply remains in the 2nd DB.
