61 Replies. Latest reply on Dec 20, 2011 4:24 PM by tomjenkinson

    Remoting Transport Transaction Inflow Design Discussion

    dmlloyd

      This is a starting point for discussion for the support of transaction control in conjunction with Remoting-based invocation.  All the items in this document are subject to discussion and revision!  There will be a design conference call among the concerned parties in which the details of these points will be worked out; afterwards, a revised version of this document will be made available for any further discussion.

       

      Problem:

       

      To maintain the appropriate level of compatibility with previous releases, the Remoting EJB transport must support a means by which the transactional state of the server can be controlled.  Historically, this has been accomplished using a remote view of UserTransaction.  This is simple to implement in a "thin" client, and is described in the first solution section.  However, implementing this in the context of a server communicating with another server is difficult.  Thus another approach is proposed below for this scenario, which has been described as "JTA inflow" or "JCA-style inflow".

       

      Note that these two solutions are in addition to two other existing options: JTS and simply not supporting transactions.

       

      0. Definitions

       

      0.1. Client invocation context - the context associated with a thread on the client side of an invocation, which in turn relates to a specific Remoting connection providing its transport.

       

      1. Solution description (Remoting JTA client):

       

      This solution applies to standalone, "thin" clients without a local transaction manager.

       

      1.1. Client API and invocation

       

      1.1.1. The client will control transactions on the server using a variation of the UserTransaction interface, which may be accessed via JNDI or other mechanisms in place of a TM-provided UserTransaction implementation.

      1.1.2. Any initiated transaction will be associated with the client thread and client invocation context, and assigned a simple ID which is associated with remote invocations made by that thread during its duration.

      1.1.3. The implementation of this interface will use a simple Remoting-based protocol to forward the method invocations to the server along with the ID of the transaction in question.  (A connection may be shared across multiple client threads, thus it is possible for one connection to control many transactions.)

      1.1.4. Upon invocation, the client will attach the ID associated with the current thread's transaction to the invocation as a "transaction ID attachment" as it is executed.
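
As a rough illustration of 1.1.2 and 1.1.4, the thread association could be sketched as below. This is only a sketch under stated assumptions: `ClientTxContext`, the attachment key `"txn-id"`, and the integer ID scheme are hypothetical names chosen for the example and are not part of any Remoting API. The network side of 1.1.3 (forwarding begin/commit/rollback to the server) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: tracks the transaction ID bound to each client thread
// and stamps it onto outgoing invocations. None of these names come from Remoting.
final class ClientTxContext {
    static final String TXN_ID_KEY = "txn-id"; // assumed attachment key

    private final AtomicInteger nextId = new AtomicInteger(1);
    private final ThreadLocal<Integer> current = new ThreadLocal<>();

    // Corresponds to begin(): allocate a simple ID and bind it to this thread (1.1.2).
    // A real implementation would also send a "begin" command to the server (1.1.3).
    int begin() {
        if (current.get() != null) {
            throw new IllegalStateException("thread already has a transaction");
        }
        int id = nextId.getAndIncrement();
        current.set(id);
        return id;
    }

    // Corresponds to commit()/rollback(): unbind the ID from this thread.
    int end() {
        Integer id = current.get();
        if (id == null) {
            throw new IllegalStateException("no transaction on this thread");
        }
        current.remove();
        return id;
    }

    // Build the attachments for an outgoing invocation (1.1.4).
    // The map is empty when the calling thread has no transaction.
    Map<String, Object> attachmentsForInvocation() {
        Map<String, Object> attachments = new HashMap<>();
        Integer id = current.get();
        if (id != null) {
            attachments.put(TXN_ID_KEY, id);
        }
        return attachments;
    }
}
```

Because the ID lives in a `ThreadLocal`, several threads can share one context (and hence one connection) while each controls its own transaction, which matches the parenthetical note in 1.1.3.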

       

      1.2. Server side

       

      1.2.1. Incoming transaction control commands cause transactions to be begun, committed, and rolled back.

      1.2.2. Begun transactions are suspended and the resultant Transaction object is associated with the connection, along with the ID of the transaction.

      1.2.3. Incoming invocations with a transaction ID attachment are associated with the suspended Transaction, and the Transaction is resumed for the duration of the method invocation.

      1.2.4. If the client session is lost or terminated, any outstanding Transactions are rolled back.
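
The server-side bookkeeping in 1.2.2 through 1.2.4 might be sketched as follows. The names `TxHandle` and `ConnectionTxRegistry` are hypothetical, and `TxHandle` is a stand-in for `javax.transaction.Transaction` so that the example compiles against the JDK alone. Suspending and resuming through the TransactionManager is assumed to happen in the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for javax.transaction.Transaction (assumption: only rollback is needed here).
interface TxHandle {
    void rollback();
}

// Hypothetical per-connection registry of suspended transactions,
// keyed by the client-assigned transaction ID.
final class ConnectionTxRegistry {
    private final Map<Integer, TxHandle> suspended = new ConcurrentHashMap<>();

    // 1.2.2: after begin + suspend, remember the Transaction under its ID.
    void register(int id, TxHandle tx) {
        if (suspended.putIfAbsent(id, tx) != null) {
            throw new IllegalStateException("duplicate transaction ID " + id);
        }
    }

    // 1.2.3: find the Transaction to resume for an incoming invocation.
    TxHandle lookup(int id) {
        TxHandle tx = suspended.get(id);
        if (tx == null) {
            throw new IllegalArgumentException("unknown transaction ID " + id);
        }
        return tx;
    }

    // Called when the client explicitly commits or rolls back.
    TxHandle complete(int id) {
        return suspended.remove(id);
    }

    // 1.2.4: the connection is gone, so roll back whatever is still outstanding.
    // Returns the number of transactions rolled back.
    int connectionClosed() {
        List<Integer> ids = new ArrayList<>(suspended.keySet());
        int rolledBack = 0;
        for (Integer id : ids) {
            TxHandle tx = suspended.remove(id);
            if (tx != null) {
                tx.rollback();
                rolledBack++;
            }
        }
        return rolledBack;
    }
}
```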

       

      1.3. Unaddressed issues

       

      1.3.1. Transaction timeout control

       

      2. Solution description (Remoting JTA inflow):

       

      This solution applies to servers with a local transaction manager.

       

      2.1. Invocation, client side

       

      2.1.1. Upon invocation, the client will locate (via the local TransactionManager) the current active Transaction and enlist into it an XAResource instance corresponding to the client invocation context.

      2.1.2. The Xid and any pertinent ancillary information regarding the current transaction which were provided to the XAResource will be attached to the invocation as a "transaction inflow attachment" as it is executed.

      2.1.2.1. Xid propagation should rely on a byte representation, not a serialized object, in order to support multiple marshalling schemes.

      2.1.3. Transaction control methods invoked upon the XAResource will be forwarded as commands over the Remoting channel to the server.
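
To make 2.1.2.1 concrete, one possible byte representation of an Xid is sketched below. The wire layout (4-byte big-endian format ID, one length byte each for the global transaction ID and branch qualifier, then the two byte arrays) is an assumption made for this example, not a layout defined by the proposal. `SimpleXid` and `XidCodec` are hypothetical helper names; `javax.transaction.xa.Xid` is the standard interface shipped with the JDK.

```java
import java.nio.ByteBuffer;
import javax.transaction.xa.Xid;

// Minimal immutable Xid used only for this sketch.
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        if (gtrid.length > MAXGTRIDSIZE || bqual.length > MAXBQUALSIZE) {
            throw new IllegalArgumentException("Xid component too long");
        }
        this.formatId = formatId;
        this.gtrid = gtrid.clone();
        this.bqual = bqual.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }
}

// Assumed layout: formatId (4 bytes) | gtrid length (1) | bqual length (1) | gtrid | bqual
final class XidCodec {
    static byte[] encode(Xid xid) {
        byte[] gtrid = xid.getGlobalTransactionId();
        byte[] bqual = xid.getBranchQualifier();
        if (gtrid.length > Xid.MAXGTRIDSIZE || bqual.length > Xid.MAXBQUALSIZE) {
            throw new IllegalArgumentException("Xid component too long");
        }
        ByteBuffer buf = ByteBuffer.allocate(6 + gtrid.length + bqual.length);
        buf.putInt(xid.getFormatId());
        buf.put((byte) gtrid.length);
        buf.put((byte) bqual.length);
        buf.put(gtrid);
        buf.put(bqual);
        return buf.array();
    }

    static Xid decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int formatId = buf.getInt();
        int gtridLength = buf.get() & 0xff;
        int bqualLength = buf.get() & 0xff;
        byte[] gtrid = new byte[gtridLength];
        buf.get(gtrid);
        byte[] bqual = new byte[bqualLength];
        buf.get(bqual);
        return new SimpleXid(formatId, gtrid, bqual);
    }
}
```

Since the encoding depends only on the three accessor methods of `Xid`, it works regardless of which transaction manager produced the Xid and independently of the marshalling scheme used for the invocation itself.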

       

      2.2. Invocation, server side

       

      2.2.1. Incoming invocations which have a transaction inflow attachment will trigger the use of an interceptor which utilizes the XATerminator (or equivalent) interface to import the transaction for the duration of the invocation execution.

      2.2.2. Incoming XAResource commands will be translated into invocations on the local XATerminator (or equivalent) to carry out transaction completion.

      2.2.3. If the client session is lost or terminated, any open transactions are rolled back; any prepared but incomplete transactions have to be recovered manually.
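
The distinction drawn in 2.2.3 between open and prepared transactions could be tracked on the server roughly as follows. This is a sketch only: `Terminator` is a stand-in for the completion methods of XATerminator "or equivalent", transactions are keyed by an opaque string rather than an Xid to keep the example JDK-only, and `InflowSession` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.transaction.xa.XAResource;

// Stand-in for the completion methods of XATerminator (assumption, see lead-in).
interface Terminator {
    int prepare(String txKey);
    void commit(String txKey);
    void rollback(String txKey);
}

// Hypothetical per-connection view of the transactions imported over that connection.
final class InflowSession {
    private enum State { ACTIVE, PREPARED }

    private final Terminator terminator;
    private final Map<String, State> imported = new LinkedHashMap<>();

    InflowSession(Terminator terminator) {
        this.terminator = terminator;
    }

    // 2.2.1: an invocation carrying an inflow attachment imports the transaction.
    synchronized void onInvocation(String txKey) {
        if (!imported.containsKey(txKey)) {
            imported.put(txKey, State.ACTIVE);
        }
    }

    // 2.2.2: XAResource commands are translated into terminator calls.
    synchronized int prepare(String txKey) {
        int vote = terminator.prepare(txKey);
        if (vote == XAResource.XA_RDONLY) {
            imported.remove(txKey); // read-only branches are finished after prepare
        } else {
            imported.put(txKey, State.PREPARED);
        }
        return vote;
    }

    synchronized void commit(String txKey) {
        terminator.commit(txKey);
        imported.remove(txKey);
    }

    synchronized void rollback(String txKey) {
        terminator.rollback(txKey);
        imported.remove(txKey);
    }

    // 2.2.3: roll back open transactions; prepared ones are left alone and
    // returned so they can be reported as awaiting recovery.
    synchronized List<String> connectionLost() {
        List<String> inDoubt = new ArrayList<>();
        for (Map.Entry<String, State> entry : imported.entrySet()) {
            if (entry.getValue() == State.PREPARED) {
                inDoubt.add(entry.getKey());
            } else {
                terminator.rollback(entry.getKey());
            }
        }
        imported.clear();
        return inDoubt;
    }
}
```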

       

      2.3. Recovery

       

      2.3.1. The client is expected to use "JTA top-down recovery", which will treat the participating server(s) as additional local resources to recover.  The client is always treated as the originator of the transaction.

      2.3.2. The client invocation context must expose a means to acquire its XAResource in order to facilitate local recovery.
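
Items 2.1.1, 2.1.3 and 2.3.2 all revolve around the XAResource that represents the remote server to the local transaction manager. A minimal sketch of such a resource is shown below. `TxCommandChannel`, `RemotingXAResource` and `ClientInvocationContext` are hypothetical names, the command strings are invented for the example, and the choice to track `start`/`end` locally rather than forward them is an assumption; only `javax.transaction.xa.XAResource`, `Xid` and `XAException` are standard JDK types. For simplicity the sketch tracks a single current Xid per resource.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical transport abstraction; a real one would write each command to the
// Remoting channel and block for the server's reply.
interface TxCommandChannel {
    int send(String command, Xid xid, boolean onePhase) throws XAException;
    Xid[] recover(int flag) throws XAException;
}

// Sketch of the XAResource the client enlists with its local TM (2.1.1).
final class RemotingXAResource implements XAResource {
    private final TxCommandChannel channel;
    private volatile Xid current; // Xid to attach to outgoing invocations (2.1.2)

    RemotingXAResource(TxCommandChannel channel) {
        this.channel = channel;
    }

    Xid currentXid() { return current; }

    // Assumption: start/end only track the Xid locally; nothing is sent to the server.
    @Override public void start(Xid xid, int flags) { current = xid; }
    @Override public void end(Xid xid, int flags) { current = null; }

    // 2.1.3: completion calls are forwarded as commands over the channel.
    @Override public int prepare(Xid xid) throws XAException {
        return channel.send("prepare", xid, false);
    }
    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        channel.send("commit", xid, onePhase);
    }
    @Override public void rollback(Xid xid) throws XAException {
        channel.send("rollback", xid, false);
    }
    @Override public void forget(Xid xid) throws XAException {
        channel.send("forget", xid, false);
    }

    // 2.3.1: top-down recovery asks the server for its in-doubt branches.
    @Override public Xid[] recover(int flag) throws XAException {
        return channel.recover(flag);
    }

    @Override public boolean isSameRM(XAResource other) {
        return other instanceof RemotingXAResource
                && ((RemotingXAResource) other).channel == channel;
    }

    // Timeout control is listed as unaddressed (2.4.1), so report "not supported".
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

// 2.3.2: the invocation context exposes its XAResource so recovery can reach it.
final class ClientInvocationContext {
    private final RemotingXAResource resource;

    ClientInvocationContext(TxCommandChannel channel) {
        this.resource = new RemotingXAResource(channel);
    }

    XAResource getXAResource() { return resource; }
}
```

With this shape, the local TM drives the remote server through the same interface it uses for any other resource manager, which is what lets the client treat participating servers as "additional local resources to recover".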

       

      2.4. Unaddressed Issues

       

      2.4.1. Transaction timeout control

       

        • 1. Re: Remoting Transport Transaction Inflow Design Discussion
          marklittle

          Some initial thoughts/questions:

           

          a) what do you mean by "unaddressed issues - transaction timeout control"?

           

          b) since this is distributed transactions (context propagation, commit protocol termination etc.) have you considered the scope of the testing that needs to be done in order to ensure this works in *all* cases?

           

          c) "any prepared but incomplete transactions have to be recovered manually"?

           

          d) in case 2, where the client is within the same container as a tm instance, which tm is it trying to control/invoke? The remote one (in the other container instance) or the local one? Because we seem to have gone from controlling transactions (begin, commit, rollback) in the first scenario, to really doing distributed transactions in the second scenario. As you can imagine, these are very different situations.

           

          e) what about interposition, particularly in the case where A calls B calls C calls D, and maybe even D calls A?

          • 2. Re: Remoting Transport Transaction Inflow Design Discussion
            marklittle

            And of course another option is to replace JacORB with another IIOP implementation.

            • 3. Re: Remoting Transport Transaction Inflow Design Discussion
              jhalliday

              re: ( 1 )  Whilst the remote UserTransaction model is easy to implement, it's tricky to document. It behaves in an intuitive fashion only for a very limited, albeit common, set of use cases. For more complex scenarios its inherent limitations manifest in ways that can be confusing to users. The cost of this solution must be evaluated not just in terms of implementation time, but in documentation and ongoing support cost.

               

              re: ( 2 ) JCA inflow was either designed for propagation to leaf nodes only, or incredibly badly thought out. Either way, the result is a model that simply isn't capable enough for a full distributed transaction solution. So, once again you're going to wind up with unintuitive behaviour and limitations for some use cases. For example, you can't register more than one resource with an inflowed transaction, you can't allow a context to be flowed into a node by more than one route and you need to register each subordinate node in the parent for recovery purposes. So, whilst this may provide a faster alternative to RMI/IIOP for simple use cases, it's not going to be an acceptable substitute for more advanced cases. Offering it alongside the traditional RMI/IIOP model is going to have implications beyond just the initial engineering, so the discussion may require input from docs, support and QE too. I'm pretty sure that e.g. support will tell you it is unacceptable to ship a solution that may require manual transaction cleanup. We've had a small number of corner cases in JTA that suffered that limitation and eliminating them and the support load they generate has been a high priority for the transaction development work. Intentionally introducing new ones is definitely in the category of Bad Ideas.

              • 4. Re: Remoting Transport Transaction Inflow Design Discussion
                dmlloyd

                Jonathan Halliday wrote:

                 

                re: ( 1 )  Whilst the remote UserTransaction model is easy to implement, it's tricky to document. It behaves in an intuitive fashion only for a very limited, albeit common, set of use cases. For more complex scenarios its inherent limitations manifest in ways that can be confusing to users. The cost of this solution must be evaluated not just in terms of implementation time, but in documentation and ongoing support cost.

                 

                re: ( 2 ) JCA inflow was either designed for propagation to leaf nodes only, or incredibly badly thought out. Either way, the result is a model that simply isn't capable enough for a full distributed transaction solution. So, once again you're going to wind up with unintuitive behaviour and limitations for some use cases. For example, you can't register more than one resource with an inflowed transaction, you can't allow a context to be flowed into a node by more than one route and you need to register each subordinate node in the parent for recovery purposes. So, whilst this may provide a faster alternative to RMI/IIOP for simple use cases, it's not going to be an acceptable substitute for more advanced cases. Offering it alongside the traditional RMI/IIOP model is going to have implications beyond just the initial engineering, so the discussion may require input from docs, support and QE too. I'm pretty sure that e.g. support will tell you it is unacceptable to ship a solution that may require manual transaction cleanup. We've had a small number of corner cases in JTA that suffered that limitation and eliminating them and the support load they generate has been a high priority for the transaction development work. Intentionally introducing new ones is definitely in the category of Bad Ideas.

                 

                Rather than flinging FUD at the only two possible approaches that are available to us, can you be more specific than "Bad Idea" or "some complex scenarios" or "badly thought out" or "not capable enough" or "unintuitive behavior" or "limitations" or "not acceptable substitute"?  This really is not productive.  Better still, explain specific issues in specific scenarios that apply to each approach, and why that issue is valid under the Remoting client/server architecture?

                 

                Also - only one resource for inflowed transactions?  How is that not a serious deficiency in our implementation?  You're basically saying that an MDB can never access more than one resource.  That's a major problem in and of itself.

                 

                Finally "unacceptable to ship a solution that may require manual transaction cleanup" - you should know that any two-phase transaction system may require manual transaction cleanup; that's the nature of two-phase transactions.  You'll have to be more specific about the circumstances in which it is not acceptable to require manual cleanup.  I'm pretty sure that if someone unplugs the ethernet cable of the transaction coordinator after prepare but before commit, there's going to have to be some manual cleanup.

                • 5. Re: Remoting Transport Transaction Inflow Design Discussion
                  marklittle

                  David Lloyd wrote:

                   

                  Finally "unacceptable to ship a solution that may require manual transaction cleanup" - you should know that any two-phase transaction system may require manual transaction cleanup; that's the nature of two-phase transactions.  You'll have to be more specific about the circumstances in which it is not acceptable to require manual cleanup.  I'm pretty sure that if someone unplugs the ethernet cable of the transaction coordinator after prepare but before commit, there's going to have to be some manual cleanup.

                   

                  So does this answer my question c) too? Are you saying that "any prepared but incomplete transactions have to be recovered manually" is only a reference to heuristic outcomes?

                  • 6. Re: Remoting Transport Transaction Inflow Design Discussion
                    dmlloyd

                    Mark Little wrote:

                     

                    Some initial thoughts/questions:

                     

                    a) what do you mean by "unaddressed issues - transaction timeout control"?

                     

                    I mean that I haven't addressed the issue of transaction timeout control.

                     

                     

                    Mark Little wrote:

                     

                    b) since this is distributed transactions (context propagation, commit protocol termination etc.) have you considered the scope of the testing that needs to be done in order to ensure this works in *all* cases?

                     

                    Keeping in mind that this is nowhere near the only process of this complexity to be tested - and no, don't trot out "it's more complex than you think" unless you want to enumerate specific cases (which will probably then be appropriated into additional tests) - I think we'd follow the same approach we'd follow for testing other things.  We'd unit test the protocol of course, and test to ensure that the implementation matches the specification, and verify that the protocol handlers on either "end" forward to the proper APIs.

                     

                    If you're asking "can we write automated tests to prove the validity of the approach", well no we can't; we're just coding against existing contracts, and if those are "wrong", well, that fix needs to happen in a different context.

                     

                     

                    Mark Little wrote:

                     

                    c) "any prepared but incomplete transactions have to be recovered manually"?

                     

                    That's just another way of saying we don't have any special, magical auto-recovery "stuff" that isn't provided by the transaction coordinator (which might well have some magical auto-recovery "stuff").  There might be a better way to express that.

                     

                     

                    Mark Little wrote:

                     

                    d) in case 2, where the client is within the same container as a tm instance, which tm is it trying to control/invoke? The remote one (in the other container instance) or the local one? Because we seem to have gone from controlling transactions (begin, commit, rollback) in the first scenario, to really doing distributed transactions in the second scenario. As you can imagine, these are very different situations.

                     

                    In case 1, the client has no TM and it uses a remote UserTransaction interface to directly control the remote TM.  In case 2, the client is using the local TM to control transactions, and is treating the remote TM as an enrolled resource into the current transaction.

                     

                    Case 1 cannot be made to work when a local TM is present without adding some notion in the EE layer to determine whether it should use the local UserTransaction or the remote one.  This is possible but is a possibly significant amount of work.

                     

                    Mark Little wrote:

                     

                    e) what about interposition, particularly in the case where A calls B calls C calls D, and maybe even D calls A?

                    Theoretically each successive "step" will treat the TM of the subsequent "step" as a participating resource.  As to D calling A, that will only work if the TM is clever enough to figure out what's happening (I don't see why it wouldn't as the Xid should, well, identify the transaction so A should recognize its own; but that's why we're having this discussion).

                    • 7. Re: Remoting Transport Transaction Inflow Design Discussion
                      dmlloyd

                      Mark Little wrote:

                       

                      And of course another option is to replace JacORB with another IIOP implementation.

                       

                      Jonathan tells us this is prohibitive from a resource perspective.  In any case, I do not believe this would create a substantial enough improvement in performance, even if that were the only issue with going the IIOP-only route.

                      • 8. Re: Remoting Transport Transaction Inflow Design Discussion
                        dmlloyd

                        Mark Little wrote:

                         

                        David Lloyd wrote:

                         

                        Finally "unacceptable to ship a solution that may require manual transaction cleanup" - you should know that any two-phase transaction system may require manual transaction cleanup; that's the nature of two-phase transactions.  You'll have to be more specific about the circumstances in which it is not acceptable to require manual cleanup.  I'm pretty sure that if someone unplugs the ethernet cable of the transaction coordinator after prepare but before commit, there's going to have to be some manual cleanup.

                         

                        So does this answer my question c) too? Are you saying that "any prepared but incomplete transactions have to be recovered manually" is only a reference to heuristic outcomes?

                         

                        Let me frame it in terms of the SPIs we're dealing with.

                         

                        If a transaction is prepared (XATerminator.prepare()) and then the connection is lost, then the transaction is "stuck" in the prepared state - we aren't going to do anything about that at the protocol level.  But presumably the transaction coordinator would be able to automatically recover once the resource was made available again, as the transaction isn't "busted", it's just not fully completed.

                         

                        If the same scenario exists but the commit was received and acted upon (but it failed), then I guess this is what you refer to as a heuristic outcome, and, well we're not going to deal with that either; presumably our recovery tooling will work exactly the same way in this case as it does in the case where the same thing happens to your database connection.

                        • 9. Re: Remoting Transport Transaction Inflow Design Discussion
                          marklittle

                          David Lloyd wrote:

                           

                          Mark Little wrote:

                           

                          And of course another option is to replace JacORB with another IIOP implementation.

                           

                          Jonathan tells us this is prohibitive from a resource perspective.  In any case, I do not believe this would create a substantial enough improvement in performance, even if that were the only issue with going the IIOP-only route.

                           

                          I don't know about the resource perspective at this point, but I know it would be a lot less than writing another distributed transactions protocol ;-) I ported the C++ and Java transaction services to pretty much every C++ or Java ORB on the planet by 2005.

                           

                          As to performance? That really depends what we're trying to achieve and when. It's an option. Whether it's the right option in the long term and for all use cases, I don't think anyone can say precisely at this point.

                          • 10. Re: Remoting Transport Transaction Inflow Design Discussion
                            marklittle

                            "I mean that I haven't addressed the issue of transaction timeout control."

                             

                            What issues? The timeout is controlled by the coordinator, not the client. Or by "control" do you mean setTimeout calls?

                             

                            "Keeping in mind that this is nowhere near the only process of this complexity to be tested - and no, don't trot out "it's more complex than you think" unless you want to enumerate specific cases (which will probably then be appropriated into additional tests) - I think we'd follow the same approach we'd follow for testing other things.  We'd unit test the protocol of course, and test to ensure that the implementation matches the specification, and verify that the protocol handlers on either "end" forward to the proper APIs."

                             

                            Go take a look at the QA tests for JBossTS. You'll see that a sh*t load of them are covering recovery. And then take a look at XTS and REST-AT. You'll see that a sh*t load of them are covering recovery. Want to take a wild stab in the dark why that might be the case ;-)? Yes, it's complex. It's got to be fault tolerant, so we have to test all of the cases. There are no edge-cases with transactions: it either works or it fails. Unit tests aren't sufficient for this.

                             

                            "That's just another way of saying we don't have any special, magical auto-recovery "stuff" that isn't provided by the transaction coordinator (which might well have some magical auto-recovery "stuff").  There might be a better way to express that."

                             

                            Let me try and rephrase and let me know if I get it wrong: you assume that existing recovery approaches are sufficient for this and nothing new will need to be invented?

                             

                            "In case 1, the client has no TM and it uses a remote UserTransaction interface to directly control the remote TM.  In case 2, the client is using the local TM to control transactions, and is treating the remote TM as an enrolled resource into the current transaction."

                             

                            Yeah, so it's interposition. Like I said, these are two different scenarios.

                             

                            "Case 1 cannot be made to work when a local TM is present without adding some notion in the EE layer to determine whether it should use the local UserTransaction or the remote one.  This is possible but is a possibly significant amount of work."

                             

                            How significant? If we're putting all options on the table then this needs to be there too.

                             

                            "Theoretically each successive "step" will treat the TM of the subsequent "step" as a participating resource.  As to D calling A, that will only work if the TM is clever enough to figure out what's happening (I don't see why it wouldn't as the Xid should, well, identify the transaction so A should recognize its own; but that's why we're having this discussion)."

                             

                            Please go take a look at what we have to do for interposition in JTS. And it's not because JTS is more complex than it needs to be: interposition is a fundamental concept within distributed transactions and the problems, optimisations, recovery semantics etc. are there no matter what object model or distribution approach you use. Take a look at XTS too, for instance.

                            • 11. Re: Remoting Transport Transaction Inflow Design Discussion
                              marklittle

                              "If a transaction is prepared (XATerminator.prepare()) and then the connection is lost, then the transaction is "stuck" in the prepared state - we aren't going to do anything about that at the protocol level.  But presumably the transaction coordinator would be able to automatically recover once the resource was made available again, as the transaction isn't "busted", it's just not fully completed."

                               

                              Well that really depends upon what the participant does or, even worse, what other participants do in the same transaction. For instance, it could decide to autonomously roll back, in which case we're in heuristic land. So the transaction could well be "busted" in as much as we now risk not having an atomic outcome. Some error codes from XA allow for the transaction manager to periodically retry operations on the RM. A heuristic isn't one of them.

                               

                              "If the same scenario exists but the commit was received and acted upon (but it failed), then I guess this is what you refer to as a heuristic outcome, and, well we're not going to deal with that either; presumably our recovery tooling will work exactly the same way in this case as it does in the case where the same thing happens to your database connection."

                               

                              No, in the scenario you just mentioned the recovery subsystem would retry and either get back a "didn't you get my message" response (c.f. XTS) or eventually decide that the RM did commit and it missed the message. Remember that we're using presumed abort semantics here.

                              • 12. Re: Remoting Transport Transaction Inflow Design Discussion
                                marklittle

                                Could you outline the pros and cons of the current approaches we have in AS5/AS6? I know we've discussed them elsewhere already, but it would be good to capture it all here. For instance, why you believe that IIOP isn't right.

                                • 13. Re: Remoting Transport Transaction Inflow Design Discussion
                                  dmlloyd

                                  Mark Little wrote:

                                   

                                  David Lloyd wrote:

                                   

                                  Mark Little wrote:

                                   

                                  And of course another option is to replace JacORB with another IIOP implementation.

                                   

                                  Jonathan tells us this is prohibitive from a resource perspective.  In any case, I do not believe this would create a substantial enough improvement in performance, even if that were the only issue with going the IIOP-only route.

                                   

                                  I don't know about the resource perspective at this point, but I know it would be a lot less than writing another distributed transactions protocol ;-) I ported the C++ and Java transaction services to pretty much every C++ or Java ORB on the planet by 2005.

                                   

                                  As to performance? That really depends what we're trying to achieve and when. It's an option. Whether it's the right option in the long term and for all use cases, I don't think anyone can say precisely at this point.

                                  Okay, well we've proposed more than once to move off of JacORB.  I'd be in favor for sure.  But I think that's a peripheral topic at this point.

                                  • 14. Re: Remoting Transport Transaction Inflow Design Discussion
                                    marklittle

                                    David Lloyd wrote:

                                     

                                    Mark Little wrote:

                                     

                                    David Lloyd wrote:

                                     

                                    Mark Little wrote:

                                     

                                    And of course another option is to replace JacORB with another IIOP implementation.

                                     

                                    Jonathan tells us this is prohibitive from a resource perspective.  In any case, I do not believe this would create a substantial enough improvement in performance, even if that were the only issue with going the IIOP-only route.

                                     

                                    I don't know about the resource perspective at this point, but I know it would be a lot less than writing another distributed transactions protocol ;-) I ported the C++ and Java transaction services to pretty much every C++ or Java ORB on the planet by 2005.

                                     

                                    As to performance? That really depends what we're trying to achieve and when. It's an option. Whether it's the right option in the long term and for all use cases, I don't think anyone can say precisely at this point.

                                    Okay, well we've proposed more than once to move off of JacORB.  I'd be in favor for sure.  But I think that's a peripheral topic at this point.

                                     

                                    I'm not so sure it is. If moving off it to, say, the one in the JDK gives us better performance, then this could well be a case where "good enough" is "good enough", at least for AS 7.x. Longer term is obviously different. Anyway, no decisions at this point, but good to have all of the options available.
