4 Replies Latest reply on Sep 15, 2011 4:47 AM by adinn

    Feature Request: cross-process rendezvous

    ryanhos

      I wanted to vet my feature request here before I wasted someone's time with a JIRA.

       

      I'm still loving Byteman and evangelizing about it to everyone I know who has tough testing problems.  I'm even going to demo it to my local JUG.  One feature that I've been wanting for a while is cross-process rendezvous.  My scenario is this:  A servlet filter deployed on multiple jbossweb instances was using a "check then act" pattern to handle initialization of a database record based on the authenticated user (which could not be pre-initialized).  The bug appears when two JVMs both clear the check condition and then race to perform the act, which results in one JVM failing.  Needless to say, reproducing this by chance is problematic.  We have a workable solution, but we cannot prove in an automated manner that it works.  If I had cross-process rendezvous, I could easily get both threads past the initial check, and then coordinate their progress through the act step, in order to assert that the interleaved calls result in the desired output and complete success.  Of course, we could set up a unit test with two threads in a single JVM that could use Byteman to do the same coordination, but if the solution relies upon concurrency controls that only work within a single JVM, the proposed solution will still fail in our 40 JVM cluster.  My motivation here is that I'd like to have someone who understands our cluster concurrency issues write the acceptance test, and then be able to assign anyone on the team to the bug, no matter their level of experience.
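
For readers who want to see the race itself, here is a minimal single-JVM sketch of the "check then act" failure described above. It is an illustration under stated assumptions, not the original filter code: the class `CheckThenActRace`, the in-memory map standing in for the database table, and the thread names are all hypothetical, and a plain `java.util.concurrent.CyclicBarrier` stands in for a Byteman rendezvous.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the filter's "check then act" record initialization.
class CheckThenActRace {

    /** Runs two racing initializers and returns how many failed on the act step. */
    static int runRace(String user) {
        // Stand-in for the database table; putIfAbsent plays the unique constraint.
        ConcurrentHashMap<String, String> records = new ConcurrentHashMap<>();
        CyclicBarrier rendezvous = new CyclicBarrier(2);
        AtomicInteger failures = new AtomicInteger();

        Runnable initializer = () -> {
            try {
                boolean missing = !records.containsKey(user);   // check
                rendezvous.await();                              // both callers are now past the check
                String owner = Thread.currentThread().getName();
                if (missing && records.putIfAbsent(user, owner) != null) {
                    failures.incrementAndGet();                  // act: the second insert is rejected
                }
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        };

        Thread a = new Thread(initializer, "jvm-A");
        Thread b = new Thread(initializer, "jvm-B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failed initializers: " + runRace("alice"));
    }
}
```

Because the barrier holds both threads until each has completed its check, exactly one of the two inserts is rejected on every run. The limitation raised in the post still applies: this only coordinates threads inside one JVM, which is what a cross-process rendezvous would lift.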

       

      The solution would need some easy method of communicating rendezvous arrivals and then having the coordinator (JVM that called createRendezvous()) release the threads from the blocking state.  UDP multicast immediately comes to mind because it's nearly zero-conf, but the lack of guaranteed delivery may leave some threads hung forever (though a simple main() could be written to send the "end rendezvous" message to release the stuck threads from a failed test).  TCP solves the guaranteed delivery problem, but there would have to be an agreed-upon configuration in order to get all processes talking together.
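
The arrival/release exchange described above might look roughly like the following over TCP. This is only a sketch under assumptions: the class `TcpRendezvousSketch`, the one-byte arrival and release messages, and the loopback setup are invented for illustration and are not part of Byteman.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical coordinator/participant protocol: one byte to arrive, one byte to release.
class TcpRendezvousSketch {

    /** Coordinator: collect `expected` arrivals, then release every waiting participant. */
    static Thread startCoordinator(ServerSocket server, int expected) {
        Thread coordinator = new Thread(() -> {
            List<Socket> arrived = new ArrayList<>();
            try {
                while (arrived.size() < expected) {
                    Socket s = server.accept();
                    s.getInputStream().read();        // wait for the arrival byte
                    arrived.add(s);
                }
                for (Socket s : arrived) {
                    s.getOutputStream().write(1);     // release byte
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                for (Socket s : arrived) {
                    try { s.close(); } catch (IOException ignored) { }
                }
            }
        }, "coordinator");
        coordinator.start();
        return coordinator;
    }

    /** Participant: announce arrival and block until released; false on timeout or failure. */
    static boolean arrive(int port, int timeoutMillis) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.setSoTimeout(timeoutMillis);            // a read timeout keeps a thread from hanging forever
            s.getOutputStream().write(0);
            return s.getInputStream().read() == 1;
        } catch (IOException e) {
            return false;
        }
    }

    /** Runs coordinator and participants in one process; returns how many were released. */
    static int demo(int participants) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread coordinator = startCoordinator(server, participants);
            AtomicInteger released = new AtomicInteger();
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < participants; i++) {
                Thread t = new Thread(() -> {
                    if (arrive(server.getLocalPort(), 5000)) released.incrementAndGet();
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) t.join();
            coordinator.join();
            return released.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a real cluster each participant would run in its own JVM and would need to be told the coordinator's host and port, which is exactly the agreed-upon configuration cost mentioned above.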

       

      So what does the forum think?  Is this as useful as I seem to think it is?  Or am I attempting to make more out of Byteman than was ever intended?

        • 1. Re: Feature Request: cross-process rendezvous
          adinn

          Hi Ryan,

           

          Nice to hear from you.

          Ryan Hochstetler wrote:

           

          So what does the forum think?  Is this as useful as I seem to think it is?  Or am I attempting to make more out of Byteman than was ever intended?

          First off, thanks for all the (deleted) positive words and I hope the JUG goes well.  As for your questions, I think it would be extremely useful and is just the sort of thing Byteman was intended to make possible. In fact, 2 years ago, when I first introduced Byteman to our QE team, extending the rendezvous built-in to be cross-process and even cross-host was immediately suggested as a possible extension to Byteman by Richard Achmatowicz (then in QE but now, unluckily for them but luckily for our dev team, an AS team member) for much the same reasons as you outlined. The idea was put on the back burner and then lost because of lack of time to implement it alongside all the other desirable features and, it must be said, lack of a JIRA.

           

          So, please do raise a JIRA and I'll see whether I can find time to implement something or if someone else can do so. If I manage to implement something myself then I would prefer to add it first as a helper class/method in the samples library so people can try it out and provide feedback and corrections before it migrates into the default helper class.

          • 2. Re: Feature Request: cross-process rendezvous
            ryanhos

            I created BYTEMAN-170 for this feature.  I'm watching that issue and will be happy to test once there's something committed.

            • 3. Re: Feature Request: cross-process rendezvous
              nwhitehead

              I am interested in implementing this feature and I wanted to outline my initial thoughts.

              • Let's start with using JMX as the remoting transport.
                • It's built into the JVM
                • A remote connection can be defined in one string (the JMXServiceURL)
                • Servers can detect and track client connections.
                • Remote connections can track their connectivity to the server (and be notified of unexpected disconnects).
                • It's simple to configure and byteman-sample already has the built-in capability to create JMXConnectionServers so remotes can connect.
                • In the event of firewall issues (etc.) it would also be possible to use JMX-WS over HTTP and/or tunneled JMXMP.
              • I thought about two different approaches, which result in code changes to different parts of the code base, so I am interested in feedback (or outright criticism...):
                • Distributed Rendezvous
                  • Extend org.jboss.byteman.synchronization.Rendezvous as org.jboss.byteman.synchronization.DistributedRendezvous which would incorporate a JMX MBean interface.
                  • Create a class org.jboss.byteman.synchronization.DistributedRendezvousController that will be registered in the local MBeanServer when activated (either by a rule or by a system property). The controller will be responsible for fielding remote requests for creation and deletion of DistributedRendezvous instances.
                  • When a DistributedRendezvous is created, it will be registered as an MBean and expose the Rendezvous attributes and operations. The ObjectName of the MBean will be a constant plus the stringified value of the DistributedRendezvous identifier.
                  • All the org.jboss.byteman.rule.helper.Helper *Rendezvous methods would have equivalent *DistributedRendezvous methods, plus overloads that take an additional String typed JMXServiceURL parameter.  If a JMXServiceURL is supplied, the helper call will be invoked against the named (identified) DistributedRendezvous MBean in the remote MBeanServer.
                  • If an org.jboss.byteman.rule.helper.Helper *DistributedRendezvous method is invoked with a null JMXServiceURL then the MBean will be registered locally and, if a JMXConnectionServer has not been started yet, one will be started.
                  • When a remote thread rendezvouses with a DistributedRendezvous, we can track the JMX Client ID and detect if/when the client disconnects, in which case ......  (need to think about this. maybe nothing.)
                  • When a local thread rendezvouses with a remote DistributedRendezvous, we can track the JMXConnection and, if/when it disconnects before the count is complete, we can throw an exception and release the thread. (and/or something else ?)
                • Helper Only
                  • Once I thought about the first option, it occurred to me that we can ditch the DistributedRendezvous and simply hide the whole implementation behind the org.jboss.byteman.rule.helper.Helper *DistributedRendezvous overloaded methods.
                  • The org.jboss.byteman.synchronization.DistributedRendezvousController would be the same, except that it would also expose all of the Rendezvous attributes and operations, overloaded with an identity parameter.
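
As a rough illustration of the JMX transport points listed above (one-string connection address, server-side tracking of clients, client-side disconnect notification), here is a sketch that uses only the standard `javax.management.remote` API. The class `JmxTransportSketch` is hypothetical, and the code starts its own connector server instead of using the byteman-sample facility, whose API is not shown in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnectionNotification;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: start a connector server, connect back to it, check client tracking.
class JmxTransportSketch {

    /** Returns true if the connector server lists the client's connection id. */
    static boolean serverTracksClient() {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = null;
        try {
            // "service:jmx:rmi://" lets the server choose its address; no registry is configured.
            server = JMXConnectorServerFactory.newJMXConnectorServer(
                    new JMXServiceURL("service:jmx:rmi://"), null, mbs);
            server.start();

            // The whole remote endpoint is captured by this one URL.
            JMXServiceURL address = server.getAddress();

            try (JMXConnector client = JMXConnectorFactory.connect(address)) {
                // Client side: get told when the connection fails unexpectedly.
                client.addConnectionNotificationListener((notification, handback) -> {
                    if (JMXConnectionNotification.FAILED.equals(notification.getType())) {
                        System.err.println("lost connection to rendezvous host");
                    }
                }, null, null);

                MBeanServerConnection remote = client.getMBeanServerConnection();
                remote.getMBeanCount();   // any call proves the connection is usable

                // Server side: the connection ids of currently connected clients.
                return Arrays.asList(server.getConnectionIds())
                        .contains(client.getConnectionId());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                try { server.stop(); } catch (IOException ignored) { }
            }
        }
    }
}
```

In the proposed design the URL string would be what a rule passes to the overloaded helper methods, and the connection id is the sort of handle a controller could use to notice a participant that disappeared before the count completed.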

               

              Either way, I feel the following would be applicable:

              • Implement an org.jboss.byteman.synchronization.IRendezvous interface that would serve to create JMX proxies.  (Simple enough)
              • Distributed Rendezvous would require (at least) the conversion of the identifier from an Object to a java.io.Serializable to ensure that it can be serialized for remote rendezvous operations.
              • For the DistributedRendezvous path, a caution for programmers would be that the stringification of the identifier must result in a broadly unique and deterministic value.
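
To make the three points above concrete, here is a small sketch of a rendezvous exposed through a management interface, registered under an ObjectName built from a constant plus the stringified identifier, and driven through a JMX proxy. All names here (`RendezvousMBeanSketch`, `RendezvousMXBean`, the `org.example.rendezvous` domain) are hypothetical; this is not the proposed IRendezvous interface or Byteman's Rendezvous class, just one way the idea could be expressed with standard JMX calls.

```java
import java.io.Serializable;
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class RendezvousMBeanSketch {

    /** Hypothetical management interface; the MXBean suffix marks it as an MXBean. */
    public interface RendezvousMXBean {
        int getExpected();
        int getArrived();
        int rendezvous(long timeoutMillis);   // arrival index, or -1 on timeout
    }

    public static class LocalRendezvous implements RendezvousMXBean {
        private final int expected;
        private final CountDownLatch remaining;
        private final AtomicInteger arrived = new AtomicInteger();

        public LocalRendezvous(int expected) {
            this.expected = expected;
            this.remaining = new CountDownLatch(expected);
        }
        public int getExpected() { return expected; }
        public int getArrived() { return arrived.get(); }
        public int rendezvous(long timeoutMillis) {
            int index = arrived.getAndIncrement();
            remaining.countDown();
            try {
                return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS) ? index : -1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
    }

    /** Constant prefix plus the quoted, stringified identifier. */
    static ObjectName nameFor(Serializable identifier) {
        try {
            return new ObjectName("org.example.rendezvous:type=DistributedRendezvous,id="
                    + ObjectName.quote(String.valueOf(identifier)));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Two threads meet through a proxy; returns their arrival indexes. */
    static int[] demo() {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = nameFor("init-record");
        try {
            mbs.registerMBean(new LocalRendezvous(2), name);
            RendezvousMXBean proxy = JMX.newMXBeanProxy(mbs, name, RendezvousMXBean.class);
            int[] indexes = new int[2];
            Thread other = new Thread(() -> indexes[1] = proxy.rendezvous(5000));
            other.start();
            indexes[0] = proxy.rendezvous(5000);
            other.join();
            return indexes;
        } catch (JMException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            try { mbs.unregisterMBean(name); } catch (JMException ignored) { }
        }
    }
}
```

The same `JMX.newMXBeanProxy` call accepts any `MBeanServerConnection`, so a proxy for a remote rendezvous would differ only in where the connection comes from. The caution about identifiers shows up in `nameFor`: an identifier whose `toString()` falls back to the default identity-hash form would produce different names in different JVMs, so the participants would never find the same MBean.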

               

              It's so easy, I feel like it's already done 

               

              Cheers.

               

              //Nicholas

              • 4. Re: Feature Request: cross-process rendezvous
                adinn

                Hi Nicholas,

                 

                Sorry for not replying sooner. I have had my head stuck in a JVM for the last week or so.

                 

                Your design sounds perfectly feasible and has many virtues, not the least of which is reusing JMX code to do a lot of the heavy lifting. It has one limitation from the Byteman user's point of view, which is that the various JVMs involved in the rendezvous operate asymmetrically, i.e. the master JVM needs to employ different rules (or maybe, with enough care in the design of the helper, just a different runtime environment configuration) to the slave JVMs. Of course, that's not much of a criticism. Declaring one JVM master and requiring all the other JVMs to negotiate with it has the virtue of removing most of the race conditions which would plague a decentralized solution (the only race to resolve is the one at startup where a slave has to wait for the master before proceeding), so expecting symmetrical operation is a tall order.

                 

                A second point worth noting is that your design may conflict with the use of JMX by the application. However, it could probably quite easily be re-implemented using some mechanism other than JMX (a master socket listener and client sockets on the slaves, say) without (significantly) affecting the helper API. So, this gives us a quick and easy implementation and we could eventually develop an alternative implementation for anyone who does not want to use JMX.

                 

                I agree that you should make all this behaviour available by extending Helper with a DistributedHelper subclass which redefines the relevant methods. We might then be able to implement some of the other methods so that they also work across JVMs, e.g. flag, clear and flagged, and perhaps countDown etc.

                 

                If you manage to implement this design, or find issues which are not addressed, please report back on progress. I'll be very happy to accept contributed code, test it and add it to the Byteman release.

                 

                regards,

                 

                 

                Andrew Dinn