1 Reply Latest reply on Jul 3, 2012 5:07 PM by susanin

    Matching and replacing a sequence of bytecodes/instructions

    susanin

      Hi,

       

      I have the following problem to solve:

       

      I need to detect assignments to a specific field where the right-hand side also has a specific form, e.g. it is the same field (or a field of the same specific type) of another object:

      obj1.fieldX = obj2.fieldX

       

      The reason for looking for such a pattern is a (peephole) optimization of the code that I generate using Javassist. Currently, I have my basic transformation working (using javassist.expr.FieldAccess), but it treats such assignments as two independent operations:

      read from fieldX of obj2

      write into fieldX of obj1

      and I cannot see that they are used in combination, so I cannot optimize this typical use case. As a result, the generated code is rather inefficient at run time, about 2-3 times slower, because the two independent transformations produce a lot of heap-allocated intermediary objects of complex types. Such an object is created as a result of the read, and some of its fields are initialized from obj2.fieldX. The write then does the opposite, i.e. it copies some of the fields of the newly allocated object into the target, obj1.fieldX. After that the intermediary object is not used any more. What I want is to copy the information directly from obj2.fieldX into obj1.fieldX, without creating all those useless intermediary objects on the heap.
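      To make the pattern concrete: javac compiles obj1.fieldX = obj2.fieldX into a load of obj1, a load of obj2, a GETFIELD and then a PUTFIELD, so the read and the write sit next to each other in the bytecode. Below is a minimal sketch of a scan for that pair. It does not use Javassist at all; the Insn class, the opcode strings and findFieldCopies are hypothetical names made up for the illustration, and a real pass would walk the method's actual code attribute instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldCopyPeephole {

    /** Hypothetical minimal instruction model: an opcode name plus an optional operand. */
    static final class Insn {
        final String opcode;
        final String operand;

        Insn(String opcode, String operand) {
            this.opcode = opcode;
            this.operand = operand;
        }
    }

    /**
     * Returns the index of every GETFIELD that is immediately followed by a
     * PUTFIELD on the same field, i.e. the shape of "obj1.f = obj2.f".
     */
    static List<Integer> findFieldCopies(List<Insn> code, String field) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + 1 < code.size(); i++) {
            Insn read = code.get(i);
            Insn write = code.get(i + 1);
            boolean isPair = "GETFIELD".equals(read.opcode)
                    && "PUTFIELD".equals(write.opcode);
            if (isPair && field.equals(read.operand) && field.equals(write.operand)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Simplified listing for: obj1.fieldX = obj2.fieldX;
        List<Insn> code = Arrays.asList(
                new Insn("ALOAD", "1"),          // push obj1
                new Insn("ALOAD", "2"),          // push obj2
                new Insn("GETFIELD", "fieldX"),  // read obj2.fieldX
                new Insn("PUTFIELD", "fieldX"),  // write obj1.fieldX
                new Insn("RETURN", null));
        System.out.println(findFieldCopies(code, "fieldX"));
    }
}
```

      Once the pair is found at index i, both instructions could be replaced by a single direct copy, which is exactly what the two independent FieldAccess transformations cannot do.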

       

      Therefore I have questions:

      1) Is it possible in principle to use Javassist to detect something like this and to replace the whole expression with new code, or to transform it?

      I've seen that BCEL has something like this for matching sequences of bytecodes: it supports code matching using regular expressions, like "NOP+(ILOAD|ALOAD)*" (http://commons.apache.org/bcel/manual.html, section 3.3.7). Maybe Javassist also has it in some form?

       

      2) Maybe FieldAccess gives access to additional information via $_, $0 or $1, so that one could match a pattern against the result (i.e. $_) of a read or the value argument of a write, and perform a dedicated action if it is detected?

       

      3) Maybe there are other ways, rather than code matching at the bytecode level, to achieve my goal?

       

      Thanks,

        Leo

        • 1. Re: Matching and replacing a sequence of bytecodes/instructions
          susanin

          Update:

           

          I've ported BCEL's instruction matching feature (the InstructionFinder class) to Javassist in the meantime. The matching part works perfectly and is really nice. You can build regular expressions over bytecodes, i.e. it is like a usual regex over a String, but the letters are bytecode instructions.

          For example, you can say "NOP+(ILOAD|ALOAD)*" or "GETFIELD PUTFIELD". There are also shortcuts for common groups of instructions.
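          For anyone who has not seen this kind of matching, here is a minimal, self-contained sketch of the idea. It is not my port and not BCEL's code: it simply encodes each opcode as one character and hands the work to java.util.regex (as far as I understand, InstructionFinder does something similar internally). The class name, the five-opcode alphabet and the method names are assumptions made up for the illustration, and the shortcuts for instruction groups are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OpcodePatternSketch {

    // Hypothetical, deliberately tiny opcode alphabet: one letter per opcode.
    private static final Map<String, Character> ALPHABET = new LinkedHashMap<>();
    static {
        String[] names = {"NOP", "ILOAD", "ALOAD", "GETFIELD", "PUTFIELD"};
        for (int i = 0; i < names.length; i++) {
            ALPHABET.put(names[i], (char) ('a' + i));
        }
    }

    private static char symbolFor(String opcode) {
        Character c = ALPHABET.get(opcode);
        if (c == null) {
            throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
        return c;
    }

    /** Encodes an instruction sequence as a string, one character per instruction. */
    static String encode(List<String> instructions) {
        StringBuilder sb = new StringBuilder();
        for (String opcode : instructions) {
            sb.append(symbolFor(opcode));
        }
        return sb.toString();
    }

    /** Translates e.g. "NOP+(ILOAD|ALOAD)*" into an ordinary regex over the letters. */
    static Pattern compile(String opcodePattern) {
        StringBuilder regex = new StringBuilder();
        Matcher tokens = Pattern.compile("[A-Z][A-Z_0-9]*|\\s+|.").matcher(opcodePattern);
        while (tokens.find()) {
            String t = tokens.group();
            if (Character.isUpperCase(t.charAt(0))) {
                regex.append(symbolFor(t));   // opcode name -> its letter
            } else if (!t.trim().isEmpty()) {
                regex.append(t);              // regex operator, kept as is
            }                                 // whitespace only separates names
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns {start, end} instruction index pairs (end exclusive) for each non-empty match. */
    static List<int[]> findAll(List<String> instructions, String opcodePattern) {
        Matcher m = compile(opcodePattern).matcher(encode(instructions));
        List<int[]> spans = new ArrayList<>();
        while (m.find()) {
            if (m.end() > m.start()) {
                spans.add(new int[] {m.start(), m.end()});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList(
                "NOP", "NOP", "ILOAD", "ALOAD", "GETFIELD", "PUTFIELD");
        for (int[] span : findAll(code, "GETFIELD PUTFIELD")) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

          Because every instruction becomes exactly one character, the offsets reported by the regex engine are already instruction indices, which is what makes this encoding convenient.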

           

          At the moment I'm working on the replacement part. I'm trying to follow the ExprEditor approach and provide a similar API for instruction matching and replacement. It is sort of working already for simple use-cases, but still needs more work. Some of the problems that I face are:

          1. A match can span multiple bytecode instructions. Should each instruction be accessible via a special $... syntax? If so, which one? Maybe $#i, which would refer to the i-th instruction in the matched fragment.
          2. Should it be possible to replace all of the instructions or just some of them, e.g. by using a special syntax like $#i = replacement_expression?
          3. In ExprEditor and related classes, the replacement expression uses Java syntax, which is very nice. But since we want to replace bytecode instructions here, we may need to allow bytecode instructions in the replacement expression string, because Java expressions are too high-level to express some aspects at the bytecode level (e.g. pushing/popping values on the stack). I can see scenarios where one would want only bytecode instructions in the replacement, where one would want a mix of Java code and bytecode instructions, and where only Java code is used. Among other things, these questions are interesting here:
            • Which easy-to-use syntax should be used for bytecode expressions?
            • Whether and how can it be mixed with Java?
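          To give the $#i idea from point 1 something concrete to discuss, here is a minimal sketch of how such placeholders could be expanded in a replacement template. Nothing here is settled: the class name, the expand method, the plain-text form of an instruction and the choice of 1-based indices are all assumptions made up for the illustration, not part of my patch or of Javassist.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {

    // Matches "$#" followed by a decimal index, e.g. "$#2".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$#(\\d+)");

    /**
     * Replaces every $#i in the template with the textual form of the i-th
     * matched instruction. Indices are assumed to be 1-based.
     */
    static String expand(String template, List<String> matched) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int i = Integer.parseInt(m.group(1));
            if (i < 1 || i > matched.size()) {
                throw new IllegalArgumentException("no instruction $#" + i + " in match");
            }
            // quoteReplacement keeps '$' and '\' in the instruction text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(matched.get(i - 1)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

          For a match of "GETFIELD PUTFIELD", a template such as "$#2 $#1" would then expand to the two instructions in swapped order. Whether the expanded text should be parsed as Java, as bytecode, or as a mix is exactly the open question in point 3.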

           

          Any comments and opinions regarding the problems mentioned are very welcome!

           

          And please let me know if there is any interest in this functionality. If so, I could create a JIRA issue and attach my current patches there as a start.