5 Replies Latest reply on Mar 18, 2010 2:51 PM by dmlloyd

    Problems with proxy classes and a proposed resolution

    adinn

      This discussion is prompted in part by Mark Stuberg's note and in part by looking at JASSIST-42 which identifies various problems with the current use of writeRepace to support proxy serialization. The comments on JASSIST-42 contain a prescription for how to solve the serialization problems provided by David Lloyd. I impemented tis solution and, unfortunately, although it almost works is still suffers from a couple of serious problems. Although it might be possible to get round these the solution requires javassist clients to do a lot of extra work and is likely to be error-prone.

       

      I have identifed an alternative solution which should also ensure that cached proxy classes can be reused much more often both when performing serialization and in normal use of proxies. The only drawback is a very small change to the proxy factory/class API which will require existinng client code to be modified very slightly. The proposed fix can be used with the writeReplace style serialization solution to ensure that deserialized proxies can share the same cached class and do not fail to find the correct class. It can also be used to support serialization/eserialization using David Lloyd's proposal which provides much better quality serialiation (e.g. superclass fields are not lost) but is a tiny bit harder for clients to use. I have nearly finished a prototype which enables both alternatives to be made available so, modulo the small factory API change, we can retain compatibility for code which currently uses writeReplace based serialization.

       

      There are two related  problems with the current proxy cache. The first is to do with identity, the second with naming.. The identity of a cache entry (a proxy class) associated wiht some given classloader is determined from four sets of data:

      • the super class s
      • the interface set i1,...,ik
      • the proxy filter f
      • the proxy handler h

      Following the fix for JASSIST-104 the per-classloader cache is a hashmap keyed by a string derived from the names of the classes in the first two data sets: i.e. of the form sn$i1n$...ikn$. where sn is the (String) name of s, i1n is the name of i1 etc All proxy classes with a given super and interface set are stored in a list under this key. The list contains entries each of which stores a filter f1, handler h1 and weak reference to the associated class pc1. So, when creating a class with the specified data set s, i1...,ik, f, h the key is constructed and the relevant list located and traversed looking for an entry where f1 == f and h1 == h.

       

      This already wastes an opportunity for reuse of proxy classes. If f1 and f2 are two filters which behave exactly the same then a class created with filter f1 will not be reused when creating a class withh filter f2 even if all the other elements of the iidentifying set are the same.

       

      The problem with serialization is more severe. The current scheme based on use of method writeReplace serializes a proxy object by substituting an instance of SerializedProxy containing the identifying details sn, i1n..,ikn, f and h. At the deserializing end this results in objects sn', i1n'...,ikn', f' anf h'. The deserialized instance of SerializedProxy uses these details to lookup or craeate a proxy class pc' and create an instance p' which it then initialises with the correct handler. Unfortunately, this also loses all data locate in instance fields iniherited from superclass s.

       

      Even if the deserialize is done in the same JVM as the serialize operation f' and h' will not equal f and h so a cache lookup will fail to find the originating class pc from which the proxy was serialized. If it is done in a different JVM the same problem arises. The instances will be new every time they are read from the stream so they can never match cache entries. So, each time  that, say, a remoting call is made serializing an instance of proxy class pc down the wire will cause a new proxy class pc' to be created and the cache soon blows up.

       

      The same problem affects the scheme suggested by David Lloyd which avoids using writeReplace. Instead the serialize and deserialize operations must be done using a subclass of ByteOutputStream and ByteInputStream defined by the javassist package. When a ProxyObjectOutputStream detects that it is serializing a proxy instance p the stream identifies the proxy's class pc, and writes sn, i1,... etc into the stream before using the normal serialization code to serializethe field values stored in p.  At the other end the ProxyObjectOutputStream recognises that details of a proxy class have been included. It reads sn', i1n' etc and tries to lookup and retturn a cached proxy class pc' identified by these values.. The parent code in ObjectInpuStream creates an instance p' from this class and initialises its fields using the data serialized from p. Once again this can never reuse a cached class and will always end up creating a new class pc' since f' and h' are new objects so will never identify a cache entry

       

      The second problem is to do with how proxy classes are named. Clearly, there can be more than one class with super s and interface list i1,..,ik. So, the current code creates a name of the form javassist.proxy$sn$i1n$...$ikn_37 whenever it creates a proxy class. The number at the end is derived by incrementing a static counter in class ProxyFactory. So, when an instance p of class pc is serialized and then deserialized the new class created at the other end may have a name like javassist.proxy$sn$i1n$...$ikn_26 i.e. with a different numeric tag at the end. It is very unlikely that class names will actually match up. This change of classname is not fatal for the writeReplace based serialization scheme. However, if kills david Lloyd's alternative. The class pc' returned by the ProxyObjectOutputStream must have the same name as the original class pc. If the names don't match the deserialize throws an exception.

       

      One way to fix this would be to require filter and handler classes to implement equality and naming. If the classes fc and hc of filters f and h implemented method equals() then the cache lookup could succeed if f'.equals(f) and h'.equals(h). The client would have to ensure that for any two filters f and f' f.equals(f') if and only if any method selected by f would be selected by f' and any method rejected by f would also be rejected by f'. This is feasible but probably somewhat error prone. Handler classes would also need to implement a coherent equals method.

       

      It would also be necessary to implement a matching getName() method for each filter or handler class which returned a String to be  included in the proxy class name. So, if f.getName() ==> fn, h.getName() ==> hn then the proxy class would have name javassist.proxy$sn$i1n$...$ikn$fn$hn. These methods woudl need to esnure that where f1.equals(f2) then f1.getName.equals(f2.getName()) and similarly for h1, h2. This would ensure that both serialization schemes worked and were able to reuse cache entries albeit at some cost in code and brainpower to clients.

       

      Rather than impose this burden on the programmer I suggest a simpler solution. First, I'll make a couple of important observations.

       

      For a given pair of proxy classes p1 and p2 derived form clases pc1 and pc2 which differ in their handlers h1, h2 but share the same super s, interfaces i1,...ik and filter f the classes pc1 and pc2 are almost identical in behaviour. Since they share the same filter pc1 and pc2 will call the instance's handler in exactly the same cases and will both rely on the methods inherited from s in all other cases.The only point where the classes idffer is that the static handler field in pc1 and pc2 may differ and this affects the initial value of the handler assigned to their innstances So, if p1.handler and p2.handler ar expiicitly  set by the client p2 could be derived from pc1 and we could dispense with pc2. If we remove this static default handler field and require the cleint to always call setHandler on a new instance then we can remove any dependence of the proxy class on the handler value. This means we can remove h from the set of identifying data used to ientify a proxy class..

       

      So, to achieve this we need to decommission Proxyfactory.setHandler(), remove the static fields in the handler class and change the generator for the proxy lass constructors so that the instance's handler is initialised using the global defaul in ProxyFactory rather than looking for the proxy class default.

       

      The second observation is that the identity of two filter instances f1 and f2 can be ascertained automatically and can also be encoded automatically into a unique name. The list of methods m1,...,mn impllemented by s, i1,...ik is determinate and of fixed length. So, we can encode a bit vector of length n using a 1 at position i if f1 redirects calls to mi and a 0 if f1 relies on the impllementation inherited from s. If f2 encodes the same bit vector it will generate essentially the same class. We can also translate this bit vector to a filter signaure key f1n, a hex String which will looksomething lik, say, fa08. Any equivalent filter method f2 will also have the same signature key.

       

      This solves our identity and naming problem. If we store the bit vector in the class as an int[] and construct a name for the proxy class wich includes this filter name -- something of the form javassist.proxy.$sn$i1n$...$ikn$fn -- then we can guarantee to be able to identify and reuse equivalent proxy classes both under a call to createClass and also during deserialization.

       

      I would prefer if we were to drop method

       

      ProxyFactory.setFilter()

       

      and instead require clients to call

       

      ProxyFactory.createClass(MethodFilter).

       

      If instead the client calls

       

      ProxyFactory.createClass().

       

      then we generate a class which redirects all methods via the handler. Alternatively, we can retain teh setFilter method on the factory and use it to filter when createClass is called.

       

      So, if we swallow these small API changes then we can fix the current proxy model and we can also implement David LLoyd's alternative serialization model. Note that we don't want to store the filter instance in the class as this may cause classloader  issues when we reuse a class whose filter belongs to a classloader which is not known at the deserializing end . However, storing an int[] or the String filter signature will cuase no such problems.

       

      For the serialization fix I believe we can retain compatibility by providing a proxy factpry settting. By default a factory will create classes which implement writeReplace and can be serialized via normal ObjectOutput/InputStreams. We can provide a method

       

      ProxyFactory.setUseWriteReplace(boolean) which can switch off generation of this method when rerating new proxy classes. Instances of these class must be serialized using a ProxyObjectOutputStream and deserialized using a ProxyObjectInputStream. However, this will enable them to be fully serialized and reconstituted, avoidning the problems David mentions in JASSIST-42.

       

      I am very close to having a prototype implementation of tis behaviour (I got most of the way there identifying the two critical failings). I'll complete this and post a patch. However, I;'m interested to hear feedback form anyone who may be affected by the API changes here.

       

      regards,

       

       

      Andrew Dinn

        • 1. Re: Problems with proxy classes and a proposed resolution
          dmlloyd

          Excellent analysis, Andrew.  The issue of differing default filters and handlers is something I was a little worried about.  I think in hindsight that might have been a design flaw.  Using an identifier seems like a good workaround to me.

           

          In terms of differing class names - my intent when I suggested subclassing OOS/OIS was that the read/writeClassDescriptor methods would be overridden such that javassist proxies are treated specially (similar to how reflection proxies are currently treated specially).  With reflection proxies, the implemented interface names are written; with Javassist, this information, plus the base class descriptor, plus the identifiers for the default filter and handler would all be part of the descriptor.

           

          Once the proxy class is mapped, serializing the proxy instance should work normally at that point.  The only thing I'm not sure of off the top of my head is whether there's a way to get an ObjectStreamClass descriptor instance for the Javassist proxy when reading that descriptor.  I seem to recall that this API is particularly moronic.

          • 2. Re: Problems with proxy classes and a proposed resolution
            dmlloyd
            Incidentally, it is possible that a writeReplace serialized proxy object could still be a viable solution.  It would just have to be able to encapsulate the entire object state (basically reproducing the serialization spec behavior of OOS and OIS in terms of creating instances via the proprietary Sun API, initializing fields, dealing with read/writeObject, etc.).
            • 3. Re: Problems with proxy classes and a proposed resolution
              adinn

              Thanks for the feedback, David.

               

              my intent when I suggested subclassing OOS/OIS was that the read/writeClassDescriptor methods would be overridden such that javassist proxies are treated specially (similar to how reflection proxies are currently treated specially).

               

              That's precisely what my prototype implementation does (your comments in JASSIST-42 were all the recipe I needed). It seemed to work ok so once I fixed writeClassDescriptor to return a class with the correct name. This allowed the writer and reader to use normal object serialization to write/read the proxy instance itself.

               

               

              Incidentally, it is possible that a writeReplace serialized proxy object could still be a viable solution.  It would just have to be able to encapsulate the entire object state (basically reproducing the serialization spec behavior of OOS and OIS in terms of creating instances via the proprietary Sun API, initializing fields, dealing with read/writeObject, etc.).

               

              Well, yes that's maybe so but I would not like to have to write and maintain code to do that. Your subclassed OOS/OIS solution achieves the same result with just a few calls to methods of OOS/OIS. That''s a big plus for your solution. My reasons for retaining the writeReplace version were i) retaining compatibility and ii) providing an option for code which has to use an OOS/OIS created by some other package rather than create its own subclass of OOS.This discussion is a way of polling to see how important these criteria are.

               

              regards,

               

               

              Andrew Dinn

              • 4. Re: Problems with proxy classes and a proposed resolution
                adinn

                Here are the diffs for my modified version of the proxy factory code against the current trunk.

                 

                The cache works without leaks and supports serialization. By default classes are created to use the old writeReplace implementation which now reuses proxy classes but still fails to serialize inherited fields.

                 

                However,  if you disable generation of the  writeReplace method then you can serialize and deserialize the full object state so long as you do it using a ProxyObjectOutputStream and ProxyObjectInputStream. The static field ProxyFactory.useWriteReplace represents the default setting for new factories. It is initialized to true to retain backwards compatiility. you can reset this default or else you can configure the behaviour of each new factory by calling factory.setuseWriteReplace(boolean).

                 

                Also, I have modified static field ProxyFactory.useCache along the same lines. It defines the default behaviour for new factories. So, iIf you reset this field to false and create a factory then it will not install its classes in the cache. You can also oevrride the setting per factory by calling factory.setUseCache(boolean).

                 

                See the javadoc in ProxyFactory, ProxyObjectOutputStream and ProxyObjectInputStream for full details.

                 

                n.b. I managed to avoid having to change the naming scheme for generated classes. I did have to introduce a method filter signature to the iidentity criteria for a class. This is, a byte[] which enncodes the behaviour of a method filter instance.

                • 5. Re: Problems with proxy classes and a proposed resolution
                  dmlloyd
                  Nice!!