Modular Serialization

Version 18

Created by jason.greene on Sep 27, 2011 12:50 PM. Last modified by jason.greene on Sep 27, 2011 5:09 PM.

Understanding Class-loading Issues

When an object is serialized using Java serialization, the format consists of a "class reference" and the field data of every field in all inherited classes. The class reference is simply a class name. The assumption is that the receiving party has all of the sender's class definitions readily accessible to the class loader used for deserialization (by default the first user-defined class loader found on the call stack; many containers substitute the thread's context class loader). These semantics differ considerably from those of a normal in-VM invocation across class loader boundaries. In a modularized environment, it is not only common but considered good practice to share only public API types, while still passing the internal implementation classes that back those API types across module boundaries. These "opaque" structures don't cause the VM a problem because the same physical, already constructed object instance is being passed around.

However, with serialization there is no object instance on the receiving side. All fields of all types that make up an object need to be accessed to recreate the instance. Also, since fields may themselves reference other custom types, it is quite common for a large graph referencing numerous types to be on the wire, and every one of those types must be visible to the receiver.

Problem A - Subclass Visibility

In this example a common super class is shared between a sender and a receiver, but an extended subclass used by the sender's implementation is mistakenly not shared. This case works just fine with local in-VM invocations, but fails once serialization is involved, because the receiver cannot resolve the subclass named in the stream.
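The failure can be reproduced in a single VM with a minimal sketch. It assumes hypothetical classes Foo (shared) and FooImpl (sender only), and it simulates the receiver's limited visibility by overriding ObjectInputStream.resolveClass to refuse FooImpl; a real deployment would hit the same ClassNotFoundException because of class loader isolation rather than an explicit check.

```java
import java.io.*;

class SubclassVisibilityDemo {
    /** Shared type: visible to both sender and receiver (hypothetical). */
    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "foo";
    }

    /** Sender-only subclass: NOT visible to the receiver (hypothetical). */
    static class FooImpl extends Foo {
        private static final long serialVersionUID = 1L;
        int cacheSize = 16;
    }

    /** Simulates a receiver whose class loader cannot see FooImpl. */
    static class ReceiverStream extends ObjectInputStream {
        ReceiverStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (desc.getName().equals(FooImpl.class.getName())) {
                throw new ClassNotFoundException(desc.getName());
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns "ok" on success, or "missing: <class name>" if a class cannot be resolved. */
    static String receive(byte[] data) {
        try (ObjectInputStream in = new ReceiverStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return "ok";
        } catch (ClassNotFoundException e) {
            return "missing: " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Foo local = new FooImpl();                              // in-VM: the reference is simply passed
        System.out.println(local.name);
        System.out.println(receive(serialize(new Foo())));      // only the shared type is on the wire
        System.out.println(receive(serialize(new FooImpl())));  // sender-only subclass is on the wire
    }
}
```

The stream carries the concrete class name, so sharing the super class alone is not enough for the receiver.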

Problem B - Reference / Aggregation Visibility

In this example a common class is shared between a sender and a receiver, but it contains a field which references a sender class that is mistakenly not shared. This case works just fine with local in-VM invocations, but fails once serialization is involved, because the referenced class is part of the serialized graph.
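A similar sketch illustrates the aggregation case. The class names Foo and Bar follow the text, but the field name and its declared type (Serializable) are assumptions, and the receiver's limited visibility is again simulated through resolveClass rather than real class loader isolation.

```java
import java.io.*;

class ReferenceVisibilityDemo {
    /** Shared type: visible to both sides. Its field can point at any serializable object. */
    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        final Serializable detail;

        Foo(Serializable detail) {
            this.detail = detail;
        }
    }

    /** Sender-only type that enters the graph through Foo.detail. */
    static class Bar implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 7;
    }

    /**
     * Round-trips the value through a reader that cannot resolve Bar.
     * Returns null on success, or the name of the class that was missing.
     */
    static String missingClassOnReceive(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())) {
                         @Override
                         protected Class<?> resolveClass(ObjectStreamClass desc)
                                 throws IOException, ClassNotFoundException {
                             if (desc.getName().equals(Bar.class.getName())) {
                                 throw new ClassNotFoundException(desc.getName());
                             }
                             return super.resolveClass(desc);
                         }
                     }) {
                in.readObject();
                return null;
            }
        } catch (ClassNotFoundException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(missingClassOnReceive(new Foo("plain text")));
        System.out.println(missingClassOnReceive(new Foo(new Bar())));
    }
}
```

Even though the top-level object is of a shared type, reading it fails as soon as any object reachable from it has a class the receiver cannot resolve.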

Problem C - Transitive Modular Dependencies

This more complex example shows how the isolation properties of a module can affect its ability to load its own serialized data. It's considered good practice (and is the default in JBoss Modules, as well as in other module systems) not to import a transitive dependency. This allows an application to use its own version of a library without ever conflicting with the version used by one of the application's dependencies. In this example we have an application that uses a framework of some sort which controls the lifecycle of a class in the application (think IoC container, EJB container, web container, etc.). As part of constructing the instance of the application's class Foo, the framework associates with that instance an internal class that it uses from a thirdparty library. That class is of no interest to the application (it's just an implementation detail of the framework).

The application then serializes the instance the framework constructed, but it will not be able to deserialize it. This is because the framework's dependencies (which are transitive to the application) are not visible to the application.

Solution 1 - Require Components To Share All Needed Classes

This may require altering the application / component code to always use common shared classes in the serialization stream. A negative side effect of this is that it puts a larger burden on the application / component developer. A benefit is that the wire format is compatible with standard serialization.

Possible ways to apply this solution to the problems above are:

Problem A:

Create a FooData class in shared.jar.
Add a FooImpl.toFooData().
Write the FooData instance.
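The three steps for Problem A can be sketched as follows. The names FooData, FooImpl and toFooData() come from the list above; the fields, constructors and helper methods are assumptions made for illustration, and all classes are nested in one file here only to keep the sketch self-contained (in practice Foo and FooData would live in shared.jar and FooImpl on the sender side).

```java
import java.io.*;

class FooDataExample {
    /** Shared super class (shared.jar in the text). */
    static class Foo {
        protected final String name;

        protected Foo(String name) {
            this.name = name;
        }
    }

    /** Shared, serializable snapshot of a Foo (the FooData class added to shared.jar). */
    static final class FooData implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        FooData(String name) {
            this.name = name;
        }
    }

    /** Sender-only implementation; it is never written to the stream itself. */
    static final class FooImpl extends Foo {
        private final int senderOnlyState = 42;

        FooImpl(String name) {
            super(name);
        }

        /** Copies the state that needs to travel into the shared data class. */
        FooData toFooData() {
            return new FooData(name);
        }
    }

    /** Sender side: write the FooData instance instead of the FooImpl. */
    static byte[] write(FooImpl foo) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(foo.toFooData());
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Receiver side: only FooData needs to be visible. */
    static FooData read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (FooData) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FooData received = read(write(new FooImpl("example")));
        System.out.println(received.name);
    }
}
```

Because only FooData appears in the stream, the receiver never has to resolve FooImpl, and the wire format remains plain Java serialization.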

Problem B:

Move Bar to shared.jar.

Problem C (Option 1):

Create a copy constructor on Foo.
Create a local data version of whatever interface Bar implements.
Have the copy constructor copy Bar's interface properties to the data version.
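A minimal sketch of the copy-constructor approach is shown below. Foo and Bar are the names used in the text; the Settings interface, its getTimeout() property and the SettingsData class are hypothetical stand-ins for "whatever interface Bar implements" and its local data version.

```java
import java.io.*;

class CopyConstructorExample {
    /** Hypothetical interface implemented by Bar and visible to the application. */
    interface Settings {
        int getTimeout();
    }

    /** Stand-in for the thirdparty class the framework attaches to Foo. */
    static final class Bar implements Settings, Serializable {
        private static final long serialVersionUID = 1L;

        public int getTimeout() {
            return 30;
        }
    }

    /** Application-local data version of the Settings interface. */
    static final class SettingsData implements Settings, Serializable {
        private static final long serialVersionUID = 1L;
        private final int timeout;

        SettingsData(Settings source) {
            this.timeout = source.getTimeout();
        }

        public int getTimeout() {
            return timeout;
        }
    }

    /** The application's class, normally constructed by the framework. */
    static final class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Settings settings;

        Foo(Settings settings) {
            this.settings = settings;
        }

        /** Copy constructor: replaces the opaque Bar with the local data version. */
        Foo(Foo other) {
            this.settings = new SettingsData(other.settings);
        }

        Settings getSettings() {
            return settings;
        }
    }

    /** Serializes a copy, so that only application-visible classes reach the stream. */
    static byte[] serializeCopy(Foo frameworkManaged) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Foo(frameworkManaged));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Foo deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Foo) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Foo managed = new Foo(new Bar());   // what the framework would hand to the application
        Foo restored = deserialize(serializeCopy(managed));
        System.out.println(restored.getSettings().getTimeout());
    }
}
```

The trade-off is the one noted for Solution 1 in general: the application developer has to write and maintain the data class and the copy logic.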

Problem C (Option 2):

Import thirdparty.jar, or change framework.jar to export thirdparty.jar.

Solution 2 - JBoss Marshalling - Modular Serialization

This solution is currently available using the ModularClassResolver like so:

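import org.jboss.marshalling.Marshaller;
import org.jboss.marshalling.MarshallingConfiguration;
import org.jboss.marshalling.ModularClassResolver;
import org.jboss.marshalling.OutputStreamByteOutput;
import org.jboss.marshalling.river.RiverMarshallerFactory;

// moduleLoader (an org.jboss.modules.ModuleLoader), fileOutputStream and fooObject
// are assumed to be supplied by the surrounding code.
RiverMarshallerFactory factory = new RiverMarshallerFactory();
MarshallingConfiguration configuration = new MarshallingConfiguration();

// Enable Modular Serialization!
configuration.setClassResolver(ModularClassResolver.getInstance(moduleLoader));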

// Create a marshaller on some stream we have
Marshaller marshaller = factory.createMarshaller(configuration);
marshaller.start(new OutputStreamByteOutput(fileOutputStream));

// Write lots of stuff
marshaller.writeObject(fooObject);

// Done
marshaller.finish();

This changes the format of the stream so that the owning module identifier is written in addition to the name of every class referenced in the stream. This allows JBoss Marshalling to locate the "correct" class loader when deserializing the class. The advantage of this approach is that application/component code need not be changed, and that the same practices used in local invocation also apply to remote serialization. The drawback is that the serialization format must be altered, potentially introducing compatibility problems with older clients. In addition, non-modular clients, if they need to be supported, will not understand the format. Solution 3 is the recommended approach when compatibility is a concern.

Note that JBoss Marshalling has some other benefits including double the performance of Java serialization. For general information on JBoss Marshalling check out its project page.

Also note that this solution, as shown, applies only to JBoss Modules, although you could add your own class resolver that works with the modular framework of your choice (OSGi, for example).
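The idea of writing a module identifier next to each class name can be illustrated with the JDK's own extension hooks, ObjectOutputStream.annotateClass and ObjectInputStream.resolveClass. The sketch below is not JBoss Marshalling's wire format or implementation; the MODULES map is a hypothetical stand-in for a module loader, and the module name "com.example.app" is made up.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

class ModuleAnnotatedStreams {
    static final String APP_MODULE = "com.example.app"; // hypothetical module identifier

    /** Hypothetical stand-in for a module loader: module identifier -> class loader. */
    static final Map<String, ClassLoader> MODULES = new HashMap<>();

    static {
        MODULES.put(APP_MODULE, ModuleAnnotatedStreams.class.getClassLoader());
    }

    /** Looks up the owning module of a class; "" means "no known module". */
    static String moduleOf(Class<?> type) {
        for (Map.Entry<String, ClassLoader> entry : MODULES.entrySet()) {
            if (entry.getValue() == type.getClassLoader()) {
                return entry.getKey();
            }
        }
        return "";
    }

    /** Writes the owning module identifier after every class descriptor. */
    static final class ModuleWriter extends ObjectOutputStream {
        ModuleWriter(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> type) throws IOException {
            writeUTF(moduleOf(type));
        }
    }

    /** Reads the module identifier and resolves the class through that module's loader. */
    static final class ModuleReader extends ObjectInputStream {
        ModuleReader(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            String module = readUTF();
            ClassLoader loader = MODULES.get(module);
            if (loader == null) {
                return super.resolveClass(desc); // unknown module: default lookup
            }
            return Class.forName(desc.getName(), false, loader);
        }
    }

    static final class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Payload(String text) {
            this.text = text;
        }
    }

    static Payload roundTrip(Payload payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ModuleWriter out = new ModuleWriter(bytes)) {
                out.writeObject(payload);
            }
            try (ModuleReader in = new ModuleReader(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Payload) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Payload("hello")).text);
    }
}
```

A resolver for another module system would follow the same shape: record which module owns each class on write, and ask that module for the class on read.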

Solution 3 - Support both Modular and Standard Serialization

This is the same approach as Solution 2, but adds the ability to handle both styles of stream. This is achieved either by having the client ignore the module information, or by having some kind of negotiation process (a protocol version, for example).
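The "client ignores the information" option has a direct analogue in standard Java serialization, which may help make the idea concrete: data written through ObjectOutputStream.annotateClass is skipped by a plain ObjectInputStream that does not read it. The sketch below relies on that JDK behavior only; it does not describe how JBoss Marshalling implements compatibility, and the module identifier is hypothetical.

```java
import java.io.*;

class MixedReadersExample {
    /** Writer that adds a (hypothetical) module identifier to every class descriptor. */
    static final class AnnotatingWriter extends ObjectOutputStream {
        AnnotatingWriter(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> type) throws IOException {
            writeUTF("com.example.app");
        }
    }

    static byte[] writeAnnotated(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (AnnotatingWriter out = new AnnotatingWriter(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Non-modular client: a plain ObjectInputStream skips the annotations it does not read. */
    static Object readWithPlainClient(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Module-aware client: reads the annotation and returns the first identifier it sees. */
    static String readFirstModule(byte[] data) {
        final String[] first = new String[1];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data)) {
                 @Override
                 protected Class<?> resolveClass(ObjectStreamClass desc)
                         throws IOException, ClassNotFoundException {
                     String module = readUTF();
                     if (first[0] == null) {
                         first[0] = module;
                     }
                     return super.resolveClass(desc);
                 }
             }) {
            in.readObject();
            return first[0];
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = writeAnnotated(42);
        System.out.println(readWithPlainClient(data));
        System.out.println(readFirstModule(data));
    }
}
```

A negotiated protocol version would instead let the sender choose whether to write the module information at all, which avoids sending data that a non-modular client will only discard.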
