Common marshalling infrastructure | JBoss.org Content Archive (Read Only)

15. Re: Common marshalling infrastructure

dmlloyd Aug 6, 2008 11:43 PM (in response to dmlloyd)

"scott.stark@jboss.org" wrote:
So where does the invocation target class loader fit in, I don't see it in the apis.

It's factored out of the API currently - you'd have a customized ClassUnmarshaller that looks up the class in the appropriate classloader, similarly to how standard serialization works. The reason is that I can't really think of a good general solution (the needs of JBC & JBREM are very different in this area for example).
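For comparison, here is a minimal sketch of how the same idea looks with standard Java serialization, where class lookup is customized by overriding `ObjectInputStream.resolveClass`. The class name `LoaderAwareObjectInputStream` and the `roundTrip` helper are made up for illustration; this is not the ClassUnmarshaller API discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;

// Hypothetical helper: resolves classes against a caller-supplied class loader.
final class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the chosen loader without initializing it.
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup, which also handles primitive type names.
            return super.resolveClass(desc);
        }
    }

    // Serializes a value and reads it back through the loader-aware stream.
    static Object roundTrip(Object value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new LoaderAwareObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is that the stream itself stays generic; only the lookup step knows which class loader is appropriate.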

"scott.stark@jboss.org" wrote:
I have been talking to Ron about some current remoting class loading issues that arise from the invocation handler being the only one who knows the correct class loader for unmarshalling application specific classes:

1. thread pool receives a remoting request.
2. unmarshall just enough to understand the request, but don't unmarshall any invocation payload. Application specific data needs to be isolated outside of the remoting control structures to allow this to happen.
3. dispatch the request to a handler.
4. handler sets the TCL
5. handler or its delegate unmarshalls application payload
6. application does what it does.
7. handler serializes application return/exception and then unsets the TCL
8. remoting layer completes request control information
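Steps 4 to 7 above boil down to setting the thread context class loader (TCL) around the payload handling and restoring it afterwards. A minimal sketch, assuming a made-up helper class `TclScope` (not part of Remoting):

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating the set-TCL / do-work / unset-TCL pattern.
final class TclScope {
    private TclScope() {}

    // Runs the action with the given loader as the TCL and restores the
    // previous TCL afterwards, even if the action throws.
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

A handler would wrap "unmarshall payload, invoke application, marshall result" in one such call, so pooled threads never leak an application class loader into the next request.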

This particular problem is specific to Remoting, so if you don't mind I'm going to move this bit back over to those forums before I reply...

16. Re: Common marshalling infrastructure

starksm64 Aug 7, 2008 2:57 AM (in response to dmlloyd)

It would seem to me that I can have the same cross domain marshalling problem in JBC itself as well though. If I have two applications sharing a cache across web apps in the same server, with different class loading domains for the shared data, it would have to be marshalled/unmarshalled in each web app's class loader space. In general marshalling/unmarshalling is going to need the ability to establish a class loader.

17. Re: Common marshalling infrastructure

manik Aug 7, 2008 7:04 AM (in response to dmlloyd)

Scott is right, this does apply to JBC as well. Currently what I do is I have a concept of cache regions, and users register class loaders per region. Any calls that are then sent over the wire contain the following:

[version id: short][region id: serialized Fqn][payload]

my unmarshalling code currently:

1. reads the version id and delegates to the appropriate unmarshaller
2. unmarshaller reads the region id and sets the TCL appropriate to that region
3. continues unmarshalling payload, etc

Pretty much like what Scott said.

So looking at your APIs, I would need to be able to do something like:

marshallerFactory.setStreamHeader(new VersionHeader(versionId));
marshallerFactory.setStreamHeader(new RegionHeader(myRegion));
Marshaller m = marshallerFactory.createMarshalleroutput();

and when unmarshalling, StreamHeaders should also be able to react to headers being present such that:

marshallerFactory.setStreamHeader(new VersionUnmarshallHeader());
marshallerFactory.setStreamHeader(new RegionUnmarshallHeader());
Unmarshaller m = marshallerFactory.createUnmarshallerInput();

such that VersionUnmarshallHeader can read the version short and set the appropriate ObjectMarshallerFactory and ClassMarshallerFactory pertaining to the version of the protocol, and the RegionUnmarshallHeader would read the Region Fqn and set the TCL as needed.

Perhaps the same StreamHeader impl could be used for both purposes - makes logical sense.

18. Re: Common marshalling infrastructure

dmlloyd Aug 7, 2008 9:45 AM (in response to dmlloyd)

"manik.surtani@jboss.com" wrote:
Scott is right, this does apply to JBC as well. Currently what I do is I have a concept of cache regions, and users register class loaders per region. Any calls that are then sent over the wire contain the following:

[version id: short][region id: serialized Fqn][payload]

my unmarshalling code currently:

1. reads the version id and delegates to the appropriate unmarshaller
2. unmarshaller reads the region id and sets the TCL appropriate to that region
3. continues unmarshalling payload, etc

I would continue to do it exactly this way personally. The stream header mechanism should not be used to solve this problem (as designed today) - its purpose is just for verifying e.g. the stream's magic number (if there is one) and version number (if any).

Assuming your region ID is a string, you can just use DataInput.readUTF to read it in (and DataOutput.writeUTF to write it out); should be fairly straightforward.
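A minimal sketch of that framing, assuming the region ID is a plain string and using only `java.io` classes. The class `RegionFraming` and its method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for the [version: short][region: UTF string][payload] layout.
final class RegionFraming {
    private RegionFraming() {}

    // Writes the version, then the region ID via writeUTF, then the raw payload.
    static byte[] frame(short version, String region, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(version);
            out.writeUTF(region);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Checks the version and returns the region ID; a real reader would then
    // pick the class loader for that region before touching the payload.
    static String readRegion(byte[] frame, short expectedVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            short version = in.readShort();
            if (version != expectedVersion) {
                throw new IllegalStateException("unsupported version " + version);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `writeUTF` prefixes the string with a two-byte length, so the reader knows exactly where the payload starts.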

19. Re: Common marshalling infrastructure

dmlloyd Sep 16, 2008 9:37 AM (in response to dmlloyd)

OK so I thought I'd post an update. The basic framework is "Almost Done (tm)", and so I just want to quickly mention some things that ended up different in the implementation from what was mentioned previously here.

The ability to create object instances is now pluggable via the Creator interface. I provide two implementations: one that just uses reflection, and one that uses the Sun-specific method to create a constructor if a suitable one does not exist.

Object and Class marshallers are no longer separated. This looked good on paper but turned out to be totally useless in practice. Instead, there is a notion of ClassResolvers (this is where you plug in your customized classloader configuration), ClassTables (predefined classes by ID), and ObjectTables (predefined objects by ID).
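To illustrate the "predefined classes by ID" idea in isolation, here is a rough stand-alone sketch. `TinyClassTable` and its wire format are invented for this example and are not the ClassTable interface from the API linked in this post: a registered class is written as a tag byte plus a one-byte ID, anything else as a tag byte plus its name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the JBoss Marshalling ClassTable API.
final class TinyClassTable {
    private static final int BY_NAME = 0;
    private static final int BY_ID = 1;

    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();
    private final Map<Integer, Class<?>> classesById = new HashMap<>();

    // Both sides must register the same class under the same ID.
    void register(Class<?> type, int id) {
        if (id < 0 || id > 255) {
            throw new IllegalArgumentException("id must fit in one byte: " + id);
        }
        if (idsByClass.containsKey(type) || classesById.containsKey(id)) {
            throw new IllegalArgumentException("duplicate registration: " + type.getName());
        }
        idsByClass.put(type, id);
        classesById.put(id, type);
    }

    byte[] encode(Class<?> type) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            Integer id = idsByClass.get(type);
            if (id != null) {
                out.writeByte(BY_ID);
                out.writeByte(id);
            } else {
                out.writeByte(BY_NAME);
                out.writeUTF(type.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    Class<?> decode(byte[] data, ClassLoader loader) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readUnsignedByte() == BY_ID) {
                int id = in.readUnsignedByte();
                Class<?> type = classesById.get(id);
                if (type == null) {
                    throw new IllegalStateException("unknown class id " + id);
                }
                return type;
            }
            // Unregistered classes are resolved by name in the supplied loader.
            return Class.forName(in.readUTF(), false, loader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The fallback branch is where a customized class loader configuration would plug in; the table branch never touches a class loader at all.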

Externalizers now can create object instances, so it's possible to serialize and deserialize an entire payload without using reflection at all.

Finally, I've created some abstract base classes to make it easier to create a marshaller implementation. Right now we're working on a full-featured "native" implementation, as well as a Java-compatible implementation. Clebert had mentioned that he should be able to produce a JBSER-compatible implementation as well.

Here's the updated API: http://tinyurl.com/6yme33

20. Re: Common marshalling infrastructure

manik Sep 18, 2008 8:56 AM (in response to dmlloyd)

Looks good. Any preliminary numbers on how an implementation based on this would perform when compared to directly using JDK serialization with magic numbers, etc. for class defs?

21. Re: Common marshalling infrastructure

dmlloyd Sep 24, 2008 9:26 AM (in response to dmlloyd)

Well, I'm still working on the finishing bits and unit tests, so for now I'll just say, "42".

22. Re: Common marshalling infrastructure

dmlloyd Sep 29, 2008 5:33 PM (in response to dmlloyd)

OK, so my preliminary testing & tuning is showing that for large batches of Serializable or Externalizable objects, my default implementation is around twice as fast as the standard one (this is without using class or instance tables; though that would speed things up a little, large batches will benefit less from class tables than small batches will). I've only tested with Sun 1.5 and 1.6, 32- and 64-bit JVMs on Linux x86_64 though.

I've yet to do a real test with many small batches. When I've gotten all this testing done I'll do up some pretty charts or something. More news as it comes in...

23. Re: Common marshalling infrastructure

manik Sep 30, 2008 6:25 AM (in response to dmlloyd)

And this doesn't even take into account the stream pooling or anything. Cool!

24. Re: Common marshalling infrastructure

galder.zamarreno Mar 19, 2009 4:31 PM (in response to dmlloyd)

Hey David, I've started to look into how to integrate JBoss Marshalling into JBoss Cache. Here's a list of things I wanted to ask you about:

1.- Would it be possible to upload source jars to the maven repo for next release?

2.- Looking at JBoss Marshalling, I saw that you're considering primitive arrays as known classes but primitive object arrays are not. Is this an oversight or on purpose? i.e.

map.put(Byte[].class, Protocol.ID_BYTE_ARRAY_CLASS);
map.put(Boolean[].class, Protocol.ID_BOOLEAN_ARRAY_CLASS);
map.put(Character[].class, Protocol.ID_CHAR_ARRAY_CLASS);
...

3.- Looks like collections like ArrayList, LinkedList, HashMap, TreeMap...etc are treated as Serializable objects whereas JBoss Cache treats them differently. Instead, we look through the collection and check whether each element is an object we might want to marshall in a different way to standard Serialization, e.g. a list of ReplicableCommand (ReplicableCommand does not implement Serializable). How would we deal with this?

4.- Also, JBC treats SingletonList instances separately by just copying the single object within them. Any plans to add this to JBoss Marshalling as well?

Other than this, JBoss Marshalling seems to contain the rest of the optimisations JBC did for known type arrays, repeated objects, null values...etc. So, in spite of agreeing with Jason's statement in https://jira.jboss.org/jira/browse/JBCACHE-1336 earlier today:

It might be more useful to define type marshallers outside of the type since most of our magic numbers apply to types not under our control (JDK types).

I don't think this might apply any longer if the above points can be resolved. IOW, all types that are not under our control would already be handled by JBoss Marshalling which means that we can concentrate on our types and we could then use @Marshallable annotations.

FAO Manik: We talked earlier about the possibility of externalising magic numbers to a properties or XML file, but if we can stick to annotations, I think it would be cleaner and more natural. We could also make @Marshallable inherited (http://java.sun.com/j2se/1.5.0/docs/api/java/lang/annotation/Inherited.html), then subclasses of annotated classes would be covered and you'd be able to override the magic number if you needed different marshalling.
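A rough sketch of what the inherited annotation could look like. The attribute name `magicNumber` and the example command classes are assumptions made up for illustration, since @Marshallable is only a proposal at this point:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Proposed annotation; the attribute name is hypothetical.
@Inherited
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Marshallable {
    int magicNumber();
}

// Hypothetical example types.
@Marshallable(magicNumber = 10)
class BaseCommand {}

// Not annotated itself: picks up magic number 10 from BaseCommand via @Inherited.
class PutCommand extends BaseCommand {}

// Annotated again: overrides the inherited magic number.
@Marshallable(magicNumber = 11)
class RemoveCommand extends BaseCommand {}

final class MagicNumbers {
    private MagicNumbers() {}

    // Reads the effective magic number for a type via reflection.
    static int of(Class<?> type) {
        Marshallable marshallable = type.getAnnotation(Marshallable.class);
        if (marshallable == null) {
            throw new IllegalArgumentException(type.getName() + " is not @Marshallable");
        }
        return marshallable.magicNumber();
    }
}
```

One caveat: @Inherited only applies to superclasses, so an annotation placed on an interface is not picked up by the classes that implement it.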

Thoughts?

25. Re: Common marshalling infrastructure

manik Mar 20, 2009 8:22 AM (in response to dmlloyd)

"galder.zamarreno@jboss.com" wrote:

So, in spite of agreeing with Jason's statement in https://jira.jboss.org/jira/browse/JBCACHE-1336 earlier today:

It might be more useful to define type marshallers outside of the type since most of our magic numbers apply to types not under our control (JDK types).

I don't think this might apply any longer if the above points can be resolved. IOW, all types that are not under our control would already be handled by JBoss Marshalling which means that we can concentrate on our types and we could then use @Marshallable annotations.

Not true. Outside of our control != JDK classes. E.g., a JGroups IpAddress is something we marshall. We can't annotate these and JBoss Marshalling certainly doesn't know about this. :-)

While I agree that using annotations is more natural, I sadly think that this is inadequate. Perhaps what we could do is to use annotations for classes under our control, and then supplement with an XML based magic-map for classes outside of our control. It adds some complexity, but I can see how this makes the code easier to read.

We would definitely need a unit test though to ensure we don't have colliding magic numbers. :-)
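Something along these lines could back such a test. This is a minimal sketch that assumes the magic numbers from both sources (annotations and the XML map) have already been collected into one map keyed by class name; `MagicNumberCheck` is a made-up name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for a collision test.
final class MagicNumberCheck {
    private MagicNumberCheck() {}

    // Returns every magic number claimed by more than one class,
    // together with the names of the classes that claim it.
    static Map<Integer, List<String>> collisions(Map<String, Integer> magicByClassName) {
        Map<Integer, List<String>> byNumber = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : magicByClassName.entrySet()) {
            byNumber.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                    .add(entry.getKey());
        }
        // Keep only numbers with two or more claimants.
        byNumber.values().removeIf(names -> names.size() < 2);
        return byNumber;
    }
}
```

The unit test would then simply assert that the returned map is empty, and the failure message can list the offending classes.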

26. Re: Common marshalling infrastructure

dmlloyd Mar 23, 2009 4:26 PM (in response to dmlloyd)

"galder.zamarreno@jboss.com" wrote:
Hey David, I've started to look into how to integrate JBoss Marshalling into JBoss Cache. Here's a list of things I wanted to ask you about:

1.- Would it be possible to upload source jars to the maven repo for next release?

Will do. If I don't, feel free to yell at me. :-)

"galder.zamarreno@jboss.com" wrote:
2.- Looking at JBoss Marshalling, I saw that you're considering primitive arrays as known classes but primitive object arrays are not. Is this an oversight or on purpose? i.e.
map.put(Byte[].class, Protocol.ID_BYTE_ARRAY_CLASS);
map.put(Boolean[].class, Protocol.ID_BOOLEAN_ARRAY_CLASS);
map.put(Character[].class, Protocol.ID_CHAR_ARRAY_CLASS);
...

The reason for this is that primitive array classes do not extend the object array class. Since all object arrays are represented by a single byte plus the component type information, there should be one byte to signify the array plus one byte for the primitive wrapper, which is two bytes (not too bad). Likewise, e.g. Byte[][].class would be three bytes, etc. If the length is still unacceptably long, a ClassTable can be used to reduce it down to a single byte.
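The byte counts above can be worked through with a small sketch. It only models the arithmetic described in this reply, under the assumption that primitive arrays and wrapper classes each cost a single byte; `ArrayDescriptorLength` is a made-up name and not code from the marshaller:

```java
// Hypothetical model of the descriptor size arithmetic, not library code.
final class ArrayDescriptorLength {
    private ArrayDescriptorLength() {}

    // Assumes the innermost type is a known class (a primitive array or a
    // wrapper such as Byte) that is written as a single byte.
    static int bytesFor(Class<?> type) {
        if (type.isArray() && !type.getComponentType().isPrimitive()) {
            // One marker byte for "object array", plus the component type.
            return 1 + bytesFor(type.getComponentType());
        }
        return 1;
    }
}
```

Under those assumptions `byte[].class` costs one byte, `Byte[].class` two, and `Byte[][].class` three, matching the figures above.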

"galder.zamarreno@jboss.com" wrote:
3.- Looks like collections like ArrayList, LinkedList, HashMap, TreeMap...etc are treated as Serializable objects whereas JBoss Cache treats them differently. Instead, we look through the collection and check whether each element is an object we might want to marshall in a different way to standard Serialization, e.g. a list of ReplicableCommand (ReplicableCommand does not implement Serializable). How would we deal with this?

That can depend. You can always use an Externalizer for those types to override how the serialization occurs. Another option is to use an ObjectTable, which lets you customize the serialization process to a slightly higher degree (at the cost that both the reader and the writer need to have matching ObjectTable specifications; Externalizers do not have this restriction as the writer's Externalizer is itself serialized).

"galder.zamarreno@jboss.com" wrote:
4.- Also, JBC treats SingletonList instances separately by just copying the single object within them. Any plans to add this to JBoss Marshalling as well?

This could also be easily handled with an Externalizer.
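The underlying trick is just "write the single element, rebuild the list on read". Here is a stand-alone sketch of that idea using plain JDK streams; `SingletonListCodec` is a made-up name, and this is not written against the Externalizer interface itself:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing the singleton-list shortcut with JDK streams.
final class SingletonListCodec {
    private SingletonListCodec() {}

    // Writes only the contained element, not the list wrapper.
    static byte[] write(List<?> singleton) {
        if (singleton.size() != 1) {
            throw new IllegalArgumentException("expected exactly one element");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(singleton.get(0));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the element back and wraps it in a fresh singleton list.
    static List<Object> read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return Collections.singletonList(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An Externalizer for the singleton list type would do the same two steps in its write and create methods.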

"galder.zamarreno@jboss.com" wrote:
Other than this, JBoss Marshalling seems to contain the rest of the optimisations JBC did for known type arrays, repeated objects, null values...etc. So, in spite of agreeing with Jason's statement in https://jira.jboss.org/jira/browse/JBCACHE-1336 earlier today:

It might be more useful to define type marshallers outside of the type since most of our magic numbers apply to types not under our control (JDK types).

I don't think this might apply any longer if the above points can be resolved. IOW, all types that are not under our control would already be handled by JBoss Marshalling which means that we can concentrate on our types and we could then use @Marshallable annotations.

Yeah, I think that between ClassTable, ObjectTable and ClassExternalizerFactory you can implement all the customizations you need.

27. Re: Common marshalling infrastructure

galder.zamarreno Mar 25, 2009 4:51 PM (in response to dmlloyd)

"david.lloyd@jboss.com" wrote:
"galder.zamarreno@jboss.com" wrote:
2.- Looking at JBoss Marshalling, I saw that you're considering primitive arrays as known classes but primitive object arrays are not. Is this an oversight or on purpose? i.e.
map.put(Byte[].class, Protocol.ID_BYTE_ARRAY_CLASS);
map.put(Boolean[].class, Protocol.ID_BOOLEAN_ARRAY_CLASS);
map.put(Character[].class, Protocol.ID_CHAR_ARRAY_CLASS);
...
The reason for this is that primitive array classes do not extend the object array class. Since all object arrays are represented by a single byte plus the component type information, there should be one byte to signify the array plus one byte for the primitive wrapper, which is two bytes (not too bad). Likewise, e.g. Byte[][].class would be three bytes, etc. If the length is still unacceptably long, a ClassTable can be used to reduce it down to a single byte.

Yeah, but the problem here is that afterwards, you're treating primitive object arrays as Serializable and you're not taking advantage of the same performance gains that you applied to primitive arrays.

IOW, in RiverMarshaller.doWriteObject, primitive arrays are treated specially and this is done based on the class ID coming from BASIC_CLASSES. Now, if you don't give the same class IDs, or parallel ones, to primitive object arrays (Byte[], Integer[]...etc), you're missing out on a performance gain.

Currently, looks to me primitive object arrays would be treated as Serializable.

28. Re: Common marshalling infrastructure

dmlloyd Mar 27, 2009 11:05 AM (in response to dmlloyd)

"galder.zamarreno@jboss.com" wrote:
Yeah, but the problem here is that afterwards, you're treating primitive object arrays as Serializable and you're not taking advantage of the same performance gains that you applied to primitive arrays.

IOW, in RiverMarshaller.doWriteObject, primitive arrays are treated specially and this is done based on the class ID coming from BASIC_CLASSES. Now, if you don't give the same class IDs, or parallel ones, to primitive object arrays (Byte[], Integer[]...etc), you're missing out on a performance gain.

Currently, looks to me primitive object arrays would be treated as Serializable.

If you look in RiverMarshaller.doWriteObject() (line 382 in trunk), I check if the object is an array. If so, the descriptor is written, which consists of a one byte marker (ID_OBJECT_ARRAY_TYPE_CLASS) followed by the component type (which, in the case of the wrapper classes, is treated specially so that's one more byte).

After the descriptor, the array itself is written as a sequence of objects. In the case of primitive wrapper objects, each element consists of one type byte (looked up in the type map) followed by the literal value of that wrapper object, or one byte for null.

This isn't quite as speed- and space-efficient as a pure primitive array but it should be pretty darn close. Also, it means that an Object[] which is filled with, say, Integer instances will take up no more space than an Integer[]. And one byte plus the int value is just about as dense as I can make this, since each value might be null as well.

I don't see how I can really improve on this, unless it's to do something funky like use a bit vector to mark nulls and then pass in the literal values. But I don't think that'd be worth the gain.
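To make the per-element cost concrete, here is a simplified model of the element encoding described above for an Integer[]. The type byte values and the class `IntegerArrayEncoding` are made up for this sketch, and the array descriptor and length are left out; it is not the RiverMarshaller code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model of "one type byte plus literal value, or one byte for null".
final class IntegerArrayEncoding {
    private static final int TYPE_NULL = 0;     // assumed marker value
    private static final int TYPE_INTEGER = 1;  // assumed marker value

    private IntegerArrayEncoding() {}

    // Encodes the elements only: 5 bytes per non-null value, 1 byte per null.
    static byte[] encode(Integer[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Integer value : values) {
                if (value == null) {
                    out.writeByte(TYPE_NULL);
                } else {
                    out.writeByte(TYPE_INTEGER);
                    out.writeInt(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes a known number of elements written by encode.
    static Integer[] decode(byte[] data, int count) {
        Integer[] values = new Integer[count];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                int type = in.readUnsignedByte();
                if (type == TYPE_INTEGER) {
                    values[i] = in.readInt();
                } else if (type != TYPE_NULL) {
                    throw new IllegalStateException("unexpected type byte " + type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

With this layout an array holding two Integers and one null takes 5 + 1 + 5 = 11 bytes of element data, against 12 bytes for a three-element int[], which cannot represent the null at all.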