14 Replies Latest reply on Apr 4, 2011 10:29 AM by moofish32

    Message and Context Serialization

    dward

      This discussion is related to: https://issues.jboss.org/browse/SWITCHYARD-9

       

      I have been charged with looking at ways to (de)serialize message, context and exchange instances.  Standard Java serialization is slow and fat, but the biggest problem is its inability to handle class file changes.  This started me down a path of looking at other options for SwitchYard.

       

      The first things that popped into my head were various XML encoding techniques (including java.beans.XMLEncoder/XMLDecoder).  But XML is fat and we want the thinnest and fastest mechanism we can find, so I started looking at other options...

       

      I found the following:

      http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

      http://code.google.com/p/memcached-session-manager/wiki/SerializationStrategyBenchmark

      http://code.google.com/p/kryo/wiki/BenchmarksAndComparisons

       

      As you can see, Kryo quickly became very attractive.  It is very fast and thin, and has minimal dependencies.  I was still initially leaning toward Google protobuf, but that requires config files (.proto files) to be in place, and you have to know which classes you are going to map for serialization ahead of time.  That doesn't work for us. We can do it dynamically at runtime with Kryo.  So, I think we should give Kryo a try.  We can protect ourselves (if we end up not being able to use Kryo) by wrapping it behind an interface.
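
      A rough sketch of what "wrapping it behind an interface" could look like. The names Serializer and JavaSerializer are made up for illustration and are not SwitchYard API. Standard Java serialization stands in for the real implementation so the example has no dependencies; a Kryo-backed class would implement the same interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Hypothetical contract; callers depend only on this, never on the library underneath. */
interface Serializer {
    byte[] serialize(Object obj) throws IOException;

    <T> T deserialize(byte[] bytes, Class<T> type) throws IOException;
}

/** Stand-in implementation (assumption: plain Java serialization, used here only as a placeholder). */
final class JavaSerializer implements Serializer {
    @Override
    public byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the object stream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    @Override
    public <T> T deserialize(byte[] bytes, Class<T> type) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            // Class.cast gives a checked conversion to the requested type.
            return type.cast(in.readObject());
        } catch (ClassNotFoundException e) {
            throw new IOException("Unknown class in stream", e);
        }
    }
}
```

      Swapping libraries would then mean adding another implementation of the interface, with no change to the code that calls serialize/deserialize.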

       

      Note: I still need to figure out the licensing for Kryo, as online it says a form of BSD, yet in the source download it says it's free and reusable, but with a custom license.

       

      Thoughts?

        • 1. Message and Context Serialization
          dward

          Hmmm... Looks like I missed this updated graph:

          https://github.com/eishay/jvm-serializers/wiki/

           

          I'm now quickly checking out protostuff, which might iron out the wrinkles of protobuf.

          • 2. Message and Context Serialization
            dward

            Yeah, Kryo is still looking better. More dynamic.

             

            As an aside, I'm proposing we have an org.switchyard.io package in both switchyard-api (for interfaces) and switchyard-runtime (for implementations).  Agreed?

            • 3. Message and Context Serialization
              kcbabo

              Are we still planning on using JavaBeans as the serialization contract with a pluggable implementation underneath?  I thought this was the direction we were headed based on input from you and Kevin at the F2F.  So the contract would be such that all Message payloads must be JavaBean-compatible.  If they are not, a user-supplied class is required to map to/from a JavaBeans representation.  IIRC, the format we discussed for serialization was JSON, based on its wide adoption and avoidance of relying on serialized Java classes, which can blow up between incompatible versions.
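
              To make the JavaBeans contract concrete, here is a minimal sketch of reading a bean's properties into a neutral map that any pluggable format (JSON, binary, ...) could then encode. BeanProperties and Order are hypothetical names for illustration; only the java.beans introspection calls are standard JDK.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: flattens a JavaBean into a name-to-value map. */
final class BeanProperties {
    private BeanProperties() {}

    static Map<String, Object> toMap(Object bean)
            throws IntrospectionException, ReflectiveOperationException {
        // Stopping at Object.class leaves out the "class" property from getClass().
        BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
        Map<String, Object> values = new TreeMap<>();
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
            Method getter = property.getReadMethod();
            if (getter != null) { // write-only properties have no getter
                values.put(property.getName(), getter.invoke(bean));
            }
        }
        return values;
    }

    /** Example payload (illustrative only). */
    public static class Order {
        private String item;
        private int quantity;

        public String getItem() { return item; }
        public void setItem(String item) { this.item = item; }
        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
    }
}
```

              A payload that is not a JavaBean would need the user-supplied mapping class mentioned above to produce an equivalent representation.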

              • 4. Message and Context Serialization
                dward

                We wouldn't need JavaBeans as the serialization contract since this can handle private fields.

                The direction we were headed toward at the F2F was with minimal information to chew on.

                JSON serializers are slower, and the representation bigger than this.

                The above would not rely on serialized Java classes. It has a compatibility mechanism.

                • 5. Message and Context Serialization
                  kcbabo

                  The advantage that I see with the JavaBeans approach is that there is an established standard behind it.  So if we say "all content on the message must be a JavaBean or be capable of being represented as a JavaBean", there is a wealth of information and a standard behind how that works.  I'm thinking about this more at the contract level and not at the serialization level.

                   

                  At the serialization level, I imagine that JSON would be slower and bigger, so I don't have an issue with using a binary representation assuming that it's independent of the class definition (specifically, a given version of a class).

                   

                  I'm guessing we are more or less on the same page here, but figured I would check.

                  • 6. Message and Context Serialization
                    dward

                    I get what you're saying. Contract requirement vs. serialization format. Ok.

                    • 7. Message and Context Serialization
                      tfennelly

                      I think one advantage of something like JSON for us is that it's "readable" (vs a binary rep).  If the receiver is at first not able to make sense of the JSON, at least they can see what they need to do in order to make it consumable on their side of the exchange.  If it's a binary format... they're hosed because they can't get at the data easily.

                       

                      I'd go for a clear contract + readable representation as the default (e.g. JavaBean + JSON).  If we make it pluggable, then people can sub in something else where performance is a big issue.

                      • 8. Message and Context Serialization
                        dward

                        Shouldn't be too hard to have a couple different implementations.  However, I would prefer the higher-performance one to be the default, and if they need to debug something, they can configure the slower, but readable, format.

                        • 9. Message and Context Serialization
                          tfennelly

                          Problem with that approach is... the user needs more knowledge to get out of trouble when they find themselves in trouble.  What I mean is... they hit a problem... stress levels rise a bit... they can't see the data... stress rises another bit... then they figure out they need to make low-level configuration changes to SwitchYard just to get to see the data... stress levels hit the roof.  I know I'd start throwing things out the window at that point!!

                          • 10. Re: Message and Context Serialization
                            dward

                            I wouldn't expect that much stress, as long as there is adequate documentation.  I lean toward the better-performing one so that we 1) do well in out-of-the-box performance comparisons, and 2) need less configuration to stand up a production-ready system.

                            • 11. Message and Context Serialization
                              kcbabo

                              I'm good with the binary serialization as a default.  I agree with Tom's point on a human parseable serialization for debugging purposes, but I think that can be configured in when required.

                               

                              If you want to see the actual details of a message as it moves back and forth on the bus (i.e. a trace/audit log), then you can configure your service and/or domain to enable an audit policy.  The policy details can include a specific instruction for the serialization format requested.  In test/development environments, we just need to make it simple to set that policy at a domain level.

                              • 12. Message and Context Serialization
                                moofish32

                                Keith,

                                 

                                Thanks for pointing me to this thread.

                                 

                                I won't belabor any points here, but I think GPB is actually the middle ground between Kryo and XML.

                                 

                                GPB is human readable (every message can be printed because they are traversable; see TextFormat.java).

                                 

                                GPB is binary and uses varint encoding, so it will greatly outperform XML.

                                 

                                GPB is flexible - and you don't HAVE to write proto files, but it performs faster with them.  You'll have to follow their descriptor classes to see how to dynamically construct a message (FileDescriptorSet -> FileDescriptor -> FieldDescriptor [repeated]...).  In addition, you can change a message structure dynamically or via the actual .proto and not break any legacy devices (Kryo may struggle with this). You might also want to look at extensions.  I could see a very good use in defining a standard .proto and letting users configure on the fly to hot deploy services.

                                 

                                GPB issues - the lack of a Java proto compiler and proto parser makes runtime compilation a little kludgy (but doable).  Kenton (GPB lead) may be working to create one of these.  There may be a performance issue with very large messages - I can't remember if this was solved.
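
                                For anyone who hasn't looked at the varint encoding mentioned above, here is a minimal sketch of the base-128 scheme for unsigned values: 7 payload bits per byte, low-order group first, with the high bit set on every byte except the last. The Varint class is hypothetical and written for illustration, not taken from the GPB library, and it does not guard against overlong input.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative base-128 varint codec for unsigned 64-bit values. */
final class Varint {
    private Varint() {}

    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // While more than 7 bits remain, emit the low 7 bits with the continuation bit set.
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift so negative longs terminate too
        }
        out.write((int) value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // continuation bit clear: this was the last byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("Truncated varint");
    }
}
```

                                With this scheme the value 300 takes two bytes (0xAC 0x02) and anything below 128 takes one, which is a large part of why small integers are so much more compact than their XML text form.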

                                 

                                Overall, if performance is your main criterion, I cannot argue with the Kryo choice.  With GPB's wide range of language libraries, traversability and readability, the only reason I will use XML again is because another application forces me to, which will continue to happen.

                                 

                                If you guys want to continue to discuss GPB in more detail I can, but I figure reading posts isn't your favorite part of the job.

                                 

                                I'd also be interested in understanding how this code fits into the core.  I am still learning about how ESBs and other SOA infrastructures are really built.

                                 

                                Cheers!

                                • 13. Message and Context Serialization
                                  kcbabo

                                  Hey Michael,

                                   

                                  Your feedback and insight are very useful, so please keep them coming!

                                   

                                  What I would really like to see is that we use our existing Transform support to provide serialization.  That way, there's a well-defined and pluggable mechanism available to try out different serialization implementations.  Ideally, we would have multiple options and users would be able to declare the serialization strategy as part of their runtime configuration.


                                  David is working on pulling together an initial design around the serialization piece, so that's the next bit to look out for.  From there, we can talk about how GPB might integrate with that.

                                   

                                  BTW, I like the idea of introducing a GPB transformer.  I added a JIRA for it.

                                  https://issues.jboss.org/browse/SWITCHYARD-190

                                   

                                  cheers,

                                  keith

                                  • 14. Message and Context Serialization
                                    moofish32

                                    Ok, so I am not sure if this warrants a new thread or not; you guys can tell me that.  GPB is looking at a significant change/enhancement and the language reminds me of Smooks.  However, I have not had time to chase this trail.  Also, the solution is not ready yet, so I don't know if that matters to this group.  Regardless, here is the thread on their community forum.  I realize this isn't quite at the internal level you guys are currently focused on, and it sits one layer on top of that.

                                     

                                    http://groups.google.com/group/protobuf/browse_thread/thread/598efbb11aedfc62

                                     

                                    -Mike