3 Replies Latest reply on Oct 10, 2011 6:24 AM by galder.zamarreno

    Cache entries and serialization

    darrellburgan

      Sorry if this is documented, but I couldn't find a clear answer in the docs.

       

      All entries in an Infinispan cache are serializable; this much is clear. My question is about cache warming. Let's say that I have configured Infinispan to persist the cache in a local file or some other local data store on each node in the cluster, and that I have set it up so that when a node comes online it is instantly warmed with the cache entries from its local data store (I am assuming this is possible using a cache loader, but even if it isn't, please consider the rest of the question).

       

      What if the serialVersionUID of the entries that get warmed into the cache this way no longer matches the serialVersionUID of the current class definition for those entries in the JVM? Does Infinispan start throwing exceptions if I try to get() such an entry, or does it assume the cache entry is no longer up to date and treat the get() as a cache miss?

       

      What I'm trying to decide is two things:

       

      1. Should I formally come up with a serialVersionUID scheme for all entries I intend to put into Infinispan caches?

       

      2. Is it possible through some sort of cache loader/persistence mechanism to have each node in an Infinispan cluster keep its cache both in memory and on disk, and be "instantly warmed" when a node starts up?

       

      Sorry if these are dumb questions, but I'm trying furiously to get my arms around all the capabilities of Infinispan and figure out the best way to leverage it.


      By the way, Infinispan is an extremely cool product. I am blown away by all the flexibility and power it has (almost too many options!).

       

      Thanks in advance,

       

      Darrell Burgan

        • 1. Re: Cache entries and serialization
          galder.zamarreno

          If you're relying on standard Java Serialization, reading the byte[] of a pojo with a different serialVersionUID will blow up. Not because of Infinispan, but because of the Java serialization rules. You can easily test this, but Google can already give you some clues on what will happen: http://www.mkyong.com/java-best-practices/understand-the-serialversionuid/

           

          Any errors are likely to happen when reading the persistence store, so either when preloading, or when calling get() and allowing the data to be retrieved from the persistence store.

           

          At the end of the day, serialVersionUID is there to detect changes in the class structure, and quite likely in the serialized version, so it's there to protect users against doing bad things.
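
          The failure is easy to reproduce outside Infinispan. The sketch below is a minimal, hypothetical test (CachedPojo and SerialVersionMismatchDemo are made-up names): it serializes an object with plain Java serialization, flips one bit of the serialVersionUID recorded in the stream to mimic bytes written by an older class definition, and checks whether reading them back throws java.io.InvalidClassException. The byte offset assumes the standard serialization stream layout and an ASCII class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical cache value; the name and field are illustrative only.
class CachedPojo implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "entry";
}

class SerialVersionMismatchDemo {

    // Serialize with standard Java serialization, as a persistent store might keep the bytes.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Mimic "the class changed its serialVersionUID after the bytes were stored"
    // by altering the UID recorded in the stream's class descriptor.
    // Layout: 4-byte stream header, TC_OBJECT, TC_CLASSDESC, 2-byte name length,
    // class name, then the 8-byte serialVersionUID.
    static byte[] withDifferentUid(byte[] bytes, Class<?> type) {
        byte[] copy = bytes.clone();
        int uidOffset = 4 + 1 + 1 + 2 + type.getName().length();
        copy[uidOffset + 7] ^= 0x01; // flip the lowest bit of the stored UID
        return copy;
    }

    // Returns true if reading the bytes fails with InvalidClassException.
    static boolean failsWithInvalidClass(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = serialize(new CachedPojo());
        System.out.println("same UID fails: " + failsWithInvalidClass(stored));
        System.out.println("changed UID fails: "
                + failsWithInvalidClass(withDifferentUid(stored, CachedPojo.class)));
    }
}
```

          Note that the mismatch surfaces as an exception from the deserialization layer, not as a null result, which is why it cannot be mistaken for a cache miss.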

           

          As an alternative to Java Serialization rules, there are Infinispan Externalizers (https://docs.jboss.org/author/display/ISPN/Plugging+Infinispan+With+User+Defined+Externalizers), which do not require serialVersionUID, but you're gonna have the same problem if the structure of the pojo changes over time. The difference here is that you're gonna have to code around it yourself.

           

          To get an idea of what it takes to handle multiple versions of a class, you can see the unit test I developed here:

          https://github.com/infinispan/infinispan/blob/master/core/src/test/java/org/infinispan/marshall/multiversion/MultiPojoVersionMarshallTest.java

           

          It uses javassist to generate new versions of a class where I either add or remove attributes. The code, if you can read it properly, shows you the type of tricks you need to use to cope with different versions of a class.
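
          One common way to code around a structure change in a hand-written format is to write an explicit version marker before the fields and branch on it when reading. The following is a minimal sketch of that idea using plain java.io streams. It is not the Infinispan Externalizer API and not necessarily the approach taken in the unit test above; Person, VersionedPersonCodec and the UNKNOWN_AGE default are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value type: format version 1 stored only "name", version 2 added "age".
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Hand-written codec in the spirit of an externalizer (illustrative, not a library API).
class VersionedPersonCodec {
    static final int CURRENT_VERSION = 2;
    static final int UNKNOWN_AGE = -1; // default used when old bytes carry no age

    // Current writer: version marker first, then the fields of that version.
    static byte[] write(Person p) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeByte(CURRENT_VERSION);
            out.writeUTF(p.name);
            out.writeInt(p.age);
        }
        return bos.toByteArray();
    }

    // What an older build would have left in the store (version 1: name only).
    static byte[] writeV1(String name) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeByte(1);
            out.writeUTF(name);
        }
        return bos.toByteArray();
    }

    // Reader understands every version it may still find in a persistent store.
    static Person read(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = in.readUnsignedByte();
            String name = in.readUTF();
            switch (version) {
                case 1:
                    return new Person(name, UNKNOWN_AGE);
                case 2:
                    return new Person(name, in.readInt());
                default:
                    throw new IOException("Unsupported format version: " + version);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Person old = read(writeV1("ann"));
        Person current = read(write(new Person("bob", 30)));
        System.out.println(old.name + " " + old.age);
        System.out.println(current.name + " " + current.age);
    }
}
```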

           

          With regard to your 2nd question, that's what state transfer is about: getting data from the rest of the nodes in the cluster. If you're using distribution, the equivalent is called rehashing. If you're using synchronous replication, state transfer is enabled by default.

          • 2. Re: Cache entries and serialization
            darrellburgan

            Thanks, this helps clarify it a lot.

             

            I was aware of the state transfer capabilities of the product, but one concern I had is about how that performs with really large caches. We will eventually have several hundred caches, any one of which might have several thousand entries in them. It seems like bringing a new node into the cluster with state transfer enabled might have a performance bottleneck in trying to download all that state, even if we're using distribution with only two owners. But I have never tried it, so I may be making a faulty assumption there.


            Anyway I will look at your unit test and play around with it and see what I can come up with. Thanks again.

            • 3. Re: Cache entries and serialization
              galder.zamarreno

              We're working on several fronts to improve state transfer. Starting with Infinispan 5.1, state does not come from a sole member in the cluster; instead, using consistent hash algorithm techniques, several members in the cluster can send state in parallel to the newly joining node. Non-blocking state transfer, where the state provider continues working while sending state, has been disabled temporarily due to the changes I've just mentioned, but we're working on a new version of non-blocking state transfer.

               

              On top of that, for situations where a node might have a local persistent store, we're going to implement digesting, to be able to figure out what might have changed since the node went down by comparing the contents of its local store with the in-memory data in other nodes.

               

              Btw, if state transfer is problematic for your use case, you can always disable it and configure a cluster cache loader instead. If data is not present in a node, this cache loader will query the cluster for the data. This works for replication caches. Distribution already does this by default, because not all nodes in the cluster have the data.
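
              The lookup-on-miss idea can be sketched in a few lines. This is only a conceptual illustration, not Infinispan code: ClusterFallbackSketch and its Peer interface are hypothetical stand-ins for a node and a remote lookup, and keeping the fetched value locally afterwards is an assumption of the sketch.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch of "ask the cluster on a local miss"; all names are hypothetical.
class ClusterFallbackSketch {

    // Stand-in for a remote call to another node; returns null if that node has no value.
    interface Peer {
        String lookup(String key);
    }

    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<Peer> peers;

    ClusterFallbackSketch(List<Peer> peers) {
        this.peers = peers;
    }

    // Serve from local memory when possible; otherwise query the peers lazily,
    // so a new node starts empty and fills up only with the keys it is asked for.
    String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value;
        }
        for (Peer peer : peers) {
            value = peer.lookup(key);
            if (value != null) {
                local.put(key, value); // assumption: keep the fetched entry locally
                return value;
            }
        }
        return null; // a genuine miss: no node has the key
    }

    public static void main(String[] args) {
        Peer peer = key -> "k1".equals(key) ? "v1" : null;
        ClusterFallbackSketch node = new ClusterFallbackSketch(Collections.singletonList(peer));
        System.out.println(node.get("k1"));
        System.out.println(node.get("missing"));
    }
}
```

              The trade-off compared with state transfer is that startup is cheap, but the first access to each key pays for a remote round trip.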