-
1. Re: Cache entries and serialization
galder.zamarreno Oct 4, 2011 10:50 AM (in response to darrellburgan)
If you're relying on standard Java Serialization, reading a byte[] of a pojo with a different serialVersionUID will blow up with an InvalidClassException. That's not because of Infinispan but because of the Java serialization rules. You can easily test this, but Google can already give you some clues on what will happen: http://www.mkyong.com/java-best-practices/understand-the-serialversionuid/
Any errors are likely to happen when reading the persistence store, so either when preloading, or when calling get() and allowing the data to be retrieved from the persistence store.
At the end of the day, serialVersionUID is there to detect changes in the class structure, which quite likely also mean changes in the serialized form, so it's there to protect users against doing bad things.
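For anyone who wants to see the failure without a cache in the picture, here is a minimal sketch using only the JDK. The `Person` class and the helper names are made up for illustration. Instead of compiling two versions of the class, it patches the serialVersionUID recorded in the stream, which assumes the standard stream layout where the UID follows the class name right after the stream header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialVersionMismatchDemo {

    // Hypothetical cache value; the name and field are invented for this sketch.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "darrell";
    }

    // Plain Java serialization into a byte[], as a store might persist it.
    static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Simulates bytes written by a class version whose serialVersionUID was 2L.
    // Assumed layout: 4 header bytes, TC_OBJECT, TC_CLASSDESC, 2-byte name length,
    // the class name, then the 8-byte serialVersionUID.
    static byte[] withStreamUid2(byte[] data) {
        byte[] copy = data.clone();
        int uidOffset = 8 + Person.class.getName().length();
        copy[uidOffset + 7] = 2; // lowest byte of the long: 1 becomes 2
        return copy;
    }

    // Returns "ok" on success, otherwise the simple name of the exception.
    static String tryRead(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return "ok";
        } catch (InvalidClassException e) {
            return "InvalidClassException";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] stored = serialize(new Person());
        System.out.println("same UID:      " + tryRead(stored));
        System.out.println("different UID: " + tryRead(withStreamUid2(stored)));
    }
}
```

In a real deployment the mismatch would come from redeploying a changed class against an existing store, not from patching bytes, but the exception is the same.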
As an alternative to Java Serialization rules, there are Infinispan Externalizers (https://docs.jboss.org/author/display/ISPN/Plugging+Infinispan+With+User+Defined+Externalizers), which do not require serialVersionUID, but you're gonna have the same problem if the structure of the pojo changes over time. The difference here is that you're gonna have to code around it yourself.
To get an idea of what it takes to handle multiple versions of a class, you can see the unit test I developed here:
It uses javassist to generate new versions of a class where I either add or remove attributes. The code, if you can read it properly, shows you the type of tricks you need to use to cope with different versions of a class.
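To give a rough idea of what "coding around it" can look like, here is a minimal sketch of one common trick: write a format version first, then branch on it when reading. This is not the unit test mentioned above and it does not implement Infinispan's Externalizer interface; it only mirrors the write/read pair with plain `java.io` types. The `Person` class, the `email` attribute and the version constants are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class VersionedExternalizerSketch {

    // Hypothetical value class; "email" stands for an attribute added in a later release.
    static class Person {
        final String name;
        final String email; // null when the data was written in the version-1 layout
        Person(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    static final int V1 = 1; // layout: name
    static final int V2 = 2; // layout: name, email

    // Always writes the newest layout, prefixed with its version number.
    static void writeObject(ObjectOutput out, Person p) throws IOException {
        out.writeByte(V2);
        out.writeUTF(p.name);
        out.writeObject(p.email); // writeObject tolerates null
    }

    // Reads the version first, then only the fields that layout contains.
    static Person readObject(ObjectInput in) throws IOException, ClassNotFoundException {
        int version = in.readUnsignedByte();
        String name = in.readUTF();
        switch (version) {
            case V1: return new Person(name, null);
            case V2: return new Person(name, (String) in.readObject());
            default: throw new IOException("Unknown Person format version: " + version);
        }
    }

    // Test helper: legacyLayout=true emulates bytes left behind by the old code.
    static byte[] toBytes(Person p, boolean legacyLayout) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            if (legacyLayout) {
                out.writeByte(V1);
                out.writeUTF(p.name);
            } else {
                writeObject(out, p);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Person fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return readObject(in);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person("darrell", "darrell@example.org");
        System.out.println("old layout email: " + fromBytes(toBytes(p, true)).email);
        System.out.println("new layout email: " + fromBytes(toBytes(p, false)).email);
    }
}
```

The point of the version byte is that the reader never has to guess the layout; removing an attribute works the same way, by reading and discarding it for old versions.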
With regards to your 2nd question, that's what state transfer is about: getting data from the rest of the nodes in the cluster. If you're using distribution, the equivalent is called rehashing. If you're using synchronous replication, state transfer is enabled by default.
-
2. Re: Cache entries and serialization
darrellburgan Oct 4, 2011 2:27 PM (in response to galder.zamarreno)
Thanks, this helps clarify it a lot.
I was aware of the state transfer capabilities of the product, but one concern I had is about how that performs with really large caches. We will eventually have several hundred caches, any one of which might have several thousand entries in it. It seems like bringing a new node into the cluster with state transfer enabled might hit a performance bottleneck in trying to download all that state, even if we're using distribution with only two owners. But I have never tried it, so I may be making a faulty assumption there.
Anyway, I will look at your unit test and play around with it and see what I can come up with. Thanks again.
-
3. Re: Cache entries and serialization
galder.zamarreno Oct 10, 2011 6:24 AM (in response to darrellburgan)
We're working on several fronts to improve state transfer. Starting with Infinispan 5.1, state does not come from a sole member in the cluster; using consistent hash algorithm techniques, several members in the cluster can send state in parallel to the new joining node. Non-blocking state transfer, where the state provider continues working while sending state, has been disabled temporarily due to the changes I've just mentioned, but we're working on a new version of non-blocking state transfer.
On top of that, for situations where a node might have a local persistent store, we're going to implement digesting, to be able to figure out what might have changed since the node went down by comparing the contents of its local store with the in-memory data in other nodes.
Btw, if state transfer is problematic for your use case, you can always disable it and configure a cluster cache loader instead. If data is not present in a node, this cache loader will query the cluster for the data. This works for replicated caches. Distribution already does this by default because not all nodes in the cluster have the data.
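As a purely conceptual sketch of that "ask the cluster on a local miss" idea (this is not Infinispan's API or configuration; the `Node` class, its fields, and the choice to keep a fetched value locally are assumptions made for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClusterLookupSketch {

    // Hypothetical stand-in for a cache node; peers are held in memory, with no networking.
    static class Node {
        final String name;
        final Map<String, String> local = new HashMap<>();
        final List<Node> peers = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        // Looks in local memory first; on a miss, asks each peer in turn.
        String get(String key) {
            String value = local.get(key);
            if (value != null) {
                return value;
            }
            for (Node peer : peers) {
                String remote = peer.local.get(key);
                if (remote != null) {
                    local.put(key, remote); // assumption: keep the fetched value locally
                    return remote;
                }
            }
            return null; // nobody in the cluster has it
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B"); // joins with no state transferred
        b.peers.add(a);
        a.local.put("user:1", "darrell");

        System.out.println("first get on B:   " + b.get("user:1"));
        System.out.println("B now holds key:  " + b.local.containsKey("user:1"));
        System.out.println("missing key on B: " + b.get("user:2"));
    }
}
```

The trade-off compared with state transfer is that a new node starts empty and pays a remote lookup on first access to each key, instead of downloading everything up front.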