Design of JPA-like API and New Fine Grained replication

NOTE: This page is out of date.  The fine-grained replication/JPA-like API of Infinispan will be developed as a separate project under the Hibernate umbrella, to make better use of the existing codebase, technology and expertise there.  The project is known as Hibernate OGM (Object-Grid Mapping), and designs can be found on Hibernate OGM General Informations.

Background

The purpose of this feature is two-fold.  The original intention was a mechanism to provide Infinispan with fine-grained replication features where complex objects stored in the cache would be able to replicate only deltas rather than having to serialize entire complex object graphs.  The approach chosen was to mimic the JPA style of mapping objects to an alternate format (in the case of Hibernate, for example, this would be objects to relational tables).  The interesting part here is not the mapping, but the detection of deltas and the ability to generate and merge diffs on a Java object.

Alternate approaches

Previously (in JBoss Cache's POJOCache variant), this was achieved using bytecode weaving and AOP making use of tools such as JBoss AOP and Javassist.  For Infinispan we have chosen to use the JPA route since the AOP route had proved problematic in JBoss Cache.  Specifically, we found AOP too intrusive especially when you did not have control over the classes being added to the cache, and this was a significant source of bugs and usability issues for end-users.

Beyond just fine-grained replication

The fact that we needed a JPA-like API to achieve fine-grained replication meant that we needed an alternate API to Infinispan.  Almost as a side effect, we ended up with an API so similar to JPA that we decided we might as well adopt JPA itself, as it would give developers an API they are familiar with and would help people migrate off traditional databases and onto data grids.

 

It must be made clear, though, that JPA will not, and possibly cannot, be supported in its entirety.  Certain parts of JPA-QL, for example, will not be supported, and migrating an existing JPA-based application will require some manual work.

JIRA and release targets

This feature is tracked by ISPN-24 and is targeted for Infinispan 5.0.

 

Supporting JIRAs

This JPA API will rely on existing Infinispan features, including:

 

Details

API for Session (looks like JPA EntityManager)

- persist()

  - object added to session

  - calling persist() (attach) for an object, or with a key, that already exists should fail

 

- find()

  - constructs object from cache

    - In case of Pojos that are not collections, a brand new copy of the object is put into memory. Arrays are treated this way as well.

    - Lazy construction for Pojos: i.e. if Person has an Address and the client wants Person, bring only the Person. This will only work if referenced objects are treated as proxies rather than copies (1st level objects are copies).

    - In case of collections, a new proxy is created that tracks operations on the collection and can apply the differences to the cache itself.

  - In the case of non collection pojos, find maps object to identity map (L1) so that changes to object can be diffed at commit time.

    - identity map links object references to primitives/uuids.

 

- remove()

  - removes an object from the cache if there're no more references to the object left.

  - if further references left, reduce reference count in object.

 

- commit()

  - if attached, it is a new object, so map it to the cache structure

  - if found, compare with L1 to discover changes and update the cache itself

    - if the object is changed between find and commit and, in parallel, someone else has changed the same object in the cache, committing the changes overwrites the cache contents by default (same as JPA).
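
The persist/find/commit flow above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SketchSession` and `Person` are hypothetical names, a plain `Map` stands in for the Infinispan cache, entities are stored as flat field maps, and reflection is used to snapshot fields. It only shows how an identity map (L1) lets commit write changed fields rather than the whole object; it is not the actual design.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Example entity; needs a no-argument constructor so find() can build a copy.
class Person {
    String name;
    int age;
}

// Hypothetical session. The Map stands in for the cache and stores each
// entity as key -> (field name -> field value).
class SketchSession {
    private final Map<Object, Map<String, Object>> cache;
    // Objects passed to persist() and not yet committed: entity -> key.
    private final Map<Object, Object> pendingNew = new IdentityHashMap<>();
    // L1 identity map: entity returned by find() -> field snapshot at load time.
    private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();
    private final Map<Object, Object> keysOfFound = new IdentityHashMap<>();

    SketchSession(Map<Object, Map<String, Object>> cache) {
        this.cache = cache;
    }

    // Fails if the key is already in the cache or already pending.
    void persist(Object key, Object entity) {
        if (cache.containsKey(key) || pendingNew.containsValue(key)) {
            throw new IllegalStateException("key already exists: " + key);
        }
        pendingNew.put(entity, key);
    }

    // Builds a fresh copy from the cache and records a snapshot for diffing.
    <T> T find(Class<T> type, Object key) {
        Map<String, Object> stored = cache.get(key);
        if (stored == null) return null;
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T copy = ctor.newInstance();
            for (Map.Entry<String, Object> e : stored.entrySet()) {
                Field f = type.getDeclaredField(e.getKey());
                f.setAccessible(true);
                f.set(copy, e.getValue());
            }
            snapshots.put(copy, new HashMap<>(stored));
            keysOfFound.put(copy, key);
            return copy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes new objects in full and, for found objects, only changed fields.
    // Returns the number of field values written to the cache.
    int commit() {
        int writes = 0;
        for (Map.Entry<Object, Object> e : pendingNew.entrySet()) {
            Map<String, Object> fields = readFields(e.getKey());
            cache.put(e.getValue(), fields);
            writes += fields.size();
        }
        pendingNew.clear();
        for (Map.Entry<Object, Map<String, Object>> e : snapshots.entrySet()) {
            Map<String, Object> before = e.getValue();
            Map<String, Object> stored =
                cache.computeIfAbsent(keysOfFound.get(e.getKey()), k -> new HashMap<>());
            for (Map.Entry<String, Object> now : readFields(e.getKey()).entrySet()) {
                if (!Objects.equals(now.getValue(), before.get(now.getKey()))) {
                    stored.put(now.getKey(), now.getValue()); // last writer wins
                    before.put(now.getKey(), now.getValue());
                    writes++;
                }
            }
        }
        return writes;
    }

    private static Map<String, Object> readFields(Object entity) {
        Map<String, Object> values = new HashMap<>();
        try {
            for (Field f : entity.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                f.setAccessible(true);
                values.put(f.getName(), f.get(entity));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }
}
```

In this sketch, persisting a `Person` and committing writes both fields, while a later find, a change to `age` and a second commit write a single field. Concurrent changes made by others between find and commit are simply overwritten, matching the default described above.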

 

class information

pojo.Reference

- method: Object getValue();

 

pojo.PrimitiveReference<T> extends pojo.Reference

- method: T getValue();

 

pojo.Uuid extends pojo.Reference

- method: AtomicMap getValue();

 

internal cache structure

- main cache

  - k[Object] -> v[pojo.Reference]

  - k[pojo.Uuid] -> v[AtomicMap] where AtomicMap contains field values (pojo.Reference (primitive or Uuid)), type class, and reference count.
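
As a rough illustration of the reference types and the main cache layout, the sketch below models them with plain Java collections. It rests on several assumptions: `Reference` is modelled as an interface, a `java.util.Map` stands in for both the cache and `AtomicMap`, and `GridLayout`, `store`, `addReference` and the `__type` / `__refCount` entry names are hypothetical. It also shows the reference counting described for remove().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Assumption: Reference is an interface; the page only lists its method.
interface Reference {
    Object getValue();
}

// Wraps a primitive (or other directly stored) value.
final class PrimitiveReference<T> implements Reference {
    private final T value;

    PrimitiveReference(T value) {
        this.value = value;
    }

    @Override
    public T getValue() {
        return value;
    }
}

// Points at a map of field values held in the cache under this Uuid.
final class Uuid implements Reference {
    private final UUID id = UUID.randomUUID();
    private final Map<Object, Object> cache;

    Uuid(Map<Object, Object> cache) {
        this.cache = cache;
    }

    @Override
    @SuppressWarnings("unchecked")
    public Map<String, Object> getValue() {
        // A plain Map stands in for AtomicMap; null once the node is removed.
        return (Map<String, Object>) cache.get(this);
    }

    @Override
    public String toString() {
        return id.toString();
    }
}

// Hypothetical helper showing k[Object] -> v[Reference] and k[Uuid] -> v[map].
final class GridLayout {
    static final String TYPE = "__type";
    static final String REFS = "__refCount";
    final Map<Object, Object> cache = new HashMap<>();

    // Stores a new object node and binds the user key to it.
    Uuid store(Object key, Class<?> type, Map<String, Reference> fields) {
        Uuid id = new Uuid(cache);
        Map<String, Object> node = new HashMap<>(fields);
        node.put(TYPE, type);
        node.put(REFS, 1);
        cache.put(id, node);
        cache.put(key, id);
        return id;
    }

    // Binds another key to an existing node and bumps its reference count.
    void addReference(Object key, Uuid id) {
        Map<String, Object> node = id.getValue();
        node.put(REFS, ((Integer) node.get(REFS)) + 1);
        cache.put(key, id);
    }

    // Drops the key; the node is removed only when no references remain.
    void remove(Object key) {
        Object ref = cache.remove(key);
        if (!(ref instanceof Uuid)) return;
        Uuid id = (Uuid) ref;
        Map<String, Object> node = id.getValue();
        int refs = ((Integer) node.get(REFS)) - 1;
        if (refs == 0) {
            cache.remove(id);
        } else {
            node.put(REFS, refs);
        }
    }
}
```

With two keys bound to the same node, removing the first key only decrements the count; the node disappears from the cache when the second key is removed.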

 

potential list fine grained representation in cache

- k[pojo.Uuid] -> v[pojo.MasterBucketTable] where MasterBucketTable contains the number of buckets occupied and the maximum length per bucket (e.g. 2)

- k[pojo.Uuid-{bucketId}] -> v[pojo.Bucket] where Bucket is a list with Uuids and occupied size (max is per bucket size).

- calculating the size is as simple as figuring out the number of buckets occupied, going to the last one and checking the occupied size in the last bucket.

- adding a new element is as simple as navigating to last bucket and adding there.

  - if no more space in bucket, create new bucket and update MasterBucketTable.
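
The bucket scheme above can be sketched as follows. This is only an illustration under stated assumptions: `BucketedList` and its methods are hypothetical, a plain `Map` with string keys stands in for the cache, a bucket is a plain `List` of element ids (its occupied size is the list size), and bucket keys are formed as `listId + "-" + bucketId`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fine-grained list stored as fixed-size buckets in a cache.
final class BucketedList {

    // Per-list metadata stored under the list's own id.
    static final class MasterBucketTable {
        int bucketsOccupied;
        final int maxPerBucket;

        MasterBucketTable(int maxPerBucket) {
            this.maxPerBucket = maxPerBucket;
        }
    }

    private final Map<String, Object> cache;
    private final String listId;

    BucketedList(Map<String, Object> cache, String listId, int maxPerBucket) {
        this.cache = cache;
        this.listId = listId;
        cache.put(listId, new MasterBucketTable(maxPerBucket));
    }

    private MasterBucketTable master() {
        return (MasterBucketTable) cache.get(listId);
    }

    @SuppressWarnings("unchecked")
    private List<String> bucket(int bucketId) {
        return (List<String>) cache.get(listId + "-" + bucketId);
    }

    // Appends to the last bucket, creating a new one when it is full.
    void add(String elementUuid) {
        MasterBucketTable m = master();
        List<String> last = m.bucketsOccupied == 0 ? null : bucket(m.bucketsOccupied - 1);
        if (last == null || last.size() == m.maxPerBucket) {
            last = new ArrayList<>();
            cache.put(listId + "-" + m.bucketsOccupied, last);
            m.bucketsOccupied++; // only this case touches the master table
        }
        last.add(elementUuid);
    }

    // Full buckets are counted arithmetically; only the last bucket is read.
    int size() {
        MasterBucketTable m = master();
        if (m.bucketsOccupied == 0) return 0;
        return (m.bucketsOccupied - 1) * m.maxPerBucket
                + bucket(m.bucketsOccupied - 1).size();
    }

    int bucketCount() {
        return master().bucketsOccupied;
    }
}
```

The point of the layout is that an append touches one bucket entry, plus the master table only when a new bucket is created, so only that small entry would need to be replicated. In a real distributed cache, the mutated bucket would have to be written back with a put; the in-place mutation here works only because the stand-in is a local `HashMap`.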