25 Replies. Latest reply: Jul 1, 2010 5:44 AM by Manik Surtani

CacheLoader w/ JPA or Hibernate backend

Rafael Ribeiro Novice

Hi all,

 

Are there any plans to support something similar to http://publib.boulder.ibm.com/infocenter/wxsinfo/v7r0/index.jsp?topic=/com.ibm.websphere.extremescale.over.doc/cxsljpaload.html?

I've had a look at the code for the JDBC cache loader to check whether it would be possible (and relatively easy) to implement this, but I am not sure, since Infinispan stores some extra metadata in the store alongside the cache data.

Am I right?

I was checking this possibility since it presents some interesting usage scenarios...

 

 

regards,

Rafael Ribeiro

  • 1. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master
    It's on the roadmap (ISPN-31), but the contributor who was working on it has gone quiet.  If you are interested in developing this, I'd be happy to guide you where needed.
  • 2. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    I can give it a try. What do you suggest reading beforehand?

    Any particular class, document?

  • 3. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master

    The CacheStore interface + Javadocs, maybe have a look at some other simple cache store impls (JDBC, File, even the DummyInMemoryCacheStore).  The JIRA also has a reference to a mailing list thread which you may want to brush up on where the topic was discussed.

     

    And feel free to ask questions on the infinispan-dev mail list.

  • 4. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    Hi Manik,

     

    I already have a draft class that works... it is already able to prefetch, but I have some design points I'd like to discuss.

    First of all... My first thought was to use the @Id field as the key... I implemented that, but then I realized that this way I can only search a "cache region" (sorry, but I don't remember what it is actually called in Infinispan) if I know the actual id value. This is a huge limitation: if I wanted to fetch an item from the cache using something different, like a natural key rather than the surrogate id (NaturalKey, in Hibernate naming), it wouldn't be possible.

    Then I thought about having a parameter that would be responsible for specifying a Key class... this way users would be able to specify a class that implemented some arbitrary interface with a single method public Object generateKey(Object entity)... this method would be called and then we could have arbitrary keys generated when data was prefetched from persistent storage. This could also be combined with the first idea so if this factory was unspecified we could have a default one that would lookup object Id field and use it as key.
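
    The key-factory idea above can be sketched roughly as follows. This is only an illustration: the names KeyGenerator, IdKeyGenerator and Product are hypothetical and are not part of Infinispan or JPA, and the id extractor function stands in for reading the @Id field of a real entity.

```java
import java.util.function.Function;

/** Hypothetical interface from the post: derives a cache key from an entity. */
interface KeyGenerator {
    Object generateKey(Object entity);
}

/** Hypothetical default generator: falls back to the entity's identifier. */
final class IdKeyGenerator<E> implements KeyGenerator {
    private final Class<E> type;
    private final Function<E, ?> idExtractor; // stands in for reading the @Id field

    IdKeyGenerator(Class<E> type, Function<E, ?> idExtractor) {
        this.type = type;
        this.idExtractor = idExtractor;
    }

    @Override
    public Object generateKey(Object entity) {
        // Fails fast with a ClassCastException if the wrong entity type is passed in.
        return idExtractor.apply(type.cast(entity));
    }
}

/** Hypothetical entity used only for this demonstration. */
final class Product {
    final long id;
    final String sku;

    Product(long id, String sku) {
        this.id = id;
        this.sku = sku;
    }
}
```

    A user-supplied generator could then be as small as `entity -> ((Product) entity).sku`, with IdKeyGenerator used whenever none is configured.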

    Then I had a look at the roadmap and found that the Query API is already available as a preview... this made me wonder whether it is already (or will be) possible to run a query over all items in a cache region... this has some challenges, since (ideally) items need to be indexed in order to avoid something like an in-memory "table scan"...

    In summary... do you think I can keep coding using any of the ideas above? Is there any way I can send you my work in progress?

     

    regards,

    Rafael Ribeiro

  • 5. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master

    Hi Rafael

     

    I'm not sure why you need a key generator.  I was thinking something along the lines of wrapping a cache entry into a persistable object.  E.g., CacheStore.store() is called with an InternalCacheEntry.  The impl could create something like the following:

     

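    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    @Entity
    class PersistableObject {
      @Id @GeneratedValue long id;
      Object key;      // this could be a byte[] if the key is not an @Entity?
      Object value;    // this could be a byte[] if the value is not an @Entity?
      int keyHashcode; // hash code of the key, used to look entries up
      long lifespan;   // and all the other metadata in an InternalCacheEntry
    }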
    

     

    Searching for objects would be simple.  When load(Object key) is called, run a JPA query with the key's hashcode and then test the results' keys to see which matches the key passed in.
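
    A minimal sketch of that two-step lookup, assuming an in-memory map in place of the JPA table. StoredRow and HashIndexedTable are hypothetical names; a real store would run a JPA query on the hash code column instead of reading a map bucket.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a persisted row; a real store would map this with JPA. */
final class StoredRow {
    final Object key;
    final Object value;
    final int keyHashcode;

    StoredRow(Object key, Object value) {
        this.key = key;
        this.value = value;
        this.keyHashcode = key.hashCode();
    }
}

/** In-memory stand-in for the table, indexed by hash code like an indexed column. */
final class HashIndexedTable {
    private final Map<Integer, List<StoredRow>> byHash = new HashMap<>();

    void store(Object key, Object value) {
        List<StoredRow> bucket = byHash.computeIfAbsent(key.hashCode(), h -> new ArrayList<>());
        bucket.removeIf(row -> row.key.equals(key)); // replace any existing row for this key
        bucket.add(new StoredRow(key, value));
    }

    /** Step 1: select by hash code. Step 2: confirm with equals(), because hash codes can collide. */
    Object load(Object key) {
        for (StoredRow row : byHash.getOrDefault(key.hashCode(), Collections.emptyList())) {
            if (row.key.equals(key)) {
                return row.value;
            }
        }
        return null; // no entry stored under this key
    }
}
```

    The equals() check matters: the strings "Aa" and "BB" share a hash code in Java, so a query on the hash code alone would return both rows.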

     

    Re: querying with the query module, the query module maintains a separate set of full-text indexes in Lucene.  So the query module is able to determine keys based on a full-text query on its own, with no extra help from the cache loader.  The cache loader would just see certain calls to load(key). 

     

    Re: code, I think you can attach files to this thread.

  • 6. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    Hi Manik!

     

    I'll try to post the code by the weekend, but first we need to align our thinking, because I suspect we are imagining two completely different solutions.

    I was looking for a way to turn Hibernate (or any JPA implementation) and its second-level cache upside down. I know this comes with certain risks and limitations, since Hibernate requires that no object is shared, in order to avoid concurrent modifications to an entity. What I tried to achieve was an automagic way of storing entities in a faster medium and retrieving them from there. So, instead of wrapping my cached data in a JPA entity, my data would be the JPA entity itself. I guess it'll become clearer when I post the code and a demo. That's why I thought about having a particular key (instead of the value of the regular @Id field), so that my application code could hand over a specially crafted key and get a direct lookup in the cache (which could also trigger a load if the entry is not immediately available in memory). E.g.:

    Suppose my application had an entity called Customer as follows:

     

    @Entity
    public class Customer {
      @Id
      private long id;
      @Basic
      private String ssn;
      @Basic
      private String name;
    }
     

     

    and I wanted to be able to do a cache.get(new CustomerSSNKey("someSSN"));

    CustomerSSNKey would look something like this:

     
    public class CustomerSSNKey implements Serializable {
      private String ssn;
      [constructor, equals and hashCode that take ssn into account]
    }
    

     

    This way I'd trigger a direct lookup in the cache, with no index involved and no iteration over a range of ids. The factory I mentioned would be responsible for reading the desired fields from the entity and creating this special key during cache store preload and other activities.
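
    For illustration, one way the key class could be filled in, assuming the SSN is never null. The HashMap below only stands in for the cache, and CustomerSSNKeyDemo is a hypothetical name; the point is that equals and hashCode based on the SSN are what make a freshly built key find the stored entry.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Natural-key wrapper: two keys are equal when their SSNs are equal. */
final class CustomerSSNKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String ssn;

    CustomerSSNKey(String ssn) {
        this.ssn = Objects.requireNonNull(ssn, "ssn");
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerSSNKey && ssn.equals(((CustomerSSNKey) o).ssn);
    }

    @Override
    public int hashCode() {
        return ssn.hashCode();
    }
}

final class CustomerSSNKeyDemo {
    public static void main(String[] args) {
        // A plain map stands in for the cache here; this is an assumption of the sketch.
        Map<Object, String> cache = new HashMap<>();
        cache.put(new CustomerSSNKey("123-45-6789"), "Jane Doe");

        // A new, equal key instance finds the entry without scanning ids.
        System.out.println(cache.get(new CustomerSSNKey("123-45-6789")));
    }
}
```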

     

    Anyway, even though I wanted to be able to do this, the Query API opens up wider possibilities (at the probable cost of a slight performance impact), so I'll start a new thread about it: I could only find the API jar in the samples folder and no example code at all. Could you please point me to any docs? I am really keen to do a proof of concept with it.

     

    Did I make it any clearer? Do you think this makes sense?

    We have a scenario where this solution seems to make perfect sense, and I am trying to work out whether we could move in this direction with ISPN.

     

    best regards,

  • 7. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    Hi Manik,

     

    As promised, here is the "not so draft" implementation. It is still missing purgeInternal, since I haven't yet worked out what it is supposed to do (I guess a clear, but I'll check before implementing), and a way of looking up a PersistenceContext through JNDI.

    I also gave up on this weird idea of custom keys since it'd be almost impossible to have this thing supported on loadLockSafe.

    I am attaching some samples as well.

     

    regards,

    Rafael Ribeiro

  • 8. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    Hi Manik!

     

    There is one thing that I noticed while I was coding this loader: loadAllLockSafe is too prone to OutOfMemoryErrors since it tries to load everything in a single shot. Won't it be better if it had some kind of callback that would be responsible for offloading the preloading node in order to avoid OutOfMemoryErrors?

     

    regards,

    Rafael Ribeiro

  • 9. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master

    Hi Rafael.

     

    Thanks for the prototype.  A few points:

     

    • storeLockSafe() needs to deal with storing of new entries as well as updating existing ones.  For this, entityManager.merge() probably won't work, you may need to test if the entity exists first.
    • The cache entry key may not be the ID of the entity.  So the 'key' you see in storeLockSafe(key, entry), for example, may not be the entry's ID at all.  (Perhaps you could mandate this if the JPACacheStore is to be used)
    • purgeInternal() could be implemented easily if you create an additional, internal entity for metadata, and here you store the key and expiry timestamp of each entry added.  Then you can write an easy JPA-QL query to remove entities where keys have expired according to the metadata. InternalCacheEntry stores the expiry time.  Saves you a whole lot of deserializing. 
    • You load up entity types in your init() method.  Is it possible to assume that any value passed in will be an entity, and get this on the fly?  This way users wouldn't have to define their entity types in their config?  Or is this too much of a performance overhead?
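
    The purgeInternal() suggestion above can be sketched as follows, assuming an in-memory stand-in rather than JPA. ExpiryMetadataStore is a hypothetical name; in a real store the second map would be the internal metadata entity, and the loop would be a single delete query on the expiry timestamp.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** In-memory stand-in for a store that keeps expiry metadata apart from the values. */
final class ExpiryMetadataStore {
    private final Map<Object, Object> values = new HashMap<>();     // stands in for the entity table
    private final Map<Object, Long> expiryMillis = new HashMap<>(); // stands in for the metadata entity

    /** A negative expiresAt means the entry never expires. */
    void store(Object key, Object value, long expiresAt) {
        values.put(key, value);
        if (expiresAt >= 0) {
            expiryMillis.put(key, expiresAt);
        } else {
            expiryMillis.remove(key);
        }
    }

    /** Removes expired entries by inspecting metadata only; values are never read or deserialized. */
    int purgeExpired(long now) {
        int removed = 0;
        Iterator<Map.Entry<Object, Long>> it = expiryMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, Long> e = it.next();
            if (e.getValue() <= now) {
                values.remove(e.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    boolean contains(Object key) {
        return values.containsKey(key);
    }
}
```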
  • 10. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master

    Rafael Ribeiro wrote:

    There is one thing that I noticed while I was coding this loader: loadAllLockSafe is too prone to OutOfMemoryErrors since it tries to load everything in a single shot. Won't it be better if it had some kind of callback that would be responsible for offloading the preloading node in order to avoid OutOfMemoryErrors?

     

    If you look at the interfaces in trunk, I have introduced a couple of new methods on CacheLoader:

     

    /**
     * Loads up to a specific number of entries.  There is no guarantee as to order of entries loaded.  The set returned
     * would contain up to a maximum of <tt>numEntries</tt> entries, and no more.
     * @param numEntries maximum number of entries to load
     * @return a set of entries, which would contain between 0 and numEntries entries.
     * @throws CacheLoaderException
     */
    Set<InternalCacheEntry> load(int numEntries) throws CacheLoaderException;
    /**
     * Loads a set of all keys, excluding a filter set.
     *
     * @param keysToExclude a set of keys to exclude.  An empty set or null will indicate that all keys should be returned.
     * @return A set containing keys of entries stored.  An empty set is returned if the loader is empty.   
     * @throws CacheLoaderException
     */
    Set<Object> loadAllKeys(Set<Object> keysToExclude) throws CacheLoaderException;
    
    

     

    These are now called where loadAll() used to be called.  This helps reduce the risk of OOMs.  The former is used when preloading a cache at startup time (why load more than maxEntries entries, when they would just be evicted anyway?) and the latter during a rehashing when all you really want to load are the keys, to test which entries should be rehashed to a different node.
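
    To make the contract of the two methods concrete, here is a rough in-memory sketch. InMemoryLoader is a hypothetical name, and for brevity it returns plain key/value pairs rather than InternalCacheEntry instances, so it mirrors the behaviour described in the Javadoc above, not the exact signatures.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for a loader that supports bounded loading and key-only loading. */
final class InMemoryLoader {
    private final Map<Object, Object> entries = new LinkedHashMap<>();

    void store(Object key, Object value) {
        entries.put(key, value);
    }

    /** Returns between 0 and numEntries entries; callers must not rely on the order. */
    Map<Object, Object> load(int numEntries) {
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Map.Entry<Object, Object> e : entries.entrySet()) {
            if (result.size() >= numEntries) {
                break; // stop reading as soon as the limit is reached
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Returns all keys except those in keysToExclude; null or empty means no exclusions. */
    Set<Object> loadAllKeys(Set<Object> keysToExclude) {
        Set<Object> keys = new HashSet<>(entries.keySet());
        if (keysToExclude != null) {
            keys.removeAll(keysToExclude);
        }
        return keys;
    }
}
```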

  • 11. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    Hi Manik,

     

    I'll review what you said about the JPACacheStore later on at home. But still on the subject of preloading:

    Manik Surtani wrote:

    These are now called where loadAll() used to be called.  This helps reduce the risk of OOMs.  The former is used when preloading a cache at startup time (why load more than maxEntries entries, when they would just be evicted anyway?) and the latter during a rehashing when all you really want to load are the keys, to test which entries should be rehashed to a different node.

    Correct me if I'm wrong, but with this approach doesn't every node need to execute the preload logic? Otherwise we would end up with a single node preloaded. And then I see another issue: what should a node do if it loads an entry that does not belong to it? That's why I thought we should have a callback: a single node would preload all the data (we would need some kind of lock during grid startup to prevent multiple nodes from preloading), and the callback would be responsible for either storing each entry locally or sending it to the proper node.
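
    A rough sketch of what such a callback could look like. All names here (PreloadCallback, OwnerRoutingCallback, StreamingPreloader) are hypothetical, the list of maps stands in for the grid nodes, and the modulo-based owner function is a simplification, not Infinispan's actual hashing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback: receives each entry as it is read from the store. */
interface PreloadCallback {
    void entryLoaded(Object key, Object value);
}

/** Hypothetical router: hands each entry to the node that owns its key. */
final class OwnerRoutingCallback implements PreloadCallback {
    final List<Map<Object, Object>> nodes = new ArrayList<>(); // one map per node

    OwnerRoutingCallback(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new LinkedHashMap<>());
        }
    }

    /** Simplified ownership rule, used only for this sketch. */
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    @Override
    public void entryLoaded(Object key, Object value) {
        nodes.get(ownerOf(key, nodes.size())).put(key, value);
    }
}

final class StreamingPreloader {
    /** Passes entries on one at a time, so the preloading node never holds the whole data set. */
    static void preload(Iterable<Map.Entry<Object, Object>> storeCursor, PreloadCallback callback) {
        for (Map.Entry<Object, Object> e : storeCursor) {
            callback.entryLoaded(e.getKey(), e.getValue());
        }
    }
}
```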

     

    regards,

    Rafael Ribeiro

  • 12. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master

    Well, that would depend on the type of cache store in use.  Remember that whatever solution we put in place would need to be generic enough to work well with different cache store impls.  For example, it is often more efficient to read from a cache store than it is to push state across a network, if a cache store is based on a local file system, for example.

     

    Regarding load(int numEntries) and preload, this assumes that cache stores are not shared, i.e., that cache stores are local to each node, such as a FileCacheStore or a BdbjeCacheStore.  What you find in a cache store is then pretty much always mapped to your node (discounting any rehashing that may have occurred while a node was offline).

     

    In the case of a shared cache store such as JDBC or JPA, one would assume that preload is not used in this case - either that, or the cost of loading entries which are then discarded is taken on the chin.  Perhaps a further improvement to this API could be load(int numEntries, Filter f) where Filter could be an interface:

     

    interface Filter {
      boolean allowKey(Object key);
    }
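
    As a sketch of how load(int numEntries, Filter f) might behave, assuming an in-memory map in place of a shared store: FilteringLoader is a hypothetical name, and the Filter interface is repeated only so that the example compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Same shape as the Filter proposed above. */
interface Filter {
    boolean allowKey(Object key);
}

/** In-memory stand-in for a shared store whose entries may belong to other nodes. */
final class FilteringLoader {
    private final Map<Object, Object> entries = new LinkedHashMap<>();

    void store(Object key, Object value) {
        entries.put(key, value);
    }

    /** Loads up to numEntries entries whose keys pass the filter; a null filter allows everything. */
    Map<Object, Object> load(int numEntries, Filter f) {
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Map.Entry<Object, Object> e : entries.entrySet()) {
            if (result.size() >= numEntries) {
                break;
            }
            // Rejected entries are skipped before they count towards the limit.
            if (f == null || f.allowKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

    A node could pass a filter that answers "does this key map to me?", so that entries owned by other nodes are never copied into its memory.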
    
    
  • 13. Re: CacheLoader w/ JPA or Hibernate backend
    Manik Surtani Master

    Hi Rafael.  Have you got any updates wrt this feature?

  • 14. Re: CacheLoader w/ JPA or Hibernate backend
    Rafael Ribeiro Novice

    Hi Manik!

     

    Sorry for not giving any status update for such a long time, but I had to take care of other tasks and had to put this on hold for a moment. I have an almost working copy at home, but I can't send it right now since I am attending IBM Impact and am far away from home, and therefore from the notebook where the code is...

     

    But I think I still remember a few things I am still figuring out how to address...

    Manik Surtani wrote:

     

    A few points:

     

    • storeLockSafe() needs to deal with storing of new entries as well as updating existing ones.  For this, entityManager.merge() probably won't work, you may need to test if the entity exists first.
    • The cache entry key may not be the ID of the entity.  So the 'key' you see in storeLockSafe(key, entry), for example, may not be the entry's ID at all.  (Perhaps you could mandate this if the JPACacheStore is to be used)
    • purgeInternal() could be implemented easily if you create an additional, internal entity for metadata, and here you store the key and expiry timestamp of each entry added.  Then you can write an easy JPA-QL query to remove entities where keys have expired according to the metadata. InternalCacheEntry stores the expiry time.  Saves you a whole lot of deserializing.
    • You load up entity types in your init() method.  Is it possible to assume that any value passed in will be an entity, and get this on the fly?  This way users wouldn't have to define their entity types in their config?  Or is this too much of a performance overhead?

    Regarding the first one, I'll need to check how to solve this, since JPA 1.0 does not provide a saveOrUpdate method as Hibernate does (as far as I remember), and unfortunately I can't assume any particular provider and use a hack to check entity state.
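
    One provider-neutral option is to test for existence first and then choose between insert and update. The sketch below assumes a tiny hypothetical interface, EntityOps, that only imitates the find/persist/merge operations of an EntityManager with simplified signatures; it is not the JPA API itself, and a real store would do the check and the write inside one transaction.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the EntityManager operations used below. */
interface EntityOps {
    Object find(Object id);
    void persist(Object id, Object entity);
    Object merge(Object id, Object entity);
}

/** In-memory implementation that counts inserts and updates for demonstration. */
final class InMemoryEntityOps implements EntityOps {
    final Map<Object, Object> rows = new HashMap<>();
    int inserts;
    int updates;

    @Override
    public Object find(Object id) {
        return rows.get(id);
    }

    @Override
    public void persist(Object id, Object entity) {
        if (rows.containsKey(id)) {
            throw new IllegalStateException("duplicate id " + id);
        }
        rows.put(id, entity);
        inserts++;
    }

    @Override
    public Object merge(Object id, Object entity) {
        rows.put(id, entity);
        updates++;
        return entity;
    }
}

final class StoreOrUpdate {
    /** Inserts when no row exists for the id, otherwise updates the existing row. */
    static void store(EntityOps ops, Object id, Object entity) {
        if (ops.find(id) == null) {
            ops.persist(id, entity);
        } else {
            ops.merge(id, entity);
        }
    }
}
```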

    Second one: I honestly did not get what you meant here... while designing, I saw that we had two paths to take:

      1st: assume the ID for the cache is the same as for the entity and give users means for looking up the cache by the id.

      2nd: let the user specify either the regular key or a natural immutable key (gives greater flexibility but increases store complexity a lot)

    Third one: I was really uncomfortable with this expiry time in my store impl, since at first I was thinking of giving the user a fast access layer to persistent entities through Infinispan... something like seeing Hibernate the other way around. This way, we can't assume the user will have a timestamp field in their entity that we can use as an expiry time, since we are talking about the user's own entities... do you think this strays too far from what loaders are for? Honestly... I see huge scope for applications on top of this idea (there are a myriad of high-volume transactional applications today that need a super fast storage area for looking up additional processing data).

    Fourth and last one: I thought about having each loader serving a single entity (or an entity hierarchy).

     

    regards,

    Rafael Ribeiro
