8 replies. Latest reply: Jun 13, 2012 10:54 AM by Horia Chiorean

    Where should ModeShape 3 store indexes (and binaries) by default?

    Randall Hauch

      Neil tried to use ModeShape 3 (outside of AS7) and ran into a problem: queries returned no results after a restart. The cause is that ModeShape 3 currently stores the Lucene indexes in memory by default, although this is easily remedied in the configuration by specifying an index location on disk. Neil asked whether there should be some sort of warning, so I'd like to discuss the default behavior in this thread.
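
One way the warning Neil asked about could look is a startup check that complains when no index location has been configured. The sketch below is hypothetical: the class, the method and its parameters are illustrative names, not ModeShape API or configuration keys, and it assumes "no location configured" means the indexes would be held in memory.

```java
import java.util.logging.Logger;

// Hypothetical sketch: these names are not ModeShape API or configuration keys.
public class IndexStorageCheck {

    private static final Logger LOGGER = Logger.getLogger(IndexStorageCheck.class.getName());

    /**
     * Logs a warning and returns its text when no index directory is configured
     * (assumed here to mean the indexes are kept in memory); returns null otherwise.
     */
    static String checkIndexLocation(String repositoryName, String indexDirectory) {
        if (indexDirectory == null || indexDirectory.trim().isEmpty()) {
            String msg = "Repository '" + repositoryName + "' has no index location configured; "
                    + "the Lucene indexes will be kept in memory and lost on shutdown, "
                    + "so queries may return no results after a restart.";
            LOGGER.warning(msg);
            return msg;
        }
        return null; // indexes go to disk, nothing to warn about
    }

    public static void main(String[] args) {
        checkIndexLocation("MyRepository", null);                // logs the warning
        checkIndexLocation("MyRepository", "/var/data/indexes"); // silent
    }
}
```

A warning like this only helps if users read their logs, which is part of why the default itself is worth revisiting.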


      Our initial thinking (even with ModeShape 2) was that we know very little about where the application is running, so we should behave transiently. This has several issues: all data is lost when the engine is shut down, and the memory footprint is quite high (since nearly everything is stored only in memory). Consequently, I no longer think this is the best approach, but I'd like to get your impressions of what the default behavior should be. I can think of two primary possibilities:


      1. Transient by default - This is essentially the current behavior: binaries are stored in a temporary directory, while content and indexes are stored in memory. This results in a fairly large memory footprint, which we could reduce by also storing the indexes in a temporary directory, and even by setting up, by default, an internal Infinispan cache with a file-system cache store that also writes to a temporary directory. Regardless, with this behavior all of the data is lost as soon as the engine is shut down.
      2. Persistent by default - This would change the behavior so that all data is stored on the local file system, under the directory where the application is run. For example, we could create a directory named after the repository, containing a "store" directory (where we'd persist the Infinispan cache), a "binaries" directory for binary storage, and an "indexes" directory for the Lucene indexes. The benefit is that all repository data survives a restart with no extra configuration, and the memory footprint is smaller. It would also (hopefully) be clear to users where the data lives: if they see these directories being created and don't like the location, they'll know to change it in the configuration.
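
To make the two options concrete, here is a minimal sketch of how default storage locations might be resolved under each policy, using only the JDK. Everything in it is hypothetical: the class, the enum and the use of the working directory (`user.dir`) are illustrative assumptions, not how ModeShape actually resolves its defaults. Only the "store", "binaries" and "indexes" subdirectory names come from the proposal above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: none of these names are ModeShape API or configuration keys.
public class DefaultStorageLayout {

    /** The two default policies discussed in this thread. */
    enum Policy { TRANSIENT, PERSISTENT }

    final Path root;
    final Path store;    // where the Infinispan cache store would persist content
    final Path binaries; // binary values
    final Path indexes;  // Lucene indexes

    private DefaultStorageLayout(Path root) {
        this.root = root;
        this.store = root.resolve("store");
        this.binaries = root.resolve("binaries");
        this.indexes = root.resolve("indexes");
    }

    /** Picks a root directory for the given policy and creates the three subdirectories. */
    static DefaultStorageLayout resolve(String repositoryName, Policy policy) throws IOException {
        Path root;
        if (policy == Policy.TRANSIENT) {
            // Option 1: a new temporary directory on every start; contents are treated as disposable.
            root = Files.createTempDirectory(repositoryName + "-");
        } else {
            // Option 2: a directory named after the repository, in the working directory.
            root = Paths.get(System.getProperty("user.dir"), repositoryName);
        }
        DefaultStorageLayout layout = new DefaultStorageLayout(root);
        // createDirectories does not fail if a directory already exists, so in the
        // persistent case a restart finds the previous run's data in place.
        Files.createDirectories(layout.store);
        Files.createDirectories(layout.binaries);
        Files.createDirectories(layout.indexes);
        return layout;
    }

    public static void main(String[] args) throws IOException {
        DefaultStorageLayout layout = resolve("MyRepository", Policy.PERSISTENT);
        System.out.println("store:    " + layout.store);
        System.out.println("binaries: " + layout.binaries);
        System.out.println("indexes:  " + layout.indexes);
    }
}
```

The visible directories are the point of the persistent option: a user who finds `MyRepository/indexes` next to their application has an obvious hint about what to reconfigure.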


      What do you think? Which makes sense for the zero/minimal configuration case?