Neil tried to use ModeShape 3 (outside of AS7) and ran into a problem: queries returned no results after a restart. The cause was that ModeShape 3 currently stores the Lucene indexes in memory by default, although this is easily remedied in the configuration by specifying an index location. Neil asked whether there should be some sort of warning, so I'd like to discuss the default behavior in this thread.
Our initial thinking (even with ModeShape 2) was that, since we know very little about where the application is running, we should behave in a transient way. This has several issues: the data is lost when the engine is shut down, and the memory footprint is fairly high (since nearly everything is stored only in memory). Consequently, I no longer think this is the best approach, but I'd like to hear your impressions of what the default behavior should be. I can think of two primary possibilities:
- Transient by default - This is basically the current behavior: binaries are stored in a temporary directory, while content and indexes are stored in memory. This results in a fairly large memory footprint, which we could reduce by also storing the indexes in a temporary directory, and even by setting up, by default, an internal Infinispan cache with a file-system cache store that writes to a temporary directory. Regardless, with this behavior all the data is lost as soon as the engine is shut down.
- Persistent by default - This would change the behavior so that all data is stored on the local file system within the directory where the application is being run. For example, we could create a directory named after the repository, and in it create a "store" directory (where we'd persist the Infinispan cache), a "binaries" directory for binary storage, and an "indexes" directory for the Lucene indexes. The benefit is that all the repository data survives a restart with no extra configuration, and the memory footprint is smaller. It would also (hopefully) make it clear to users where their data lives; if they don't like the location once these directories are created, they would know to change it.
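To make the two options concrete, here is a minimal Java sketch of how the default directories might be resolved under each. `DefaultStorageLayout`, `Mode`, and `resolve` are hypothetical names used only for illustration, not part of ModeShape's API; the "store", "binaries", and "indexes" directory names come from the proposal above, while the temporary-directory prefix is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (NOT a ModeShape API) that resolves default storage
 * directories for a repository under either of the two proposed defaults.
 */
public final class DefaultStorageLayout {

    /** The two candidate defaults discussed in this thread. */
    public enum Mode { TRANSIENT, PERSISTENT }

    private DefaultStorageLayout() {}

    /**
     * Creates (if needed) and returns the "store", "binaries", and "indexes" directories.
     * PERSISTENT: under workingDir/repositoryName, so the data survives a restart.
     * TRANSIENT:  under a fresh temporary directory (assumed prefix "modeshape-"),
     *             which a restarted engine would not reuse.
     */
    public static Map<String, Path> resolve(String repositoryName, Mode mode, Path workingDir)
            throws IOException {
        Path root = (mode == Mode.PERSISTENT)
                ? workingDir.resolve(repositoryName)
                : Files.createTempDirectory("modeshape-" + repositoryName + "-");
        Map<String, Path> dirs = new LinkedHashMap<>();
        for (String name : new String[] {"store", "binaries", "indexes"}) {
            Path dir = root.resolve(name);
            Files.createDirectories(dir); // no-op if the directory already exists
            dirs.put(name, dir.toAbsolutePath());
        }
        return dirs;
    }
}
```

For example, `DefaultStorageLayout.resolve("sample", DefaultStorageLayout.Mode.PERSISTENT, Path.of(""))` would create `sample/store`, `sample/binaries`, and `sample/indexes` under the current working directory, and a second call after a restart would find the same directories.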
What do you think? Which makes sense for the zero/minimal configuration case?
I vote for #2 persistent.
For the AS7 deployment kit, I would expect the data files to be stored in a "modeshape" directory within the JBoss server data directory, alongside HornetQ etc. (I haven't yet tried ModeShape 3, so forgive me if that's already the case.) Since that is the only way I will be using ModeShape, I don't have a preference for the standalone or embedded deployments.
I think more log messages stating where the data is located would help avoid confusion.
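Following up on the logging suggestion, here is a minimal Java sketch of the kind of startup messages that could be emitted, using only `java.util.logging`. `StorageLocationLogger` and `logLocations` are hypothetical names, not ModeShape API. As an assumption of this sketch, a `null` location stands for data kept only in memory, which is logged as a warning since that is the case that caused the empty query results after a restart.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical helper (NOT a ModeShape API) that announces where each kind of data lives. */
public final class StorageLocationLogger {
    private static final Logger LOGGER = Logger.getLogger(StorageLocationLogger.class.getName());

    private StorageLocationLogger() {}

    /**
     * Logs one message per storage area and returns the messages.
     * A null path is treated (by assumption) as "kept in memory only" and logged as a WARNING.
     */
    public static List<String> logLocations(String repositoryName, Map<String, Path> locations) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, Path> e : locations.entrySet()) {
            String msg;
            if (e.getValue() == null) {
                msg = String.format("Repository '%s' keeps %s in memory only; they will be lost on shutdown",
                        repositoryName, e.getKey());
                LOGGER.warning(msg);
            } else {
                msg = String.format("Repository '%s' stores %s in %s",
                        repositoryName, e.getKey(), e.getValue().toAbsolutePath());
                LOGGER.info(msg);
            }
            messages.add(msg);
        }
        return messages;
    }
}
```

A WARNING for the in-memory case would likely have surfaced the missing index location before anyone noticed empty query results, while the INFO lines tell users exactly which directories to change if they dislike the defaults.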
On the topic of configuration and defaults, I'd like to suggest that ModeShape provide Infinispan configuration examples to help new users get started. I would guess that some ModeShape users, like me, have little or no Infinispan experience. Perhaps there could be a section in the wiki documentation with some examples and quick starts for common configurations (both with and without AS7, though I'm mostly interested in AS7).
For example, I would like to configure a distributed Infinispan cache over multiple servers, with a ModeShape app running in AS7, to create an in-memory content grid. How would that be accomplished? What would the Infinispan configuration be? How do I start Infinispan on all of the nodes (other than the node running ModeShape and AS7)? This might be obvious to some, but to me it is not obvious from the Infinispan docs.
I second Jonathan's points. First, ModeShape should by default not lose anything (it's easier to lose things later than to get back things that are already lost), and I would also like to see example configurations for simple use cases like:
- data is stored in Infinispan but backed on disk in a folder
- data is stored in Infinispan but backed by an RDBMS accessed via JDBC
- data is stored in memory only (no backing store)
- data is stored in a distributed Infinispan cache where each node also keeps a backup on disk in a folder
However, if we change the default configuration, we need to make sure that our tests still use the in-memory settings.
That's an excellent point, Horia. I hadn't thought of that aspect, but I completely agree. Interestingly, our unit tests need to be set up in a special way anyway, because of the testing requirements for Infinispan (e.g., the cache container needs to be killed after each test to clean out any cached information and to prevent content created in one test from leaking into other tests). We already have an AbstractJcrRepositoryTest (with SingleUseAbstractTest and MultiUseAbstractTest subclasses) that we're hopefully using in as many places as we can. Perhaps we need to use them more consistently, or perhaps we need to update how we set up our repositories for unit testing. WDYT?