Is it possible to do "shared nothing" clustering with ModeShape, i.e. clustering where each node of the cluster has a completely independent physical copy of the data?
To me, one of the big drawbacks of Jackrabbit is its clustering implementation, at least out of the box. The datastore for blobs (files) needs to be on a shared file system. CRX, for instance, also uses Jackrabbit as its JCR implementation, but with a persistence manager of their own, TarPM, which allows them to do "shared nothing" clustering.
Does ModeShape come with "Shared Nothing Clustering" ability out of the box?
No, we don't have something like that out of the box. When configuring a 2.x cluster, the connectors in a configuration are expected to be the same. And whatever the connector points to is expected to have the fault-tolerance needed by your application.
For example, if a connector in one configuration points to a particular database, then the same connector in all the other configurations should also point to the same database. That way, the same database instance is directly used by all the processes in the cluster. And that database may be a master-slave configuration for fault-tolerance purposes. (Of course, each configuration in the cluster can define multiple connectors.)
Even with ModeShape 3, the data-grid storage is expected to be shared across the cluster, and relies upon the transactional and highly-available features of the data grid.
What's the advantage of shared-nothing clustering? If the storage system is fault-tolerant, then don't you want all the processes in the cluster to share the same storage system? If each process has its own storage system, then the changes to each process' storage need to be propagated across the entire cluster (which for large binary values can be extremely expensive). And if the communication between any parts of the cluster fails, then the data on each process will start to diverge. Essentially, this is akin to eventual consistency without the ability to recover from inconsistencies!
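To make the divergence point concrete, here is a minimal Java sketch of two processes that each keep a private copy of the data and push writes to each other. This is not ModeShape code: the `Replica` class, `connectedPair` and `setLink` are hypothetical names made up for this illustration, and the "replication" is just a direct map update with no conflict resolution.

```java
import java.util.HashMap;
import java.util.Map;

final class DivergenceSketch {

    /** Hypothetical cluster member that keeps its own private copy of the data. */
    static final class Replica {
        final Map<String, String> store = new HashMap<>();
        Replica peer;          // the other member of this two-node cluster
        boolean linkUp = true; // false while the network between the two is down

        /** Apply a write locally and push it to the peer only if the link is up. */
        void write(String path, String value) {
            store.put(path, value);
            if (linkUp && peer != null) {
                peer.store.put(path, value);
            }
        }
    }

    /** Create two replicas that replicate to each other. */
    static Replica[] connectedPair() {
        Replica a = new Replica();
        Replica b = new Replica();
        a.peer = b;
        b.peer = a;
        return new Replica[] {a, b};
    }

    /** Simulate the network between the two replicas going down or coming back. */
    static void setLink(Replica a, Replica b, boolean up) {
        a.linkUp = up;
        b.linkUp = up;
    }

    public static void main(String[] args) {
        Replica[] pair = connectedPair();
        Replica siteA = pair[0];
        Replica siteB = pair[1];

        // While the link is up, a write on one side reaches the other.
        siteA.write("/docs/report", "v1");
        System.out.println("in sync: " + siteA.store.equals(siteB.store));

        // During a partition both sides accept writes to the same path.
        setLink(siteA, siteB, false);
        siteA.write("/docs/report", "v2-from-A");
        siteB.write("/docs/report", "v2-from-B");

        // Restoring the link does not reconcile what was written meanwhile.
        setLink(siteA, siteB, true);
        System.out.println("in sync after partition: " + siteA.store.equals(siteB.store));
    }
}
```

Nothing in the sketch notices the conflict once the link comes back, which is the "no ability to recover from inconsistencies" problem described above. A real shared-nothing design would need some form of versioning or conflict resolution on top.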
ModeShape's clustering does not require a cluster-wide write lock to update information. Instead, each type of connector delegates any concurrency constraints to the underlying storage. With database persistence, the database transactions are used (and if configured correctly would correspond to row-level locking rather than table-level locking). Infinispan uses transactions and entry-level locks. Since file systems don't offer transactions, the file system connector and disk storage connectors use locks at varying scopes.
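As a rough in-process analogy for entry-level locking versus a single cluster-wide write lock, the following Java sketch keeps one lock per key. It does not use any ModeShape or Infinispan API; `EntryLockSketch`, `lockFor`, `tryUpdate` and `fromOtherThread` are hypothetical names for this illustration only, and real data grids and databases implement this quite differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

final class EntryLockSketch {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, String> data = new ConcurrentHashMap<>();

    /** One lock per entry, created lazily; there is no store-wide lock. */
    ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    /** Update one entry if its lock is free; never waits on other entries. */
    boolean tryUpdate(String key, String value) {
        ReentrantLock lock = lockFor(key);
        if (!lock.tryLock()) {
            return false; // another writer currently holds this entry
        }
        try {
            data.put(key, value);
            return true;
        } finally {
            lock.unlock();
        }
    }

    String get(String key) {
        return data.get(key);
    }

    /** Run an action on a separate thread and return its result. */
    static boolean fromOtherThread(BooleanSupplier action) {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = action.getAsBoolean());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        EntryLockSketch store = new EntryLockSketch();
        ReentrantLock held = store.lockFor("/a");
        held.lock(); // this thread is "in a transaction" on entry /a
        try {
            boolean sameEntry = fromOtherThread(() -> store.tryUpdate("/a", "x"));
            boolean otherEntry = fromOtherThread(() -> store.tryUpdate("/b", "y"));
            System.out.println("second writer on /a succeeded: " + sameEntry);
            System.out.println("second writer on /b succeeded: " + otherEntry);
        } finally {
            held.unlock();
        }
    }
}
```

The point of the design is that a writer holding `/a` only blocks other writers of `/a`; updates to unrelated entries go through. With a single cluster-wide write lock, the update to `/b` would have been blocked as well.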
Thanks for answering.
Well, one advantage of shared-nothing clusters is that your office in San Francisco could access their local node and the office in Melbourne theirs, and the data would still be in sync. Is there a way to configure ModeShape this way, even if it's not possible out of the box? I must say this feature is one of the most appealing ones in Adobe's CRX. I had kind of assumed that ModeShape could do this when I read that it's using JGroups for clustering. None of the open-source solutions that employ a JCR backend (Alfresco, Nuxeo, Magnolia, ...) have this.
In 2.x, we rely upon the data source to be replicated. So you might be able to do this depending upon your DBMS. Or, if you're using Infinispan it might be possible to leverage the multi-site functionality in Infinispan/JGroups.
In 3.x we're using Infinispan, and plan to use the multi-site capabilities of Infinispan/JGroups to do exactly this. (Essentially, you'll configure the data grid to distribute multiple copies of every node such that each site has at least one full copy.)
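To illustrate what "each site has at least one full copy" means as a property, here is a small Java sketch that checks a placement map. It is not Infinispan or ModeShape configuration or API: `SiteCoverageSketch`, `missingAt`, `everySiteHasFullCopy` and the site labels `sf` and `mel` are assumptions made up for this example, and the placement data is hand-written rather than read from a real grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SiteCoverageSketch {

    /** Node paths (sorted) that have no copy stored at the given site. */
    static List<String> missingAt(String site, Map<String, Set<String>> sitesByNode) {
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : sitesByNode.entrySet()) {
            if (!entry.getValue().contains(site)) {
                missing.add(entry.getKey());
            }
        }
        return new ArrayList<>(missing);
    }

    /** True when every site holds a copy of every node. */
    static boolean everySiteHasFullCopy(Set<String> sites, Map<String, Set<String>> sitesByNode) {
        for (String site : sites) {
            if (!missingAt(site, sitesByNode).isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> sites = Set.of("sf", "mel");

        // Hypothetical placement: which sites hold a copy of each node.
        Map<String, Set<String>> placement = Map.of(
                "/docs/a", Set.of("sf", "mel"),
                "/docs/b", Set.of("sf"));

        System.out.println("full copy everywhere: " + everySiteHasFullCopy(sites, placement));
        System.out.println("missing at mel: " + missingAt("mel", placement));
    }
}
```

In this made-up placement, `/docs/b` exists only at `sf`, so the Melbourne office could not serve it locally. The multi-site setup described above is meant to rule out exactly that situation.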