How To Select The Right Connectors

This article applies only to ModeShape 2.x and does NOT apply to the new architecture in ModeShape 3.x.

 

ModeShape provides many different connectors that can be used to construct a JCR repository.  This offers great flexibility when configuring ModeShape, but can also cause some confusion and hesitation when selecting the connectors to use.  However, there are some heuristics that can be applied to select the most appropriate connectors for any situation. 

 

For the purposes of clarity, this document will use the term "repository" to refer to a JCR repository.  In ModeShape, repositories access and store content in "repository sources".  Repository sources are instances of "connectors".  Each connector is designed to provide a particular storage mechanism, but there can be multiple repository sources configured for the same connector in a single ModeShape instance.

 

It is important to remember that repository sources can be federated in ModeShape, so a combination of connectors can be used to create repository sources if no single connector satisfies all of the requirements.

 

The Two Key Questions for Selecting Connectors

There are two key questions that should be asked when selecting the connector or connectors for a repository:

 

  1. Do applications outside of ModeShape need to access this content?
  2. Does the content need to be stored persistently?

 

Interoperability - Do applications outside of ModeShape need to access this content?

 

Although ModeShape was designed to allow external applications to directly access content in native formats, not every connector supports this.  The file system connector, SVN connector, JDBC metadata connector, and JCR connectors all support direct external access to the underlying content.  The underlying formats for these connectors do not overlap, so your external applications will most likely be able to access only one of them.  Plan on adding a repository source based on the appropriate connector to your ModeShape configuration.

 

Each of these connectors has its own drawbacks, though, so if you don't need external applications to access your content directly, you should not use any of them.  If your external applications access content relatively rarely and can do so by accessing files, you should carefully consider using the ModeShape WebDAV support to expose content from one of the other connectors as a WebDAV share.  This is a particularly useful trick when your repository mostly stores files but needs to decorate those files with custom metadata.

 

Persistence - Does the content need to be stored persistently across ModeShape restarts?


If you don't need to store content persistently, don't need to cluster, and your entire repository will fit in the JVM heap, always use the in-memory connector.  It provides the best performance but loses all content when ModeShape restarts.  If you don't need to store content persistently but can't use the in-memory connector because you need to support clustering or because your repository may grow too large between restarts, use the Infinispan connector without enabling persistence.  This lets you spread the data across the cluster, and it can be configured so that no single node needs to host a copy of the entire repository.
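
The rule for content that does not need to survive a restart can be sketched as a small decision function.  This is only an illustration of the heuristic above: the class, enum, and parameter names are hypothetical and are not part of the ModeShape API.

```java
/** Hypothetical helper that encodes the heuristic for non-persistent content. */
final class TransientStoreChooser {

    /** Illustrative labels only; these are not ModeShape class names. */
    enum Choice { IN_MEMORY, INFINISPAN_WITHOUT_PERSISTENCE }

    /**
     * @param needsClustering true if the repository must run on more than one node
     * @param fitsInJvmHeap   true if the whole repository fits in a single JVM heap
     */
    static Choice choose(boolean needsClustering, boolean fitsInJvmHeap) {
        // The in-memory option is the fastest, but it only applies to a
        // single node whose heap can hold the entire repository.
        if (!needsClustering && fitsInJvmHeap) {
            return Choice.IN_MEMORY;
        }
        // Otherwise fall back to Infinispan with persistence left disabled.
        return Choice.INFINISPAN_WITHOUT_PERSISTENCE;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, true)); // prints IN_MEMORY
        System.out.println(choose(true, true));  // prints INFINISPAN_WITHOUT_PERSISTENCE
    }
}
```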

 

If you do need to store content persistently, then you still have several choices to make.  On the assumption that you would pick the highest-performing connector that meets your other needs, the connectors below are listed in decreasing order of performance.

 

Infinispan Connector

Don't use this connector if:

  • Your repository is so large that you can't afford to host it in memory even when it's split across multiple machines.  For example, if your repository will contain 100GB of content and you don't have enough RAM across your cluster to run an Infinispan cluster with about 101GB of content, you should not use this connector.  Remember too that you'll generally want to keep some number of redundant copies of the data spread across the cluster, so in reality, 100GB of content will probably require more like 202GB or 303GB of RAM.
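
The sizing arithmetic in this example can be written out as follows.  This is a rough sketch that assumes the cluster-wide requirement is simply the in-memory footprint of one copy multiplied by the total number of copies kept; the class and method names are hypothetical, and real Infinispan memory use will depend on your configuration.

```java
/** Hypothetical back-of-the-envelope RAM estimate for an Infinispan-backed repository. */
final class InfinispanRamEstimate {

    /**
     * @param footprintPerCopyGb in-memory size of one full copy of the content, in GB
     *                           (about 101 for the 100GB example above)
     * @param totalCopies        total copies kept across the cluster (1 means no redundancy)
     * @return approximate RAM needed across the whole cluster, in GB
     */
    static long requiredRamGb(long footprintPerCopyGb, int totalCopies) {
        if (footprintPerCopyGb < 0 || totalCopies < 1) {
            throw new IllegalArgumentException("footprint must be >= 0 and totalCopies >= 1");
        }
        // Every extra copy multiplies the memory the cluster must provide.
        return Math.multiplyExact(footprintPerCopyGb, (long) totalCopies);
    }

    public static void main(String[] args) {
        System.out.println(requiredRamGb(101, 1)); // prints 101
        System.out.println(requiredRamGb(101, 2)); // prints 202
        System.out.println(requiredRamGb(101, 3)); // prints 303
    }
}
```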

If you use this connector, remember to:

  • Configure Infinispan to persist the stored data to disk so that you can recover from a complete cluster failure without data loss.
  • Configure Infinispan to keep an appropriate number of extra copies of the data so that failure of a single node does not result in data loss.
  • Structure your Infinispan topology appropriately so that failure of a single server does not wipe out multiple nodes and cause partial data loss.

 

Disk Connector

Don't use this connector if:

  • You can't tolerate the possibility of data corruption in the event of a server crash.  The disk connector makes an effort to provide transactional integrity, but it is possible for only a part of a transaction to be committed if there is a server crash or ungraceful shutdown in the middle of updating data on disk.

If you use this connector, remember to:

  • Configure a node cache policy for the connector.  In our testing, this has had a dramatic impact on performance.
  • Use file-backed locking if you're using the connector in a cluster.  Using JVM locks won't provide isolation across different nodes in the cluster.
  • Set your large value threshold appropriately.  The default works well for most cases, but you should consider whether you're going to have many files just above the default size and raise the threshold if need be.

 

JPA Connector

Don't use this connector if:

  • Your repository contains many large properties (e.g., files).  In general, relational databases aren't the best choice for storing large BLOBs.
  • Performance is more important to you than consistency in the event of a ModeShape crash.

If you use this connector, remember to:

  • Configure Hibernate caching as appropriate for your topology.  If you're in a cluster, use a Hibernate cache that supports clustering, like the Hibernate Infinispan cache.
  • Use the DDL generation tool that ships with ModeShape to generate your DDL.  The generated DDL is good enough for experimenting, but if you're going to use it in production you probably want to have your DBA tune it for your particular DBMS (e.g., add tablespaces, file blocks, storage parameters, etc.).  You should then run that DDL script on your database once, and promptly set the 'autoGenerateSchema' property on the JpaSource to "disable".
  • Use the best database that you can.  HSQLDB, for example, would not be the best choice to back a JPA repository source in a production ModeShape instance.  Whichever DBMS you choose, be sure to use the correct Hibernate dialect for it.
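
Taken together, the "don't use this connector if" lists for the three persistent connectors can be read as a walk down the list in decreasing order of performance, stopping at the first connector that is not ruled out.  The sketch below is one possible reading of that heuristic, not an official algorithm; the class, enum, and parameter names are hypothetical and are not ModeShape identifiers.

```java
/** Hypothetical helper that walks the persistent connectors from fastest to slowest. */
final class PersistentConnectorChooser {

    /** Illustrative labels only; these are not ModeShape class names. */
    enum Connector { INFINISPAN, DISK, JPA, NO_SINGLE_MATCH }

    /**
     * @param fitsInClusterRam              content plus redundant copies fits in the cluster's RAM
     * @param toleratesPartialCommitOnCrash a partially committed transaction after a crash is acceptable
     * @param manyLargeProperties           the repository holds many large properties such as files
     * @param performanceOverConsistency    speed matters more than consistency after a crash
     */
    static Connector choose(boolean fitsInClusterRam,
                            boolean toleratesPartialCommitOnCrash,
                            boolean manyLargeProperties,
                            boolean performanceOverConsistency) {
        // Fastest option first: ruled out only when the data cannot fit in memory.
        if (fitsInClusterRam) {
            return Connector.INFINISPAN;
        }
        // Next: ruled out when possible corruption after a crash is unacceptable.
        if (toleratesPartialCommitOnCrash) {
            return Connector.DISK;
        }
        // Last: ruled out by many large values or by a preference for raw speed.
        if (!manyLargeProperties && !performanceOverConsistency) {
            return Connector.JPA;
        }
        // Nothing fits on its own.
        return Connector.NO_SINGLE_MATCH;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true, true));    // prints INFINISPAN
        System.out.println(choose(false, true, true, true));    // prints DISK
        System.out.println(choose(false, false, false, false)); // prints JPA
        System.out.println(choose(false, false, true, false));  // prints NO_SINGLE_MATCH
    }
}
```

When the result is NO_SINGLE_MATCH, federating repository sources built on different connectors, as described at the start of this article, may be worth considering.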

 

The JBoss Cache connector is included only for legacy environments, as JBoss is slowly phasing out JBoss Cache in favor of Infinispan.