I have created a content repository for images which uses the JPA connector with PostgreSQL as its backend. The images are uploaded through the ModeShape WebDAV interface running in JBoss.
This works great until I have to restart JBoss. With a mode:autoGenerateSchema="validate" configuration for the connector, JBoss starts as usual but it takes forever for the validation to complete. With a repository containing about 1000 JPEG files it takes more than 30 minutes before I can log in to the repository again through the WebDAV interface. I would expect the validation to be limited to the schema, which should not take so long.
I traced the hits to the database in Wireshark and could see many queries being executed all the while I was waiting for the repo to be available.
Following is the configuration for my repos:
<!-- Define the sources used by the repository (or repositories) to store and access the content -->
mode:defaultWorkspaceName="photos" mode:autoGenerateSchema="validate" />
<!-- One source for the "/jcr:system" content ... -->
mode:defaultWorkspaceName="system" mode:autoGenerateSchema="validate" />
If I use mode:autoGenerateSchema="create" in the configuration instead, I am able to access the repo using WebDAV almost immediately after JBoss starts, but then I have to upload all the content again, which also takes a long time.
I am using JBoss 5.1.0 GA with ModeShape 2.4 and PostgreSQL 8.3.
I also looked at the JBoss process in VisualVM and could see frequent GC with a max heap size of 512 MB while the validation was running. Increasing the max heap size to 1024 MB and switching the collector to CMS reduced the GC thrashing, but the heap still fills up slowly and validation takes almost the same amount of time to complete.
Any pointers on how to improve performance are highly appreciated.
I suspect that the problem is the index scanning that happens upon restart. You might try changing the 'indexReadDepth' JCR repository option. This option is '4' by default (which in your case might end up trying to read many/most/all of your image files in one fell swoop), but you might try something smaller, maybe even '1', and let us know if that helps.
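For reference, here is a rough sketch of how that option might be set in the repository section of the configuration file. It assumes the ModeShape 2.x XML configuration layout, where repository options are child elements of mode:options. The repository and source names are hypothetical placeholders, so please check the exact syntax against the Reference Guide for your version.

```xml
<!-- Sketch only: "photos-repository" and "photos-source" are hypothetical names. -->
<mode:repositories>
    <mode:repository jcr:name="photos-repository" mode:source="photos-source">
        <mode:options jcr:primaryType="mode:options">
            <!-- Read subgraphs one level deep during indexing (the default is 4) -->
            <indexReadDepth jcr:primaryType="mode:option" mode:value="1"/>
        </mode:options>
    </mode:repository>
</mode:repositories>
```

A smaller depth should mean each read pulls fewer nodes from the connector at once, at the cost of more round trips to the database.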
I also think that we shouldn't be reindexing the content upon startup (but maybe should in a clustered situation). That logic needs to be improved. Can you please log a defect?
Thanks, Randall, for your input. I changed the 'indexReadDepth' configuration option to '1' as suggested, but it did not help. I forgot to mention that JBoss shows an exception after a long wait:
00:06:46,062 INFO [PluginContainerResourceManager] Discovering Resources...
00:11:51,914 WARN [DiscoveryComponentProxyFactory] The discovery component for resource type [ResourceType[id=0, category=Service, name=Repositories, plugin=ModeShapePlugin]] has been blacklisted
00:11:51,915 WARN [InventoryManager] Failure during discovery for [Repositories] Resources - failed after 300002 ms.
org.rhq.core.pc.inventory.TimeoutException: Call to [org.modeshape.rhq.plugin.RepositoryDiscoveryComponent.discoverResources()] with args [[org.rhq.core.pluginapi.inventory.ResourceDiscoveryContext@40442f2a]] timed out. Invocation thread will be interrupted
at $Proxy399.discoverResources(Unknown Source)
I have filed a bug (MODE-1097) on this.