Read Past EOF errors with Infinispan - Hibernate Search
smnarayam Apr 3, 2012 7:15 PM
We're currently attempting to set up a 10-node Infinispan distributed cluster as a Hibernate Search backend. Our stack is Hibernate 3.6.4, Hibernate Search 3.4.1, Infinispan 4.2.1.Final, JGroups 2.11.1.Final and Lucene 3.1.0. Our setup is a JMS master/slave configuration, and we use JNDI to look up the Infinispan CacheManager. Following the Hibernate Search recommendation, we have the Metadata and Locking caches replicated, and we distribute the Data cache with numOwners = 3.
Under low usage, it holds up fine. However, whenever we start to stress the system, we start seeing a lot of 'read past EOF' errors on the master during index writes.
We have tried various options: switching the cache store to JDBM (based on the last comment in this post - https://forum.hibernate.org/viewtopic.php?p=2449501), async write-behind to the cache store, setting exclusive_index_use to false, and distributing all three caches. Switching the Metadata and Locking caches to replication and enabling exclusive_index_use gave us a real improvement (it got us past "java.io.IOException: No sub-file with id .fnm found (files: [.fdt, .fdx])" exceptions), but we continue to see the 'read past EOF' errors.
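(For reference, the async write-behind variant we tried just enables the async element inside each loader - element and attribute names per the Infinispan 4.2 schema; the pool size and timeout values here are illustrative, not what we necessarily ran with:)

```xml
<loaders shared="true" preload="true">
    <loader class="org.infinispan.loaders.file.FileCacheStore"
            fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">
        <properties>
            <property name="location" value="/usr/local/tc/.index/data" />
        </properties>
        <!-- write-behind: cache writes return immediately, the store is updated asynchronously -->
        <async enabled="true" threadPoolSize="5" flushLockTimeout="15000" />
    </loader>
</loaders>
```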
We believe we must be misconfigured somewhere. Can anyone recommend a fix?
The errors we see are -
04-02-2012 20:45:07 ERROR Hibernate Search: Directory writer-1 impl.LogErrorHandler: Exception occurred java.io.IOException: read past EOF
java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:339)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:118)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3159)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3232)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3214)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3198)
at org.hibernate.search.backend.Workspace.commitIndexWriter(Workspace.java:220)
at org.hibernate.search.backend.impl.lucene.PerDPQueueProcessor.run(PerDPQueueProcessor.java:109)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Another variation we see is -
04-02-2012 14:37:43 ERROR Lucene Merge Thread #8 impl.LogErrorHandler: Exception occurred org.apache.lucene.index.MergePolicy$MergeException: java.io.IOExcept
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: read past EOF
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at org.hibernate.search.backend.impl.lucene.overrides.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:49)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:138)
at org.apache.lucene.index.SegmentReader$Norm.bytes(SegmentReader.java:409)
at org.apache.lucene.index.SegmentReader$Norm.bytes(SegmentReader.java:404)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:1084)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:636)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:112)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3938)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3614)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Our Infinispan configuration is -
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:4.2 http://www.infinispan.org/schemas/infinispan-config-4.2.xsd"
    xmlns="urn:infinispan:config:4.2">

    <!-- *************************** -->
    <!-- System-wide global settings -->
    <!-- *************************** -->
    <global>
        <!-- Duplicate domains are allowed so that multiple deployments with default configuration
             of Hibernate Search applications work - if possible it would be better to use JNDI to share
             the CacheManager across applications -->
        <globalJmxStatistics
            enabled="true"
            cacheManagerName="HibernateSearch"
            allowDuplicateDomains="true" />
        <!-- If the transport is omitted, there is no way to create distributed or clustered
             caches. There is no added cost to defining a transport but not creating a cache that uses one,
             since the transport is created and initialized lazily. -->
        <transport
            clusterName="HibernateSearch-Infinispan-cluster-MT"
            distributedSyncTimeout="90000">
            <properties>
                <property name="configurationFile" value="infinispan-tcp.cfg.xml" />
            </properties>
            <!-- Note that the JGroups transport uses sensible defaults if no configuration
                 property is defined. See the JGroupsTransport javadocs for more flags -->
        </transport>
        <!-- Used to register JVM shutdown hooks. hookBehavior: DEFAULT, REGISTER, DONT_REGISTER.
             Hibernate Search takes care to stop the CacheManager so registering is not needed -->
        <shutdown
            hookBehavior="DONT_REGISTER" />
    </global>

    <!-- *************************** -->
    <!-- Default "template" settings -->
    <!-- *************************** -->
    <default>
        <locking
            lockAcquisitionTimeout="20000"
            writeSkewCheck="false"
            concurrencyLevel="5000"
            useLockStriping="false" />
        <lazyDeserialization
            enabled="false" />
        <!-- Invocation batching is required for use with the Lucene Directory -->
        <invocationBatching
            enabled="true" />
        <!-- This element specifies that the cache is clustered. modes supported: distribution
             (d), replication (r) or invalidation (i). Don't use invalidation to store Lucene indexes (as
             with Hibernate Search DirectoryProvider). Replication is recommended for best performance of
             Lucene indexes, but make sure you have enough memory to store the index in your heap.
             Also distribution scales much better than replication on high number of nodes in the cluster. -->
        <clustering
            mode="d">
            <sync replTimeout="90000" />
            <l1 enabled="false" />
        </clustering>
        <jmxStatistics
            enabled="true" />
        <eviction
            maxEntries="-1"
            strategy="NONE" />
        <expiration
            maxIdle="-1" />
    </default>

    <!-- ******************************************************************************* -->
    <!-- Individually configured "named" caches.                                         -->
    <!--                                                                                 -->
    <!-- While default configuration happens to be fine with similar settings across the -->
    <!-- three caches, they should generally be different in a production environment.   -->
    <!--                                                                                 -->
    <!-- Current settings could easily lead to OutOfMemory exception as a CacheStore     -->
    <!-- should be enabled, and maybe distribution is desired.                           -->
    <!-- ******************************************************************************* -->

    <!-- *************************************** -->
    <!-- Cache to store Lucene's file metadata   -->
    <!-- *************************************** -->
    <namedCache
        name="LuceneIndexesMetadata">
        <clustering
            mode="replication">
            <stateRetrieval fetchInMemoryState="true" logFlushTimeout="30000" />
            <sync replTimeout="90000" />
            <l1 enabled="false" />
        </clustering>
        <loaders shared="true" preload="true">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">
                <properties>
                    <property name="location" value="/usr/local/tc/.index/metadata" />
                </properties>
            </loader>
        </loaders>
    </namedCache>

    <!-- **************************** -->
    <!-- Cache to store Lucene data   -->
    <!-- **************************** -->
    <namedCache
        name="LuceneIndexesData">
        <clustering
            mode="d">
            <hash numOwners="3" />
            <sync replTimeout="90000" />
            <l1 enabled="false" />
        </clustering>
        <loaders shared="true" preload="true">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">
                <properties>
                    <property name="location" value="/usr/local/tc/.index/data" />
                </properties>
            </loader>
        </loaders>
    </namedCache>

    <!-- ***************************** -->
    <!-- Cache to store Lucene locks   -->
    <!-- ***************************** -->
    <namedCache
        name="LuceneIndexesLocking">
        <clustering
            mode="replication">
            <stateRetrieval fetchInMemoryState="true" logFlushTimeout="30000" />
            <sync replTimeout="90000" />
            <l1 enabled="false" />
        </clustering>
    </namedCache>
</infinispan>
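(The infinispan-tcp.cfg.xml referenced above is a plain JGroups TCP stack. The exact file is environment-specific; the following is only an illustrative sketch of that kind of stack - protocol names per JGroups 2.x, and the hosts, ports and timeouts are placeholders rather than our actual values:)

```xml
<config xmlns="urn:org:jgroups">
    <!-- TCP transport: each node binds one port -->
    <TCP bind_port="7800" />
    <!-- static discovery: list the cluster members (hosts here are placeholders) -->
    <TCPPING initial_hosts="hostA[7800],hostB[7800]" num_initial_members="2" port_range="1" />
    <MERGE2 />
    <FD_SOCK />
    <FD timeout="3000" max_tries="3" />
    <VERIFY_SUSPECT timeout="1500" />
    <pbcast.NAKACK use_mcast_xmit="false" retransmit_timeout="300,600,1200,2400,4800" />
    <UNICAST timeout="300,600,1200" />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000" />
    <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true" />
    <FRAG2 frag_size="60000" />
</config>
```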
Our Hibernate configuration on the master -
<property name="hibernate.search.default.directory_provider">infinispan</property>
<property name="hibernate.search.infinispan.cachemanager_jndiname">InfinispanCacheManager</property>
<property name="hibernate.search.infinispan.chunk_size">40960</property>
<property name="hibernate.search.default.exclusive_index_use">true</property>
Hibernate configuration on the slaves -
<property name="hibernate.search.default.directory_provider">infinispan</property>
<property name="hibernate.search.infinispan.cachemanager_jndiname">InfinispanCacheManager</property>
<property name="hibernate.search.infinispan.chunk_size">40960</property>
<property name="hibernate.search.worker.backend">jms</property>
<property name="hibernate.search.worker.jndi.class">org.apache.naming.java.javaURLContextFactory</property>
<property name="hibernate.search.worker.jms.connection_factory">ConnectionFactory</property>
<property name="hibernate.search.worker.jms.queue">/queue/hibernateSearchQueue</property>