copying Lucene FSDirectory to InfinispanDirectory issues
garcimouche Jul 7, 2010 11:53 AM
I'm in the process of building a proof of concept that aims to replace my current Lucene FSDirectory with the InfinispanDirectory implementation.
My current index holds around 7 million documents and takes up 2 GB on the file system.
I wanted to start with an index snapshot of around 850,000 documents, about 200 MB on disk.
My first step is to dump my current Lucene index into Infinispan, with the data in the grid backed by
a JDBC store (XML config at the end of this post).
I configured Infinispan for data distribution across the grid, but at this stage I intend to use a single node.
So I created a simple standalone Java app that uses the Lucene Directory.copy API method with the FSDirectory as the source and the InfinispanDirectory
as the target. I set the heap size to 2 GB (-Xmx2048m).
1) Right after the copy I stop my cache using the cache.stop() method (is this the right way to shut down a grid?)
and I expect the remaining in-memory data to be flushed to my DB store.
The C3P0 connection pool does not seem happy about this and issues a WARN message:
2010-07-07 10:29:58,364 WARN (com.mchange.v2.resourcepool.BasicResourcePool)[CoalescedAsyncStore-2:] com.mchange.v2.resourcepool.BasicResourcePool@150f0a7 -- an attempt to checkout a resource was interrupted, and the pool is still live: some other thread must have either interrupted the Thread attempting checkout!
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1315)
    at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:557)
    at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:477)
    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:525)
    at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
    at org.infinispan.loaders.jdbc.connectionfactory.PooledConnectionFactory.getConnection(PooledConnectionFactory.java:102)
    at org.infinispan.loaders.jdbc.binary.JdbcBinaryCacheStore.loadBucket(JdbcBinaryCacheStore.java:213)
    at org.infinispan.loaders.bucket.BucketBasedCacheStore.storeLockSafe(BucketBasedCacheStore.java:59)
    at org.infinispan.loaders.LockSupportCacheStore.store(LockSupportCacheStore.java:147)
    at org.infinispan.loaders.decorators.AbstractDelegatingStore.store(AbstractDelegatingStore.java:46)
    at org.infinispan.loaders.decorators.AsyncStore.applyModificationsSync(AsyncStore.java:180)
    at org.infinispan.loaders.decorators.AsyncStore$AsyncProcessor.put(AsyncStore.java:386)
    at org.infinispan.loaders.decorators.AsyncStore$AsyncProcessor.run0(AsyncStore.java:370)
    at org.infinispan.loaders.decorators.AsyncStore$AsyncProcessor.run(AsyncStore.java:312)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
The main thread then waits forever (JGroups communication?). After a while I have to kill the process to stop the JVM.
Nevertheless, I did the reverse operation (copying the just-created InfinispanDirectory, backed by the MySQL DB, into a brand new FSDirectory)
to verify with Luke that no documents were missing.
Everything is there (this time cache.stop() does not generate any errors and the JVM ends properly on my main thread with System.exit;
the JGroups transport also logs a clean disconnection).
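The InterruptedException in the WARN from step 1 is the kind of trace a worker thread produces when it is interrupted while blocked, which is what an abrupt executor shutdown does. The sketch below is not Infinispan code and makes no claim about how AsyncStore actually shuts down; it only uses plain java.util.concurrent (Java 8 or later for the lambda) to contrast an abrupt stop with one that lets queued writes drain. The class name DrainBeforeStop, the run method, and the sleep standing in for a blocking JDBC write are all hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Generic illustration only: a 3-thread pool standing in for async store workers.
final class DrainBeforeStop {

    // Queues `writes` simulated store writes, stops the pool, and returns
    // how many of the writes actually completed.
    static int run(int writes, boolean drain) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < writes; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200); // stands in for a blocking JDBC write
                    done.incrementAndGet();
                } catch (InterruptedException e) {
                    // The write is abandoned; keep the interrupt flag set.
                    Thread.currentThread().interrupt();
                }
            });
        }
        if (drain) {
            pool.shutdown();    // accept no new work, finish what is queued
        } else {
            pool.shutdownNow(); // interrupt running workers, drop queued work
        }
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed with drain:    " + run(6, true));
        System.out.println("completed without drain: " + run(6, false));
    }
}
```

With the draining variant all six simulated writes complete; with the abrupt variant the workers are interrupted mid-write and the queued writes are dropped. Whether something similar happens between cache.stop() and the async JDBC store is a guess that would need checking against the Infinispan source.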
2) I then ran the exact same process with the entire Lucene index (7 million documents) and got an OutOfMemoryError:
Exception in thread "luceneIndex-JdbcBinaryCacheStore-0" java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.Buffer.getBytes(Buffer.java:124)
    at com.mysql.jdbc.Buffer.readLenByteArray(Buffer.java:282)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:947)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:293)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1239)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2051)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1496)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:76)
    at org.infinispan.loaders.jdbc.binary.JdbcBinaryCacheStore.purgeInternal(JdbcBinaryCacheStore.java:280)
    at org.infinispan.loaders.AbstractCacheStore$2.run(AbstractCacheStore.java:84)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
2010-07-07 10:56:41,807 ERROR (com.fedextnc.hscode.index.Main)[main:] Error caught in main method
java.lang.OutOfMemoryError: Java heap space
    at org.infinispan.lucene.InfinispanIndexIO$InfinispanIndexOutput.newChunk(InfinispanIndexIO.java:217)
    at org.infinispan.lucene.InfinispanIndexIO$InfinispanIndexOutput.writeBytes(InfinispanIndexIO.java:240)
    at org.apache.lucene.store.IndexOutput.writeBytes(IndexOutput.java:43)
    at org.apache.lucene.store.Directory.copy(Directory.java:197)
    at com.fedextnc.hscode.index.tools.LuceneDirectoryCopy.copy(LuceneDirectoryCopy.java:34)
    at com.fedextnc.hscode.index.Main.main(Main.java:100)
I understood (maybe wrongly) that Infinispan avoids running out of memory by relying on its eviction strategy.
I use -1 for maxEntries (no limit on entries in memory) for maximum performance; I also tried specifying a maxEntries limit, without success.
Can someone tell me what is wrong in my approach?
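As a rough sanity check on the memory side, the arithmetic can be sketched as follows. It assumes that with no limit on in-memory entries the whole index has to live on the heap as fixed-size chunks (the newChunk frame in the trace suggests chunked storage), and it assumes a 16 KB chunk size purely for illustration; that value is not stated anywhere in this post. The class and method names are hypothetical, and the estimate ignores keys, per-entry overhead and everything else the JVM needs.

```java
// Back-of-the-envelope heap estimate for an index held entirely in memory.
final class ChunkHeapEstimate {

    // Assumption for illustration: 16 KB per chunk (not taken from the post).
    static final int CHUNK_SIZE = 16 * 1024;

    // Number of chunks needed to hold fileBytes (ceiling division).
    static long chunkCount(long fileBytes, int chunkSize) {
        if (fileBytes < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileBytes + chunkSize - 1) / chunkSize;
    }

    // True only if the raw chunk payload is strictly smaller than the heap.
    // A real cache needs extra room on top of the payload.
    static boolean fitsInHeap(long indexBytes, long heapBytes) {
        return indexBytes < heapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long snapshot = 200 * mb;   // the 850,000-document snapshot
        long fullIndex = 2048 * mb; // the 7-million-document index
        long heap = 2048 * mb;      // -Xmx2048m

        System.out.println("snapshot chunks:   " + chunkCount(snapshot, CHUNK_SIZE));   // 12800
        System.out.println("full index chunks: " + chunkCount(fullIndex, CHUNK_SIZE));  // 131072
        System.out.println("snapshot fits:     " + fitsInHeap(snapshot, heap));         // true
        System.out.println("full index fits:   " + fitsInHeap(fullIndex, heap));        // false
    }
}
```

Under these assumptions the 200 MB snapshot fits comfortably, while the 2 GB index already equals the configured heap before any overhead is counted, which would be consistent with the OutOfMemoryError in step 2 if nothing is being evicted.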
Env:
Ubuntu Linux, 32-bit.
Java 1.6.0_18 HotSpot
Infinispan 4.1.0.BETA2
Here is my XML configuration:
<?xml version="1.0" encoding="UTF-8"?>
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:4.0 http://www.infinispan.org/schemas/infinispan-config-4.1.xsd"
            xmlns="urn:infinispan:config:4.0">
  <global>
    <transport clusterName="lucene-cluster" transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport" />
  </global>
  <namedCache name="luceneIndex">
    <loaders passivation="false" shared="true" preload="true">
      <loader fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">
        <properties>
          <property name="bucketTableNamePrefix" value="xx_finder" />
          <property name="idColumnName" value="ID_COLUMN" />
          <property name="dataColumnName" value="DATA_COLUMN" />
          <property name="timestampColumnName" value="TIMESTAMP_COLUMN" />
          <property name="timestampColumnType" value="BIGINT" />
          <property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.PooledConnectionFactory" />
          <property name="connectionUrl" value="jdbc:mysql:///infinispan" />
          <property name="userName" value="infinispan" />
          <property name="driverClass" value="com.mysql.jdbc.Driver" />
          <property name="idColumnType" value="VARCHAR(256)" />
          <property name="dataColumnType" value="BLOB" />
          <property name="dropTableOnExit" value="false" />
          <property name="createTableOnStart" value="true" />
        </properties>
        <async enabled="true" flushLockTimeout="15000" threadPoolSize="3" />
      </loader>
    </loaders>
    <eviction wakeUpInterval="5000" maxEntries="-1" strategy="UNORDERED" />
    <clustering mode="distribution">
      <l1 enabled="true" lifespan="600000" />
      <hash numOwners="2" />
      <sync />
    </clustering>
    <invocationBatching enabled="true" />
    <transaction syncCommitPhase="true" syncRollbackPhase="true" transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" useEagerLocking="true" />
  </namedCache>
</infinispan>