-
1. Re: Expiration and Lucene Directory
galder.zamarreno Jan 29, 2012 8:01 AM (in response to hvico)
Re: cache size
This is not currently doable. Calculating sizes of objects is not an easy task in the JVM.
-
2. Re: Expiration and Lucene Directory
hvico Feb 6, 2012 10:47 AM (in response to galder.zamarreno)
Hi,
I am having trouble with my Lucene index tables: these tables keep growing indefinitely. I configured eviction, so I no longer get out-of-memory errors, but what about expiration? My table grew from 400 MB to 4 GB in a couple of weeks.
Thanks,
-
3. Re: Expiration and Lucene Directory
sannegrinovero Feb 6, 2012 10:55 AM (in response to hvico)
Hi Horacio,
expiration sounds dangerous. Even if you configured a week for expiry, index segments or chunks which were created more than a week ago would be removed, even if they are still a required part of the index. I wouldn't recommend that unless you have a very specific index usage for which you know that's not going to be a problem (like rebuilding the index every night).
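To make the risk concrete, here is a minimal sketch of purely age-based expiry. It is not Infinispan code: the class, method, and chunk names are hypothetical, and it only illustrates that an age check knows nothing about whether the index still needs an entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not part of Infinispan or Hibernate Search.
final class ExpirationRisk {

    /**
     * Returns the keys that an age-only expiry rule would drop.
     * Nothing here asks whether the index still references the entry.
     */
    static List<String> expiredKeys(Map<String, Long> createdAtMillis,
                                    long nowMillis,
                                    long lifespanMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : createdAtMillis.entrySet()) {
            if (nowMillis - entry.getValue() > lifespanMillis) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toMillis(1);
        long now = 30 * day;

        // Made-up entries: one old segment the index would still need, one recent one.
        Map<String, Long> chunks = new LinkedHashMap<>();
        chunks.put("old-segment-still-in-use", now - 20 * day);
        chunks.put("recent-segment", now - 2 * day);

        // One-week lifespan, as in the example discussed above.
        System.out.println(expiredKeys(chunks, now, 7 * day));
    }
}
```

With a seven-day lifespan, the 20-day-old entry is reported as expired even though, in this made-up scenario, the index would still reference it.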
4 GB of index is not unusual, so the real question is whether this should be expected given your usage of it. Did you compare the index size to the same application using a filesystem index?
We do have some functional tests to guard against information leaks in the lucene-directory module, but maybe they are not covering us well enough for your use case. Would you be able to create a test case close to your use case? Please take a look at the sources for examples of tests, and feel free to ask for help. Even a draft of a test would be great.
-
4. Re: Expiration and Lucene Directory
hvico Feb 6, 2012 11:04 AM (in response to hvico)
Hi Sanne,
Based on your experience, is it normal for an index to grow to ten times its size in an application where new documents are created at a really slow rate? In my application I have 40,000 documents, and in a week users generate fewer than 100 new documents. So why does the cachestore table grow from 400 MB in a freshly reindexed state to 4 GB after a couple of weeks of mainly "read-only" usage? Is that normal Lucene behaviour?
I did not have this kind of trouble using filesystem-based indexes (without Infinispan). I can test this, but would like to know your opinion about these numbers.
-
5. Re: Expiration and Lucene Directory
sannegrinovero Feb 6, 2012 11:27 AM (in response to hvico)
No, that doesn't look normal. Still, it might need to duplicate the size of some segments while it's rewriting them, and if you're not optimizing the index it might be quite lazy in compacting.
So it is normal to need at least twice the average index size in terms of free space for intermediate work, but 10X seems very unlikely indeed.
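As a quick sanity check of the numbers reported in this thread against that rule of thumb, here is a small sketch. The class and method names are made up for illustration; the sizes are the 400 MB and 4 GB figures quoted above, taken as decimal megabytes.

```java
// Hypothetical helper: compares observed store growth with the
// "about twice the index size" headroom mentioned above.
final class IndexGrowthCheck {

    /** Ratio between the current store size and the freshly reindexed size. */
    static double growthFactor(long baselineBytes, long currentBytes) {
        if (baselineBytes <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (double) currentBytes / baselineBytes;
    }

    /** True when growth exceeds the headroom that intermediate work could plausibly need. */
    static boolean looksAbnormal(long baselineBytes, long currentBytes, double expectedHeadroom) {
        return growthFactor(baselineBytes, currentBytes) > expectedHeadroom;
    }

    public static void main(String[] args) {
        long mb = 1_000_000L;
        long baseline = 400 * mb;   // size right after a full reindex
        long current = 4_000 * mb;  // size after a couple of weeks

        System.out.println("growth factor: " + growthFactor(baseline, current));
        System.out.println("abnormal at 2x headroom: " + looksAbnormal(baseline, current, 2.0));
    }
}
```

The 2.0 threshold is only the rough headroom figure from this post, not a limit enforced by Lucene or Infinispan.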
Are you applying any IndexWriter tuning options?
-
6. Re: Expiration and Lucene Directory
hvico Feb 6, 2012 5:25 PM (in response to hvico)
A good workaround would be to reindex from scratch at night.
I tried that approach, but after clearing the full-text indexes via HSearch's API (FullTextSession purgeAll method), I do not see any deletions in my cachestore tables. It seems the cache keeps the old indexes and adds the new ones on top of them (so the size problem gets worse). The only way I managed to do a fresh rebuild is by following these steps:
1) Shut down my cluster
2) Drop or truncate the cachestore tables via SQL
3) Start a cluster node (the tables are created at startup by Infinispan)
4) Rebuild indexes
5) Start the other nodes
That process is really uncomfortable, as it requires a full cluster shutdown. So maybe I am doing something wrong. Why isn't the cachestore cleaned when I purge all my entities? I noticed the same behaviour when I optimize my indexes: I do not see any reduction of the cachestore size.
Thanks,
-
7. Re: Expiration and Lucene Directory
sannegrinovero Feb 6, 2012 6:33 PM (in response to hvico)
Hi Horacio,
could you please post both your Hibernate Search and Infinispan configuration files?
I need to reproduce your issue.
-
8. Re: Expiration and Lucene Directory
hvico Feb 7, 2012 5:46 AM (in response to hvico)
Backend node (where new documents are generated and saved):
Backend node, infinispan.xml:
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:4.2 http://www.infinispan.org/schemas/infinispan-config-4.2.xsd"
xmlns="urn:infinispan:config:4.2">
<global>
<globalJmxStatistics enabled="false" cacheManagerName="HibernateSearch" allowDuplicateDomains="true" />
<transport clusterName="HibernateSearch-Infinispan-cluster" distributedSyncTimeout="50000">
<properties>
<property name="configurationFile" value="jgroups3.xml"/>
</properties>
</transport>
<shutdown hookBehavior="DONT_REGISTER" />
</global>
<default>
<locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />
<invocationBatching enabled="true" />
<jmxStatistics enabled="false" />
<eviction maxEntries="-1" strategy="NONE" />
<expiration maxIdle="-1" />
<clustering mode="replication">
<stateRetrieval timeout="60000" logFlushTimeout="65000" fetchInMemoryState="true" alwaysProvideInMemoryState="true" />
<sync replTimeout="50000" />
<l1 enabled="false" />
</clustering>
</default>
<namedCache name="LuceneIndexesLocking">
<clustering mode="replication">
<stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />
<sync replTimeout="500000" />
<l1 enabled="false" />
</clustering>
<locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />
</namedCache>
<namedCache name="LuceneIndexesMetadata">
<clustering mode="replication">
<stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />
<sync replTimeout="50000" />
<l1 enabled="false" />
</clustering>
<locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />
<loaders shared="true" preload="true">
<loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
<properties>
<property name="key2StringMapperClass" value="org.infinispan.lucene.LuceneKey2StringMapper" />
<property name="createTableOnStart" value="true" />
<property name="datasourceJndiLocation" value="java:/MyDatasource" />
<property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />
<property name="dataColumnType" value="BLOB" />
<property name="idColumnType" value="VARCHAR(256)" />
<property name="idColumnName" value="idCol" />
<property name="dataColumnName" value="dataCol" />
<property name="stringsTableNamePrefix" value="LuceneIndexesMetadata" />
<property name="timestampColumnName" value="timestampCol" />
<property name="timestampColumnType" value="BIGINT" />
</properties>
<async enabled="true" flushLockTimeout="2500" shutdownTimeout="7200" threadPoolSize="5" />
</loader>
</loaders>
</namedCache>
<namedCache name="LuceneIndexesData">
<clustering mode="replication">
<stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />
<sync replTimeout="50000" />
<l1 enabled="false" />
</clustering>
<locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />
<loaders shared="true" preload="true" >
<loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
<properties>
<property name="key2StringMapperClass" value="org.infinispan.lucene.LuceneKey2StringMapper" />
<property name="createTableOnStart" value="true" />
<property name="datasourceJndiLocation" value="java:/MyDatasource" />
<property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />
<property name="dataColumnType" value="BLOB" />
<property name="idColumnType" value="VARCHAR(256)" />
<property name="idColumnName" value="idCol" />
<property name="dataColumnName" value="dataCol" />
<property name="stringsTableNamePrefix" value="LuceneIndexesData" />
<property name="timestampColumnName" value="timestampCol" />
<property name="timestampColumnType" value="BIGINT" />
</properties>
<async enabled="true" flushLockTimeout="2500" shutdownTimeout="7200" threadPoolSize="5" />
</loader>
</loaders>
<eviction maxEntries="8000" strategy="LIRS" wakeUpInterval="18000000" />
<expiration maxIdle="-1" />
</namedCache>
</infinispan>
Backend node, persistence.xml:
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" version="1.0">
<persistence-unit name="BACKEND" transaction-type="JTA">
<provider>org.hibernate.ejb.HibernatePersistence</provider>
<jta-data-source>java:/MyDatasource</jta-data-source>
<properties>
<property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
<property name="hibernate.hbm2ddl.auto" value="update"/>
<property name="hibernate.show_sql" value="false"/>
<property name="hibernate.format_sql" value="false"/>
<property name="hibernate.transaction.manager_lookup_class" value="org.hibernate.transaction.JBossTransactionManagerLookup"/>
<property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.EhCacheProvider"/>
<property name="hibernate.cache.use_query_cache" value="true"/>
<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.generate_statistics" value="true"/>
<property name="hibernate.cache.use_structured_entries" value="true"/>
<property name="hibernate.cache.provider_configuration_file_resource_path" value="/ehcache.xml" />
<property name="net.sf.ehcache.configurationResourceName" value="ehcache.xml"/>
<property name="hibernate.search.default.directory_provider" value="infinispan"/>
<property name="hibernate.search.default.chunk_size" value="65000"/>
<property name="hibernate.search.infinispan.cachemanager_jndiname" value="java:indexLucene"/>
<property name="hibernate.search.default.exclusive_index_use" value="true"/>
<property name="hibernate.search.default.optimizer.transaction_limit.max" value="100"/>
<property name="hibernate.search.default.optimizer.operation_limit.max" value="500"/>
</properties>
</persistence-unit>
</persistence>
Cluster frontend nodes (3 nodes), read-only search application
infinispan.xml differences:
<namedCache name="LuceneIndexesMetadata">
<clustering mode="replication">
<stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />
<sync replTimeout="50000" />
<l1 enabled="false" />
</clustering>
<locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />
</namedCache>
<namedCache name="LuceneIndexesData">
<clustering mode="replication">
<stateRetrieval fetchInMemoryState="false" logFlushTimeout="300000" />
<sync replTimeout="50000" />
<l1 enabled="false" />
</clustering>
<loaders shared="true" preload="true" >
<loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore" fetchPersistentState="true" ignoreModifications="true" purgeOnStartup="false">
<properties>
<property name="key2StringMapperClass" value="org.infinispan.lucene.LuceneKey2StringMapper" />
<property name="createTableOnStart" value="true" />
<property name="datasourceJndiLocation" value="java:/MyDatasource" />
<property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />
<property name="dataColumnType" value="BLOB" />
<property name="idColumnType" value="VARCHAR(256)" />
<property name="idColumnName" value="idCol" />
<property name="dataColumnName" value="dataCol" />
<property name="stringsTableNamePrefix" value="LuceneIndexesData" />
<property name="timestampColumnName" value="timestampCol" />
<property name="timestampColumnType" value="BIGINT" />
</properties>
<async enabled="true" flushLockTimeout="2500" shutdownTimeout="7200" threadPoolSize="5" />
</loader>
</loaders>
<locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />
<eviction maxEntries="8000" strategy="LIRS" wakeUpInterval="1800000" />
<expiration maxIdle="-1" />
</namedCache>
persistence.xml differences:
<persistence-unit name="FRONTEND" transaction-type="JTA">
<provider>org.hibernate.ejb.HibernatePersistence</provider>
<jta-data-source>java:/MyDatasource</jta-data-source>
<properties>
<property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
<property name="hibernate.hbm2ddl.auto" value="update"/>
<property name="hibernate.show_sql" value="false"/>
<property name="hibernate.format_sql" value="true"/>
<property name="hibernate.transaction.manager_lookup_class" value="org.hibernate.transaction.JBossTransactionManagerLookup"/>
<property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.EhCacheProvider"/>
<property name="hibernate.cache.use_query_cache" value="true"/>
<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.generate_statistics" value="true"/>
<property name="hibernate.cache.use_structured_entries" value="true"/>
<property name="hibernate.cache.provider_configuration_file_resource_path" value="/ehcache.xml" />
<property name="net.sf.ehcache.configurationResourceName" value="ehcache.xml"/>
<property name="hibernate.search.worker.backend" value="jgroupsSlave"/>
<property name="hibernate.search.default.exclusive_index_use" value="true"/>
<property name="hibernate.search.default.optimizer.operation_limit.max" value="1000"/>
<property name="hibernate.search.default.directory_provider" value="infinispan"/>
<property name="hibernate.search.default.chunk_size" value="65000"/>
<property name="hibernate.search.infinispan.cachemanager_jndiname" value="java:indexLucene"/>
</properties>
</persistence-unit>
Thanks for your interest!
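For a sense of scale, the chunk_size and eviction values posted above can be turned into rough bounds. This is a sketch under two assumptions that the thread does not confirm: each index chunk is stored as one cache entry of at most chunk_size bytes, and maxEntries counts such entries. The class and method names are made up.

```java
// Hypothetical back-of-the-envelope helper; not part of Infinispan.
final class ChunkMath {

    /**
     * Minimum number of chunk entries needed to hold the given number of bytes
     * (ceiling division). The real count can be higher, because every index
     * file is chunked separately and its last chunk is usually only partly full.
     */
    static long minChunksFor(long totalBytes, int chunkSizeBytes) {
        return (totalBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    /** Upper bound on chunk bytes kept in memory when at most maxEntries chunks are retained. */
    static long maxInMemoryBytes(int maxEntries, int chunkSizeBytes) {
        return (long) maxEntries * chunkSizeBytes;
    }

    public static void main(String[] args) {
        int chunkSize = 65000;   // hibernate.search.default.chunk_size from the post
        int maxEntries = 8000;   // eviction maxEntries on LuceneIndexesData

        System.out.println("chunks for 400 MB: " + minChunksFor(400_000_000L, chunkSize));
        System.out.println("chunks for 4 GB:   " + minChunksFor(4_000_000_000L, chunkSize));
        System.out.println("max bytes in memory: " + maxInMemoryBytes(maxEntries, chunkSize));
    }
}
```

Under these assumptions the in-memory portion of the data cache is capped at a few hundred megabytes, while everything else lives only in the JDBC store.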
-
9. Re: Expiration and Lucene Directory
sannegrinovero Feb 7, 2012 6:02 AM (in response to hvico)
Thanks. One more question: which versions of Hibernate Search and Infinispan are you using?
-
10. Re: Expiration and Lucene Directory
hvico Feb 7, 2012 6:05 AM (in response to sannegrinovero)
HSearch 3.4.1.FINAL and Infinispan 4.2.1-Final
-
11. Re: Expiration and Lucene Directory
hvico Feb 7, 2012 6:24 AM (in response to hvico)
Looking at this property, which is set on both the backend and frontend nodes:
<property name="hibernate.search.default.exclusive_index_use" value="true"/>
Can I use exclusive index use given my architecture (one backend "writer" and several frontend "readers")?
-
12. Re: Expiration and Lucene Directory
sannegrinovero Feb 7, 2012 6:50 AM (in response to hvico)
Yes, that works fine as long as you have a single node writing.
I might have already asked you, but is there no way you can try Search 4.1 and Infinispan 5.1?
-
13. Re: Expiration and Lucene Directory
hvico Feb 7, 2012 6:58 AM (in response to hvico)
Unfortunately no, as the upgrade matrix is too complex and I do not have resources for that kind of project right now.
My project is built on SEAM 2 and Richfaces 3, and my JBoss AS is 4.2.
I would like to create a batch process to purge and rebuild the indexes every night, but I need to find a way to "compact" or clean the cachestore before the reindex/optimization process.
-
14. Re: Expiration and Lucene Directory
hvico Feb 8, 2012 7:53 AM (in response to hvico)
Maybe this information sheds some light:
Yesterday I ran a full index rebuild. Afterwards my cachestore table had approx. 6,000 rows.
Today, after some usage, it has 18,000 rows.
Querying the index table with SQL and looking at the "idCol" column, I noticed that 11,000 of those rows are Lucene "prx" files:
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/fileformats.html#Positions
When I run FullTextSession.getSearchFactory().optimize(), those "files" remain untouched.
Hope this helps!
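The per-extension breakdown described above can also be computed outside the database once the idCol values are exported. A minimal sketch follows. The exact string format produced by LuceneKey2StringMapper is not shown in this thread, so the sample keys are invented and the code only assumes that each key contains the Lucene file name (for example `_3.prx`) somewhere in it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting exported idCol values.
final class KeyExtensionTally {

    // Matches a Lucene segment file name such as "_3.prx" or "_1a_2.del"
    // and captures the extension after the dot.
    private static final Pattern SEGMENT_FILE =
            Pattern.compile("_[0-9a-z]+(?:_[0-9a-z]+)?\\.([a-z0-9]+)");

    /** Counts keys per Lucene file extension; keys without one go under "other". */
    static Map<String, Integer> countByExtension(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            Matcher matcher = SEGMENT_FILE.matcher(key);
            String extension = matcher.find() ? matcher.group(1) : "other";
            counts.merge(extension, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample keys; the real idCol format may differ.
        List<String> sample = Arrays.asList(
                "_3.prx|0|docs",
                "_3.prx|1|docs",
                "_3.frq|0|docs",
                "segments_4|0|docs");
        System.out.println(countByExtension(sample));
    }
}
```

Running the same tally right after a rebuild and again a day later would show which file types account for the extra rows.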