25 Replies. Latest reply on Feb 24, 2012 10:56 AM by sannegrinovero

    Expiration and Lucene Directory

    hvico

      Hi,

       

      Yesterday, in another thread, we discussed eviction when using Infinispan as a Lucene Directory (I had not configured eviction and got out-of-memory errors in my Lucene app).

       

      Now I am asking about expiration, because the row count in my cachestore table keeps growing (I assume memory is now bounded by the eviction process).

       

      Could I configure expiration for an Infinispan cache which is working as a Lucene directory, or is this dangerous (possible index losses)? I was thinking about configuring this with a long period of time, like "30 days" or something.

       

      One more small question: is it possible to get the size of an Infinispan cache in kilobytes or a similar unit? The size method that the API provides returns the number of keys in the cache, not its size in bytes.

       

      Thanks,

        • 1. Re: Expiration and Lucene Directory
          galder.zamarreno

          Re: cache size

           

          This is not currently doable. Calculating sizes of objects is not an easy task in the JVM.
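Although Infinispan does not report a byte size, a rough lower bound can be computed when the values are `byte[]`, as the chunks written by the Lucene directory appear to be. The sketch below is not an Infinispan API: `CachePayloadSize` and `approximateBytes` are hypothetical names, and the method accepts any `java.util.Map` (Infinispan's `Cache` interface extends `ConcurrentMap`). It ignores keys, object headers and per-entry overhead, and it only counts entries the map actually exposes, so evicted entries may be missed.

```java
import java.util.HashMap;
import java.util.Map;

final class CachePayloadSize {

    // Sums the lengths of all byte[] values; values of any other type are skipped.
    static long approximateBytes(Map<?, ?> cache) {
        long total = 0;
        for (Object value : cache.values()) {
            if (value instanceof byte[]) {
                total += ((byte[]) value).length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for a real cache: two chunk-like values and one non-byte[] value.
        Map<String, Object> cache = new HashMap<>();
        cache.put("chunk-0", new byte[65000]);
        cache.put("chunk-1", new byte[1200]);
        cache.put("metadata", "not a byte array");
        System.out.println(approximateBytes(cache) / 1024 + " KB");
    }
}
```

Iterating over all values of a large cache is expensive, so a check like this is better suited to occasional diagnostics than to regular monitoring.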

          • 2. Re: Expiration and Lucene Directory
            hvico

            Hi,

             

            I am having trouble with my Lucene index tables: they keep growing indefinitely. I configured eviction, so I no longer get out-of-memory errors, but what about expiration? My table grew from 400 MB to 4 GB in a couple of weeks.

             

            Thanks,

            • 3. Re: Expiration and Lucene Directory
              sannegrinovero

              Hi Horacio,

              Expiration sounds dangerous. Even if you configured a week for expiry, index segments or chunks which were created more than a week ago would be removed, even if they are still a required part of the index. I wouldn't recommend that unless you have a very specific index usage for which you know that's not going to be a problem (like rebuilding the index every night).
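To make the risk concrete, here is a toy model in plain Java. It is not Infinispan code: `ExpirationRisk`, `expire`, `missing` and the segment names are made up for illustration. The store is modelled as a map from segment name to the day it was written, and expiry is applied purely by age, without checking whether the index still references the entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ExpirationRisk {

    // Removes every entry older than lifespanDays, whether or not it is still in use.
    static void expire(Map<String, Integer> createdOnDay, int today, int lifespanDays) {
        createdOnDay.values().removeIf(created -> today - created > lifespanDays);
    }

    // Returns the segments the index still references but the store no longer holds.
    static Set<String> missing(Set<String> referenced, Map<String, Integer> store) {
        Set<String> lost = new HashSet<>(referenced);
        lost.removeAll(store.keySet());
        return lost;
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        store.put("_1.cfs", 0); // written on day 0 and never rewritten since
        store.put("_2.cfs", 9); // written on day 9
        Set<String> referenced = new HashSet<>(store.keySet());

        expire(store, 10, 7);   // one-week lifespan, evaluated on day 10
        System.out.println("Lost segments: " + missing(referenced, store));
    }
}
```

In this model the segment written on day 0 disappears although the index still needs it, which is the situation a mostly read-only index would run into.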

               

              A 4 GB index is not unusual, so the real question is whether that size is expected for your usage. Did you compare the index size to the same application using a filesystem index?

               

              We do have some functional tests to guard against information leaks in the lucene-directory module, but maybe they don't cover your use case well enough. Would you be able to create a test case close to your use case? Please take a look at the sources for examples of tests, and feel free to ask for help. Even a draft of a test would be great.

              • 4. Re: Expiration and Lucene Directory
                hvico

                Hi Sanne,

                 

                Based on your experience, is it normal for an index to grow to ten times its size in an application where new documents are created at a really slow rate? My application has 40,000 documents, and users generate fewer than 100 new documents a week. So why did the cachestore table grow from 400 MB in a freshly reindexed state to 4 GB after a couple of weeks of mainly "read-only" usage? Is this normal Lucene behaviour?

                 

                I did not have this kind of trouble using filesystem-based indexes (without Infinispan). I will test this, but I would like to know your opinion about these numbers.

                • 5. Re: Expiration and Lucene Directory
                  sannegrinovero

                  No, that doesn't look normal. Still, Lucene might need to duplicate some segments while it is rewriting them, and if you're not optimizing the index it might be quite lazy about compacting.

                   

                  So it is normal to need free space of at least twice the average index size for intermediate work, but 10x seems very unlikely.

                   

                  Are you applying any IndexWriter tuning options?

                  • 6. Re: Expiration and Lucene Directory
                    hvico

                    A good workaround would be to reindex from scratch at night.

                     

                    I tried that approach, but after clearing the full-text indexes via HSearch's API (the FullTextSession purgeAll method), no rows are deleted from my cachestore tables. It seems the cache keeps the old indexes and adds the new ones on top of them (so the size problem gets worse). The only way I managed to do a fresh rebuild is by following these steps:

                     

                    1) Shut down my cluster

                    2) Drop or truncate the cachestore tables via SQL

                    3) Start a cluster node (the tables are created at startup by Infinispan)

                    4) Rebuild indexes

                    5) Start the other nodes

                     

                    That process is really inconvenient, as it requires a full cluster shutdown, so maybe I am doing something wrong. Why isn't the cachestore cleaned when I purge all my entities? I noticed the same behaviour when I optimize my indexes: I do not see any reduction in the cachestore size.

                     

                    Thanks,

                    • 7. Re: Expiration and Lucene Directory
                      sannegrinovero

                      Hi Horacio,

                      Could you please post both your Hibernate Search and Infinispan configuration files?

                      I need to reproduce your issue.

                      • 8. Re: Expiration and Lucene Directory
                        hvico

                        Backend node (where new documents are generated and saved):

                         

                        Backend node, infinispan.xml:

                         

                        <?xml version="1.0" encoding="UTF-8"?>

                        <infinispan

                            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                            xsi:schemaLocation="urn:infinispan:config:4.2 http://www.infinispan.org/schemas/infinispan-config-4.2.xsd"

                            xmlns="urn:infinispan:config:4.2">

                          <global>

                                <globalJmxStatistics enabled="false" cacheManagerName="HibernateSearch" allowDuplicateDomains="true" />

                                <transport clusterName="HibernateSearch-Infinispan-cluster"  distributedSyncTimeout="50000">

                                    <properties>

                                        <property name="configurationFile" value="jgroups3.xml"/>

                                    </properties>

                                </transport>

                                <shutdown  hookBehavior="DONT_REGISTER" />

                          </global>

                        <default>

                                <locking lockAcquisitionTimeout="20000" writeSkewCheck="false"  concurrencyLevel="5000" useLockStriping="false" />

                                <invocationBatching enabled="true" />

                                <jmxStatistics enabled="false" />

                                <eviction maxEntries="-1" strategy="NONE" />

                                <expiration maxIdle="-1" />

                                            <clustering mode="replication">

                                    <stateRetrieval timeout="60000"  logFlushTimeout="65000" fetchInMemoryState="true" alwaysProvideInMemoryState="true" />

                                    <sync replTimeout="50000" />

                                    <l1 enabled="false" />

                                </clustering>

                        </default>

                          <namedCache name="LuceneIndexesLocking">

                                <clustering mode="replication">

                                    <stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />

                                    <sync replTimeout="500000" />

                                    <l1 enabled="false" />

                                </clustering>

                                <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />

                            </namedCache>

                        <namedCache name="LuceneIndexesMetadata">

                                <clustering mode="replication">

                                    <stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />

                                    <sync replTimeout="50000" />

                                    <l1 enabled="false" />

                                </clustering>

                                <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />

                              <loaders shared="true" preload="true">

                                 <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">

                                    <properties>

                                       <property name="key2StringMapperClass" value="org.infinispan.lucene.LuceneKey2StringMapper" />

                                       <property name="createTableOnStart" value="true" />

                                       <property name="datasourceJndiLocation" value="java:/MyDatasource" />

                                       <property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />

                                       <property name="dataColumnType" value="BLOB" />

                                       <property name="idColumnType" value="VARCHAR(256)" />

                                       <property name="idColumnName" value="idCol" />

                                       <property name="dataColumnName" value="dataCol" />

                                       <property name="stringsTableNamePrefix" value="LuceneIndexesMetadata" />

                                       <property name="timestampColumnName" value="timestampCol" />

                                       <property name="timestampColumnType" value="BIGINT" />

                                    </properties>

                                    <async enabled="true" flushLockTimeout="2500" shutdownTimeout="7200" threadPoolSize="5" />

                                 </loader>

                              </loaders>

                           </namedCache>

                           <namedCache name="LuceneIndexesData">

                                <clustering mode="replication">

                                    <stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />

                                    <sync replTimeout="50000" />

                                    <l1 enabled="false" />

                                </clustering>

                                <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />

                           <loaders shared="true" preload="true" >

                                 <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore" fetchPersistentState="true" ignoreModifications="false"  purgeOnStartup="false">

                                    <properties>

                                       <property name="key2StringMapperClass" value="org.infinispan.lucene.LuceneKey2StringMapper" />

                                       <property name="createTableOnStart" value="true" />

                                       <property name="datasourceJndiLocation" value="java:/MyDatasource" />

                                       <property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />

                                       <property name="dataColumnType" value="BLOB" />

                                       <property name="idColumnType" value="VARCHAR(256)" />

                                       <property name="idColumnName" value="idCol" />

                                       <property name="dataColumnName" value="dataCol" />

                                       <property name="stringsTableNamePrefix" value="LuceneIndexesData" />

                                       <property name="timestampColumnName" value="timestampCol" />

                                       <property name="timestampColumnType" value="BIGINT" />

                                    </properties>

                                    <async enabled="true" flushLockTimeout="2500" shutdownTimeout="7200" threadPoolSize="5" />

                                 </loader>

                              </loaders>

                              <eviction maxEntries="8000" strategy="LIRS" wakeUpInterval="18000000" />  

                              <expiration maxIdle="-1" />

                           </namedCache>

                        </infinispan>

                         

                        Backend node, persistence.xml:

                         

                        <?xml version="1.0" encoding="UTF-8"?>

                        <persistence xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" version="1.0">        

                           <persistence-unit name="BACKEND" transaction-type="JTA">

                              <provider>org.hibernate.ejb.HibernatePersistence</provider>

                              <jta-data-source>java:/MyDatasource</jta-data-source>

                              <properties>

                                 <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>

                                 <property name="hibernate.hbm2ddl.auto" value="update"/>

                                 <property name="hibernate.show_sql" value="false"/>

                                 <property name="hibernate.format_sql" value="false"/>

                                 <property name="hibernate.transaction.manager_lookup_class" value="org.hibernate.transaction.JBossTransactionManagerLookup"/>

                                             <property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.EhCacheProvider"/>

                                             <property name="hibernate.cache.use_query_cache" value="true"/>

                                             <property name="hibernate.cache.use_second_level_cache" value="true"/>

                                             <property name="hibernate.generate_statistics" value="true"/>

                                             <property name="hibernate.cache.use_structured_entries" value="true"/>

                                             <property name="hibernate.cache.provider_configuration_file_resource_path" value="/ehcache.xml" />

                                             <property name="net.sf.ehcache.configurationResourceName" value="ehcache.xml"/> 

                                             <property name="hibernate.search.default.directory_provider" value="infinispan"/>

                                             <property name="hibernate.search.default.chunk_size" value="65000"/>

                                             <property name="hibernate.search.infinispan.cachemanager_jndiname" value="java:indexLucene"/> 

                                             <property name="hibernate.search.default.exclusive_index_use" value="true"/>

                                             <property name="hibernate.search.default.optimizer.transaction_limit.max" value="100"/>

                                             <property name="hibernate.search.default.optimizer.operation_limit.max" value = "500"/>                  

                                  </properties>

                        </persistence-unit>

                        </persistence>

                         

                         

                        Cluster frontend nodes (3 nodes), read-only search application

                         

                        infinispan.xml differences:

                         

                        <namedCache name="LuceneIndexesMetadata">

                                <clustering mode="replication">

                                    <stateRetrieval fetchInMemoryState="true" logFlushTimeout="300000" />

                                    <sync replTimeout="50000" />

                                    <l1 enabled="false" />

                                </clustering>

                                <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />

                           </namedCache>

                            <namedCache name="LuceneIndexesData">

                               <clustering mode="replication">

                                    <stateRetrieval fetchInMemoryState="false" logFlushTimeout="300000" />

                                    <sync replTimeout="50000" />

                                    <l1 enabled="false" />

                                </clustering>

                                  <loaders shared="true" preload="true" >

                                 <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore"  fetchPersistentState="true" ignoreModifications="true" purgeOnStartup="false">

                                    <properties>

                                       <property name="key2StringMapperClass" value="org.infinispan.lucene.LuceneKey2StringMapper" />

                                       <property name="createTableOnStart" value="true" />

                                       <property name="datasourceJndiLocation" value="java:/MyDatasource" />

                                       <property name="connectionFactoryClass" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />

                                       <property name="dataColumnType" value="BLOB" />

                                       <property name="idColumnType" value="VARCHAR(256)" />

                                       <property name="idColumnName" value="idCol" />

                                       <property name="dataColumnName" value="dataCol" />

                                       <property name="stringsTableNamePrefix" value="LuceneIndexesData" />

                                       <property name="timestampColumnName" value="timestampCol" />

                                       <property name="timestampColumnType" value="BIGINT" />

                                    </properties>

                                    <async enabled="true" flushLockTimeout="2500" shutdownTimeout="7200" threadPoolSize="5" />

                                 </loader>

                              </loaders>

                                <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="5000" useLockStriping="false" />

                                  <eviction maxEntries="8000" strategy="LIRS" wakeUpInterval="1800000" />

                                <expiration maxIdle="-1" />

                        </namedCache>

                         

                        persistence.xml differences:

                         

                        <persistence-unit name="FRONTEND" transaction-type="JTA">

                              <provider>org.hibernate.ejb.HibernatePersistence</provider>

                              <jta-data-source>java:/MyDatasource</jta-data-source>

                               <properties>

                                 <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>

                                 <property name="hibernate.hbm2ddl.auto" value="update"/>

                                 <property name="hibernate.show_sql" value="false"/>

                                 <property name="hibernate.format_sql" value="true"/>

                                 <property name="hibernate.transaction.manager_lookup_class" value="org.hibernate.transaction.JBossTransactionManagerLookup"/>

                                 <property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.EhCacheProvider"/>

                                 <property name="hibernate.cache.use_query_cache" value="true"/>

                                       <property name="hibernate.cache.use_second_level_cache" value="true"/>

                                       <property name="hibernate.generate_statistics" value="true"/>

                                       <property name="hibernate.cache.use_structured_entries" value="true"/>

                                       <property name="hibernate.cache.provider_configuration_file_resource_path" value="/ehcache.xml" />            

                                       <property name="net.sf.ehcache.configurationResourceName" value="ehcache.xml"/>

                                       <property name="hibernate.search.worker.backend" value="jgroupsSlave"/>

                                       <property name="hibernate.search.default.exclusive_index_use" value="true"/>

                                       <property name="hibernate.search.default.optimizer.operation_limit.max" value="1000"/>        

                                 <property name="hibernate.search.default.directory_provider" value="infinispan"/>

                                 <property name="hibernate.search.default.chunk_size" value="65000"/>

                                 <property name="hibernate.search.infinispan.cachemanager_jndiname" value="java:indexLucene"/>

                              </properties>

                           </persistence-unit>

                         

                         

                        Thanks for your interest!

                        • 9. Re: Expiration and Lucene Directory
                          sannegrinovero

                          Thanks. One more question: which versions of Hibernate Search and Infinispan are you using?

                          • 10. Re: Expiration and Lucene Directory
                            hvico

                            HSearch 3.4.1.FINAL and Infinispan 4.2.1-Final

                            • 11. Re: Expiration and Lucene Directory
                              hvico

                              Looking at this property which is set in both backend and frontend nodes:

                               

                                   <property name="hibernate.search.default.exclusive_index_use" value="true"/>

                               

                              Can I use exclusive index use given my architecture (one backend "writer" and several frontend "readers")?

                              • 12. Re: Expiration and Lucene Directory
                                sannegrinovero

                                Yes that works fine as long as you have a single node writing.

                                 

                                I might have asked you already: is there no way you can try Hibernate Search 4.1 and Infinispan 5.1?

                                • 13. Re: Expiration and Lucene Directory
                                  hvico

                                  Unfortunately not: the upgrade matrix is too complex and we do not have the resources for that kind of project right now.

                                   

                                  My project is built on Seam 2 and RichFaces 3, and my JBoss AS is version 4.2.

                                   

                                  I would like to create a batch process to purge and rebuild the indexes every night, but first I need to find a way to "compact" or clean the cachestore before the reindex/optimization process.

                                  • 14. Re: Expiration and Lucene Directory
                                    hvico

                                    Maybe this information sheds some light:

                                     

                                    Yesterday I ran a full index rebuild. Afterwards my cachestore table had approximately 6,000 rows.

                                    Today, after some usage, it has 18,000 rows.

                                     

                                    Querying the index table with SQL and looking at the "idCol" column, I noticed that 11,000 of those rows are Lucene "prx" files:

                                     

                                    http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/fileformats.html#Positions

                                     

                                    When I run FullTextSession.getSearchFactory().optimize(), those "files" remain untouched.
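A breakdown by file type like the one described above can be automated once the `idCol` values have been read from the table. The sketch below assumes each key starts with the Lucene file name (for example `_a.prx`) followed by a `|` separator. That layout is an assumption about what `LuceneKey2StringMapper` writes and should be checked against real rows. `KeyExtensionHistogram` and its methods are hypothetical names, and the sample keys are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeyExtensionHistogram {

    // Extracts the file extension from a key such as "_a.prx|3|myIndex" (assumed layout).
    static String extensionOf(String key) {
        int bar = key.indexOf('|');
        String fileName = bar >= 0 ? key.substring(0, bar) : key;
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 ? fileName.substring(dot + 1) : "(none)";
    }

    // Counts how many keys belong to each extension, sorted by extension name.
    static Map<String, Integer> countByExtension(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(extensionOf(key), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample keys; real ones would come from a SELECT on idCol.
        List<String> keys = Arrays.asList(
                "_a.prx|0|idx", "_a.prx|1|idx", "_a.frq|0|idx", "segments_2|0|idx");
        System.out.println(countByExtension(keys));
    }
}
```

Running the same count before and after an optimize would show which file types are being left behind.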

                                     

                                    Hope this helps!
