10 Replies Latest reply on Aug 9, 2012 12:37 AM by mkg

    Infinispan - Any ideas to make indexing faster.

    mkg
      • I am trying out the Infinispan cache to store Java objects in local-only cache mode.
      • I want to query on keys as well as on some fields, so I am using the query/indexing module of Infinispan.
      • Lookup performance on indexed fields is very good. However, loading all the items into the cache takes a huge amount of time compared to loading without indexing.
      • For example, for around 50k objects, Infinispan took 10 minutes to load the items into the cache with indexing enabled. Without indexing, it took only 2 seconds.
      • I wonder whether Infinispan is really this slow with indexing enabled or whether I am doing something grossly wrong. Any pointers on making indexing faster would be helpful.

       

       

       

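      import org.infinispan.configuration.cache.Configuration;
      import org.infinispan.configuration.cache.ConfigurationBuilder;
      import org.infinispan.manager.DefaultCacheManager;

      // Enable indexing, restricted to changes made on the local node
      Configuration infinispanConfiguration = new ConfigurationBuilder()
            .indexing()
               .enable()
               .indexLocalOnly(true)
            .build();

      DefaultCacheManager cacheManager = new DefaultCacheManager(infinispanConfiguration);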
      
      
      
      
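      import org.hibernate.search.annotations.Field;
      import org.hibernate.search.annotations.Indexed;
      import org.hibernate.search.annotations.ProvidedId;

      @Indexed @ProvidedId
      public class Book {
         // Indexed, and therefore searchable, fields
         @Field String title;
         @Field String description;
         @Field String author;
         @Field int yearOfPublication;
         // Stored in the cache but not indexed
         String briefDescription;
         int edition;
         boolean isBestSeller;
      }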
      
      
        • 1. Re: Infinispan - Any ideas to make indexing faster.
          sannegrinovero

          Hi,

          I've recently improved indexing performance in the Hibernate Search project, but I haven't yet tested whether the boost carries over to the Infinispan Query / Hibernate Search engine integration layer.

           

          My coding goal for the week of 23rd April is to make the multithreaded MassIndexer that is available in Hibernate Search usable by Infinispan users as well, so if you have a test you can share, I'll be glad to use it and make sure your use case indexes as fast as possible.

           

          If you're using an older version, consider upgrading as many optimisations were added recently and are enabled transparently (no need for configuration changes).

           

          Indexing is always going to be an expensive operation. Depending on your analysers, object complexity and text content, it's possible that we won't be able to improve on your 10 minutes, but it's always fun trying.

           

          Consider looking into your GC statistics as well. Indexing is very demanding in terms of memory consumption, so you might get a significant boost from tuning your JVM settings.
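One way to get the GC statistics mentioned above without parsing logs is to read the JVM's own counters before and after the load. The following is a minimal sketch using only the standard `java.lang.management` API. The class name `GcDelta` and the dummy workload in `main` are hypothetical stand-ins; in practice the `Runnable` would wrap the real cache load.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcDelta {

    /** Returns {total collection count, total collection time in ms} over all collectors. */
    static long[] snapshot() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Runs the work and returns {elapsed ms, collections during work, GC time in ms during work}. */
    static long[] measure(Runnable work) {
        long[] before = snapshot();
        long start = System.nanoTime();
        work.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long[] after = snapshot();
        return new long[] {elapsedMs, after[0] - before[0], after[1] - before[1]};
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real cache load: just allocate some garbage.
        long[] r = measure(() -> {
            for (int i = 0; i < 200_000; i++) {
                String s = new String("book-" + i);
                if (s.isEmpty()) {
                    throw new IllegalStateException("unexpected empty string");
                }
            }
        });
        System.out.printf("elapsed=%d ms, collections=%d, gcTime=%d ms%n", r[0], r[1], r[2]);
    }
}
```

If the reported GC time is a small fraction of the elapsed time, JVM tuning is unlikely to be where the indexing time goes.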

          • 2. Re: Infinispan - Any ideas to make indexing faster.
            sannegrinovero

            and check the obvious first:

            1. http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#lucene-indexing-performance

             

            It's always worth setting ram_buffer_size to a higher value, if you have enough memory for it.

            • 3. Re: Infinispan - Any ideas to make indexing faster.
              mkg

              Sanne - thanks for your inputs.

              • I am using the latest version of hibernate-search (4.0 - FINAL).
              • The GC logs from loading the cache follow. I have 8 GB allocated and there is not a single full GC while the data is loading. I have increased ram_buffer_size to 1 GB.
              main 2012-04-11 12:33:10,526 INFO [eq.rds2.cache.InfinispanTest] Starting loading cache
              main 2012-04-11 12:33:10,540 WARN [hibernate.search.impl.ConfigContext] HSEARCH000075: Configuration setting hibernate.search.lucene_version was not specified, using LUCENE_CURRENT.
              main 2012-04-11 12:33:10,933 INFO [serialization.avro.impl.AvroSerializationProvider] HSEARCH000079: Serialization protocol version 1.0
              [GC [PSYoungGen: 2048000K->3041K(2389312K)] 2048000K->3041K(7850688K), 0.0145010 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
              [GC [PSYoungGen: 2051041K->3601K(2389312K)] 2051041K->3601K(7850688K), 0.0125610 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2051601K->4135K(2389312K)] 2051601K->4135K(7850688K), 0.0098240 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2052135K->4688K(2389312K)] 2052135K->4688K(7850688K), 0.0119260 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2052688K->6014K(2389312K)] 2052688K->6014K(7850688K), 0.0118120 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
              [GC [PSYoungGen: 2054014K->5753K(2723264K)] 2054014K->5753K(8184640K), 0.0147400 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
              [GC [PSYoungGen: 2722681K->3036K(2723776K)] 2722681K->6434K(8185152K), 0.0203840 secs] [Times: user=0.05 sys=0.01, real=0.02 secs]
              [GC [PSYoungGen: 2719964K->2780K(2723776K)] 2723362K->7226K(8185152K), 0.0090730 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2719708K->3131K(2723776K)] 2724154K->8657K(8185152K), 0.0082410 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2720059K->2411K(2723904K)] 2725585K->9413K(8185280K), 0.0085340 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2719595K->1632K(2718848K)] 2726597K->10113K(8180224K), 0.0079290 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2718816K->1897K(2723776K)] 2727297K->11153K(8185152K), 0.0075690 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2718825K->2048K(2723776K)] 2728081K->12298K(8185152K), 0.0067570 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2718976K->1760K(2723968K)] 2729226K->13210K(8185344K), 0.0077100 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2719008K->1680K(2723904K)] 2730458K->13974K(8185280K), 0.0060100 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2718928K->1648K(2724160K)] 2731222K->14742K(8185536K), 0.0066520 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2719216K->1917K(2724032K)] 2732310K->15775K(8185408K), 0.0075170 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2719485K->896K(2724416K)] 2733343K->15714K(8185792K), 0.0053720 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2719040K->864K(2724352K)] 2733858K->16370K(8185728K), 0.0056280 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2719008K->768K(2724544K)] 2734514K->16946K(8185920K), 0.0051120 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2719168K->864K(2724480K)] 2735346K->17602K(8185856K), 0.0081980 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2719264K->800K(2724864K)] 2736002K->18194K(8186240K), 0.0062300 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2719712K->1728K(2724672K)] 2737106K->19739K(8186048K), 0.0058810 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2720640K->1968K(2725504K)] 2738651K->20823K(8186880K), 0.0093840 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2722032K->1955K(2725184K)] 2740887K->21922K(8186560K), 0.0073390 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2722019K->1600K(2726016K)] 2741986K->22694K(8187392K), 0.0084130 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2722816K->1616K(2725824K)] 2743910K->23466K(8187200K), 0.0063060 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2722832K->1600K(2726464K)] 2744682K->24187K(8187840K), 0.0068270 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2723712K->1632K(2726272K)] 2746299K->24939K(8187648K), 0.0063030 secs] [Times: user=0.02 sys=0.01, real=0.00 secs]
              [GC [PSYoungGen: 2723744K->1584K(2726784K)] 2747051K->25614K(8188160K), 0.0063850 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2724400K->1488K(2726656K)] 2748430K->26243K(8188032K), 0.0061930 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2724304K->1584K(2727104K)] 2749059K->27002K(8188480K), 0.0058790 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2725040K->864K(2726976K)] 2750458K->27006K(8188352K), 0.0055540 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2724320K->864K(2727168K)] 2750462K->27670K(8188544K), 0.0054860 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2724512K->736K(2727104K)] 2751318K->28222K(8188480K), 0.0050990 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2724384K->768K(2727232K)] 2751870K->28806K(8188608K), 0.0046180 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2724608K->1227K(2727232K)] 2752646K->29802K(8188608K), 0.0039870 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2725067K->736K(2727488K)] 2753642K->30306K(8188864K), 0.0068510 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2725024K->736K(2727424K)] 2754594K->30842K(8188800K), 0.0046310 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
              [GC [PSYoungGen: 2725024K->704K(2727616K)] 2755130K->31330K(8188992K), 0.0078330 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2725248K->1232K(2727552K)] 2755874K->32378K(8188928K), 0.0067230 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2725776K->1440K(2727872K)] 2756922K->33166K(8189248K), 0.0076770 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2726496K->1568K(2727808K)] 2758222K->33878K(8189184K), 0.0091210 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              [GC [PSYoungGen: 2726624K->896K(2728000K)] 2758934K->33902K(8189376K), 0.0059810 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2726208K->1552K(2727936K)] 2759214K->35231K(8189312K), 0.0087600 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2726864K->1328K(2728128K)] 2760543K->35703K(8189504K), 0.0059070 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2726832K->2017K(2727552K)] 2761207K->37007K(8188928K), 0.0049500 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
              [GC [PSYoungGen: 2727521K->1312K(2728064K)] 2762511K->37418K(8189440K), 0.0077900 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
              main 2012-04-11 12:42:28,103 INFO [eq.rds2.cache.InfinispanTest] Finished loading cache
              
              

               

              • However, I did not notice any improvement in load time (still about 10 minutes).
              • My use case is simple: I have a list of 50k objects and I load all of them into the cache at application startup (a bulk load).
              • I will be happy to try out the MassIndexer feature that you are planning to release.
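The log above can be checked with a little arithmetic. Each of the 48 young-generation pauses is about 20 ms or less, so total GC pause time is under one second, while the two timestamps (12:33:10,526 and 12:42:28,103) are 557.577 s apart. That works out to roughly 90 objects per second for 50k objects. Below is a minimal sketch of that calculation in plain Java. The class name `GcLogSummary` is hypothetical and the regular expression only assumes the log line shape shown above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogSummary {

    // Captures the pause in lines shaped like
    // "[GC [PSYoungGen: ...] 2048000K->3041K(7850688K), 0.0145010 secs] [Times: ...]"
    private static final Pattern PAUSE = Pattern.compile(", (\\d+\\.\\d+) secs\\]");

    /** Sums the first reported pause of every GC line; other log lines are ignored. */
    static double totalPauseSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (line.contains("[GC") && m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    /** Seconds between two same-day log timestamps of the form HH:mm:ss,SSS. */
    static double elapsedSeconds(String start, String end) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");
        Duration d = Duration.between(LocalTime.parse(start, f), LocalTime.parse(end, f));
        return d.toMillis() / 1000.0;
    }

    public static void main(String[] args) {
        // Only the first two GC lines of the log are used here; paste the full log for the real total.
        List<String> sample = Arrays.asList(
            "[GC [PSYoungGen: 2048000K->3041K(2389312K)] 2048000K->3041K(7850688K), 0.0145010 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]",
            "[GC [PSYoungGen: 2051041K->3601K(2389312K)] 2051041K->3601K(7850688K), 0.0125610 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]");
        double elapsed = elapsedSeconds("12:33:10,526", "12:42:28,103");
        System.out.printf("GC pause in sample: %.4f s%n", totalPauseSeconds(sample));
        System.out.printf("Wall clock: %.3f s, throughput: %.1f objects/s%n", elapsed, 50_000 / elapsed);
    }
}
```

On these numbers, garbage collection accounts for far less than 1% of the load time, which is consistent with the observation that heap size is not the limiting factor here.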
              • 5. Re: Infinispan - Any ideas to make indexing faster.
                sannegrinovero

                Did you try any of the tuning options?

                • 6. Re: Infinispan - Any ideas to make indexing faster.
                  sarav.ks

                  Hi,

                   

                  I am also having the same problem loading data with indexing enabled. Today I downloaded 5.1.4 and used the config below.

                   


                  <property name="hibernate.search.default.directory_provider" value="filesystem" />
                  <property name="hibernate.search.default.indexBase" value="C:\\Temp" />
                  <property name="hibernate.search.default.exclusive_index_use" value="true" />
                  <property name="hibernate.search.default.indexwriter.use_compound_file" value="false" />
                  <property name="hibernate.search.default.indexwriter.ram_buffer_size" value="500" />

                   

                  It took 19 minutes to load 5,000 records. My object is not very big: it has 20 fields (around 10 of which are objects and the rest primitives).

                   

                  @Indexed
                  @ProvidedId
                  public class Inventory implements Serializable {

                      @Field
                      @NumericField
                      private long assetId;
                  ....

                   

                  Because the time taken is so high, I think I am missing some configuration. I would appreciate it if you could validate my setup. Thanks.

                   

                  Also, I thought the index files, once created, would be reused when the system is restarted, but it turns out that every time I restart my system it takes 19-20 minutes to load the 5k records.

                  • 7. Re: Infinispan - Any ideas to make indexing faster.
                    mkg

                    Hi Sanne

                    Were you able to release the multithreaded MassIndexer feature that you mentioned? Please let me know the version if it is in a usable state.

                    Infinispan is great for our usecase with the only bottleneck being the indexing time because it will increase the startup time of our application.

                    • 8. Re: Infinispan - Any ideas to make indexing faster.
                      sannegrinovero

                      Hi all,

                      Some more tricks to speed up indexing:

                      1. Use Near-Real-Time when possible http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#d0e843
                      2. Try a reasonable number of shards, but not too many: for example, it's unlikely to be useful to have more shards than CPU cores http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-configuration-directory-sharding
                      3. Look at other options, such as merge_factor http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#lucene-indexing-performance
                      4. Storing the index on filesystem will always be slower than storing it in Infinispan by configuring an Infinispan Directory (although some CacheStores in Infinispan might be slower than direct disk storage).
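Tips 1 to 3 above translate into a handful of Hibernate Search configuration properties. The sketch below only collects them in a `java.util.Properties` object. The property names are the ones I understand the Hibernate Search 4.1 reference to use for the default index, the numeric values are purely illustrative, and the class name `IndexTuningProperties` is hypothetical. How the properties reach Infinispan depends on your version (for example `withProperties` on the indexing configuration builder, assuming your release offers it). Tip 4 is left out on purpose, because this sketch has not verified that an Infinispan Directory can be combined with the near-real-time index manager.

```java
import java.util.Properties;

class IndexTuningProperties {

    /**
     * Collects Hibernate Search settings for the default index.
     * All numeric values are illustrative, not recommendations.
     */
    static Properties build(int requestedShards, int cpuCores, int ramBufferMb, int mergeFactor) {
        Properties p = new Properties();
        // Tip 1: near-real-time index manager.
        p.setProperty("hibernate.search.default.indexmanager", "near-real-time");
        // Tip 2: never ask for more shards than CPU cores, and always at least one.
        int shards = Math.max(1, Math.min(requestedShards, cpuCores));
        p.setProperty("hibernate.search.default.sharding_strategy.nbr_of_shards",
                Integer.toString(shards));
        // Tip 3: index writer tuning; ram_buffer_size is expressed in megabytes.
        p.setProperty("hibernate.search.default.indexwriter.ram_buffer_size",
                Integer.toString(ramBufferMb));
        p.setProperty("hibernate.search.default.indexwriter.merge_factor",
                Integer.toString(mergeFactor));
        return p;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        build(16, cores, 256, 30).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Passing the core count in as a parameter keeps the helper deterministic; `main` shows the usual call with `Runtime.getRuntime().availableProcessors()`.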

                       

                       

                      MKG kumar wrote:

                       

                      Were you able to release the multithreaded MassIndexer feature that you mentioned? Please let me know the version if it is in a usable state.

                      Infinispan is great for our usecase with the only bottleneck being the indexing time because it will increase the startup time of our application.

                      No, sorry, that hasn't been done yet. Thanks for the reminder!

                      • 9. Re: Infinispan - Any ideas to make indexing faster.
                        sarav.ks

                        Is there any update on the multithreaded MassIndexer feature? We cannot use the cache because we cannot index our data at startup in a reasonable amount of time.

                        • 10. Re: Infinispan - Any ideas to make indexing faster.
                          mkg

                          Slow indexing is the only thing keeping us from using Infinispan as well. We have millions of objects to index, and with only 50k objects it already takes 10 minutes.