-
1. Re: Infinispan - Any ideas to make indexing faster.
sannegrinovero Apr 11, 2012 5:34 AM (in response to mkg)
Hi,
I've recently improved the indexing performance in the Hibernate Search project, but I haven't yet tested whether the performance boost carries over to the integration layer between Infinispan Query and the Hibernate Search engine.
My coding goal for the week of 23rd April is to make the same multithreaded MassIndexer available in Hibernate Search usable by Infinispan users, so if you have a test you can share I'll be glad to use it and make sure your use case indexes as fast as possible.
If you're using an older version, consider upgrading as many optimisations were added recently and are enabled transparently (no need for configuration changes).
Indexing is always going to be an expensive operation; depending on your analysers, object complexity and text content, it's possible that we won't be able to improve on your 10 minutes, but it's always fun trying.
Consider looking into your GC statistics as well. Indexing is very demanding in terms of memory consumption, so you might get a significant boost from tuning your JVM settings.
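One way to check whether GC matters before touching any JVM flags is to compare the time spent in collections against the wall-clock time of the load. The sketch below uses the standard java.lang.management API; the class name GcSnapshot and the placeholder loop are illustrative only and stand in for the real cache-loading code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {

    /** Total milliseconds spent in GC so far, summed over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Placeholder work: replace with the loop that puts objects into the indexed cache.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append(i);
        }

        long elapsedMs = (System.nanoTime() - start) / 1000000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.println("elapsed=" + elapsedMs + " ms, of which GC=" + gcMs + " ms");
    }
}
```

If the GC share is a small fraction of the elapsed time, heap tuning is unlikely to be where the time goes.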
-
2. Re: Infinispan - Any ideas to make indexing faster.
sannegrinovero Apr 11, 2012 5:38 AM (in response to sannegrinovero)
And check the obvious first:
It's always worth setting ram_buffer_size to a higher value if you have enough memory for it.
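As a minimal sketch of that setting, the snippet below builds the property programmatically. The full key is the one quoted in a later post of this thread, and the value is understood to be in megabytes; the class and method names are hypothetical, and how the Properties object is handed to Infinispan depends on your configuration style.

```java
import java.util.Properties;

public class RamBufferConfig {

    /** Returns Hibernate Search properties with the index writer RAM buffer set (value in MB). */
    static Properties withRamBuffer(int megabytes) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.indexwriter.ram_buffer_size",
                Integer.toString(megabytes));
        return p;
    }

    public static void main(String[] args) {
        // 256 is an arbitrary example value; size it to the heap you can spare.
        Properties p = withRamBuffer(256);
        System.out.println(p);
    }
}
```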
-
3. Re: Infinispan - Any ideas to make indexing faster.
mkg Apr 11, 2012 7:51 AM (in response to sannegrinovero)
Sanne - thanks for your input.
- I am using the latest version of hibernate-search (4.0 - FINAL).
- Following are the GC logs during loading of the cache. I have 8 GB allocated and there is not a single full GC during loading of data. I have increased ram_buffer_size to 1 GB.
main 2012-04-11 12:33:10,526 INFO [eq.rds2.cache.InfinispanTest] Starting loading cache
main 2012-04-11 12:33:10,540 WARN [hibernate.search.impl.ConfigContext] HSEARCH000075: Configuration setting hibernate.search.lucene_version was not specified, using LUCENE_CURRENT.
main 2012-04-11 12:33:10,933 INFO [serialization.avro.impl.AvroSerializationProvider] HSEARCH000079: Serialization protocol version 1.0
[GC [PSYoungGen: 2048000K->3041K(2389312K)] 2048000K->3041K(7850688K), 0.0145010 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
[GC [PSYoungGen: 2051041K->3601K(2389312K)] 2051041K->3601K(7850688K), 0.0125610 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2051601K->4135K(2389312K)] 2051601K->4135K(7850688K), 0.0098240 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2052135K->4688K(2389312K)] 2052135K->4688K(7850688K), 0.0119260 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2052688K->6014K(2389312K)] 2052688K->6014K(7850688K), 0.0118120 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
[GC [PSYoungGen: 2054014K->5753K(2723264K)] 2054014K->5753K(8184640K), 0.0147400 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
[GC [PSYoungGen: 2722681K->3036K(2723776K)] 2722681K->6434K(8185152K), 0.0203840 secs] [Times: user=0.05 sys=0.01, real=0.02 secs]
[GC [PSYoungGen: 2719964K->2780K(2723776K)] 2723362K->7226K(8185152K), 0.0090730 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2719708K->3131K(2723776K)] 2724154K->8657K(8185152K), 0.0082410 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2720059K->2411K(2723904K)] 2725585K->9413K(8185280K), 0.0085340 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2719595K->1632K(2718848K)] 2726597K->10113K(8180224K), 0.0079290 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2718816K->1897K(2723776K)] 2727297K->11153K(8185152K), 0.0075690 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2718825K->2048K(2723776K)] 2728081K->12298K(8185152K), 0.0067570 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2718976K->1760K(2723968K)] 2729226K->13210K(8185344K), 0.0077100 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2719008K->1680K(2723904K)] 2730458K->13974K(8185280K), 0.0060100 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2718928K->1648K(2724160K)] 2731222K->14742K(8185536K), 0.0066520 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2719216K->1917K(2724032K)] 2732310K->15775K(8185408K), 0.0075170 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2719485K->896K(2724416K)] 2733343K->15714K(8185792K), 0.0053720 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2719040K->864K(2724352K)] 2733858K->16370K(8185728K), 0.0056280 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2719008K->768K(2724544K)] 2734514K->16946K(8185920K), 0.0051120 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2719168K->864K(2724480K)] 2735346K->17602K(8185856K), 0.0081980 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2719264K->800K(2724864K)] 2736002K->18194K(8186240K), 0.0062300 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2719712K->1728K(2724672K)] 2737106K->19739K(8186048K), 0.0058810 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2720640K->1968K(2725504K)] 2738651K->20823K(8186880K), 0.0093840 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2722032K->1955K(2725184K)] 2740887K->21922K(8186560K), 0.0073390 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2722019K->1600K(2726016K)] 2741986K->22694K(8187392K), 0.0084130 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2722816K->1616K(2725824K)] 2743910K->23466K(8187200K), 0.0063060 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2722832K->1600K(2726464K)] 2744682K->24187K(8187840K), 0.0068270 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2723712K->1632K(2726272K)] 2746299K->24939K(8187648K), 0.0063030 secs] [Times: user=0.02 sys=0.01, real=0.00 secs]
[GC [PSYoungGen: 2723744K->1584K(2726784K)] 2747051K->25614K(8188160K), 0.0063850 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2724400K->1488K(2726656K)] 2748430K->26243K(8188032K), 0.0061930 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2724304K->1584K(2727104K)] 2749059K->27002K(8188480K), 0.0058790 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2725040K->864K(2726976K)] 2750458K->27006K(8188352K), 0.0055540 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2724320K->864K(2727168K)] 2750462K->27670K(8188544K), 0.0054860 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2724512K->736K(2727104K)] 2751318K->28222K(8188480K), 0.0050990 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2724384K->768K(2727232K)] 2751870K->28806K(8188608K), 0.0046180 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2724608K->1227K(2727232K)] 2752646K->29802K(8188608K), 0.0039870 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2725067K->736K(2727488K)] 2753642K->30306K(8188864K), 0.0068510 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2725024K->736K(2727424K)] 2754594K->30842K(8188800K), 0.0046310 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
[GC [PSYoungGen: 2725024K->704K(2727616K)] 2755130K->31330K(8188992K), 0.0078330 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2725248K->1232K(2727552K)] 2755874K->32378K(8188928K), 0.0067230 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2725776K->1440K(2727872K)] 2756922K->33166K(8189248K), 0.0076770 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2726496K->1568K(2727808K)] 2758222K->33878K(8189184K), 0.0091210 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 2726624K->896K(2728000K)] 2758934K->33902K(8189376K), 0.0059810 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2726208K->1552K(2727936K)] 2759214K->35231K(8189312K), 0.0087600 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2726864K->1328K(2728128K)] 2760543K->35703K(8189504K), 0.0059070 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2726832K->2017K(2727552K)] 2761207K->37007K(8188928K), 0.0049500 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2727521K->1312K(2728064K)] 2762511K->37418K(8189440K), 0.0077900 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
main 2012-04-11 12:42:28,103 INFO [eq.rds2.cache.InfinispanTest] Finished loading cache
- However, I did not notice any improvement in the loading time (still about 10 minutes).
- My use case is simple. I have a list of 50k objects and I am loading all of them into the cache at application startup (a bulk load kind of thing).
- I will be happy to try out the MassIndexer feature that you are planning to release.
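The GC log above can be checked with a few lines of code. The load runs from 12:33:10 to 12:42:28, roughly 9 minutes 18 seconds, which for 50k objects is roughly 90 objects per second. The sketch below sums the real= pause of each collection; the class name GcPauseSum is illustrative, and the sample string holds only the first two entries of the log as a stand-in for the full text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseSum {

    // Matches the wall-clock pause in entries such as
    // "[Times: user=0.02 sys=0.00, real=0.02 secs]"
    private static final Pattern REAL = Pattern.compile("real=(\\d+\\.\\d+) secs");

    /** Sums the wall-clock ("real") pause of every GC entry found in the log text. */
    static double totalPauseSeconds(String log) {
        double total = 0;
        Matcher m = REAL.matcher(log);
        while (m.find()) {
            total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        // First two entries from the log in this post; paste the whole log to get the real total.
        String sample =
            "[GC [PSYoungGen: 2048000K->3041K(2389312K)] 2048000K->3041K(7850688K), 0.0145010 secs] "
            + "[Times: user=0.02 sys=0.00, real=0.02 secs] "
            + "[GC [PSYoungGen: 2051041K->3601K(2389312K)] 2051041K->3601K(7850688K), 0.0125610 secs] "
            + "[Times: user=0.04 sys=0.01, real=0.01 secs]";
        System.out.println("total GC pause (s): " + totalPauseSeconds(sample));
    }
}
```

With a few dozen young-generation collections at about 0.01 s each, the pauses add up to well under a second, so GC does not account for the 10 minutes.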
-
4. Re: Infinispan - Any ideas to make indexing faster.
galder.zamarreno Apr 11, 2012 12:31 PM (in response to mkg)
-
5. Re: Infinispan - Any ideas to make indexing faster.
sannegrinovero Apr 11, 2012 6:03 PM (in response to mkg)
Did you try any of the tuning options?
-
6. Re: Infinispan - Any ideas to make indexing faster.
sarav.ks May 1, 2012 2:44 PM (in response to sannegrinovero)
Hi,
I am also having the same problem with loading data with indexing enabled. Today I downloaded 5.1.4 and used the config below:
<property name="hibernate.search.default.directory_provider" value="filesystem" />
<property name="hibernate.search.default.indexBase" value="C:\\Temp" />
<property name="hibernate.search.default.exclusive_index_use" value="true" />
<property name="hibernate.search.default.indexwriter.use_compound_file" value="false" />
<property name="hibernate.search.default.indexwriter.ram_buffer_size" value="500" />
It took 19 minutes to load 5000 records. My object is not very big: it has 20 fields (around 10 of them are objects, the rest are primitives).
@Indexed
@ProvidedId
public class Inventory implements Serializable {
    @Field
    @NumericField
    private long assetId;
    ....
Because the time taken is so high, I think I am missing some configuration. I would appreciate it if you could validate my setup. Thanks.
Also, I thought the index files, once created, would be reused when the system is restarted, but it turns out that every time I restart my system it takes 19 - 20 minutes to load 5k records.
-
7. Re: Infinispan - Any ideas to make indexing faster.
mkg May 2, 2012 12:55 AM (in response to sannegrinovero)
Hi Sanne,
Were you able to release the multithreaded MassIndexer feature that you mentioned? Please let me know the version if it is in a usable state.
Infinispan is great for our use case; the only bottleneck is the indexing time, because it increases the startup time of our application.
-
8. Re: Infinispan - Any ideas to make indexing faster.
sannegrinovero May 3, 2012 7:23 AM (in response to mkg)
Hi all,
some more tricks to speed up indexing:
- Use Near-Real-Time when possible http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#d0e843
- Try a reasonable number of shards: not too many, for example it's unlikely to be useful to have more shards than CPU cores http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-configuration-directory-sharding
- Look at other options, such as merge_factor http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#lucene-indexing-performance
- Storing the index on filesystem will always be slower than storing it in Infinispan by configuring an Infinispan Directory (although some CacheStores in Infinispan might be slower than direct disk storage).
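The first three tips above could be sketched as configuration properties like this. The property keys are given as I recall them from the Hibernate Search 4.1 reference linked above and should be verified there; the merge_factor value is an arbitrary illustration, not a recommendation, and the class and method names are hypothetical. The shard count is capped at the number of CPU cores, following the advice in the list.

```java
import java.util.Properties;

public class IndexingTuning {

    /**
     * Builds tuning properties. Keys are assumed from the Hibernate Search 4.1
     * reference; check them against the documentation for your version.
     */
    static Properties tuning(int requestedShards, int cpuCores) {
        Properties p = new Properties();

        // Near-Real-Time index manager (assumed key and value).
        p.setProperty("hibernate.search.default.indexmanager", "near-real-time");

        // Never more shards than CPU cores, and at least one.
        int shards = Math.max(1, Math.min(requestedShards, cpuCores));
        p.setProperty("hibernate.search.default.sharding_strategy.nbr_of_shards",
                Integer.toString(shards));

        // Illustrative value only; measure before settling on a number.
        p.setProperty("hibernate.search.default.indexwriter.merge_factor", "30");
        return p;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(tuning(8, cores));
    }
}
```

Change one setting at a time and re-measure the load, since the effect of each option depends on the data and the hardware.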
MKG kumar wrote:
Were you able to release the multithreaded massindexer feature that you mentioned. Please let me know the version if it is in a usable state.
Infinispan is great for our usecase with the only bottleneck being the indexing time because it will increase the startup time of our application.
No, sorry, that hasn't been done yet. Thanks for the reminder!
-
9. Re: Infinispan - Any ideas to make indexing faster.
sarav.ks Aug 8, 2012 5:43 PM (in response to sannegrinovero)
Is there any update on the multithreaded MassIndexer feature? We are not able to use the cache because we cannot index our data at startup in a reasonable amount of time.
-
10. Re: Infinispan - Any ideas to make indexing faster.
mkg Aug 9, 2012 12:37 AM (in response to sarav.ks)
Slow indexing is the only reason we are also unable to use Infinispan. We have millions of objects to index, and with only 50k objects it already takes 10 minutes.