-
1. lucene metadata value size
mircea.markus May 17, 2011 8:20 AM (in response to brackxm)
"For using a jdbc store, it is nice to know the size of the values."
Why is that? For configuring the data column, perhaps?
-
2. lucene metadata value size
brackxm May 19, 2011 7:20 AM (in response to mircea.markus)
Yes, indeed.
A data point:
about 200 "files" (in 1 index) give a maximum value size of 3669 bytes.
-
3. lucene metadata value size
sannegrinovero May 19, 2011 12:42 PM (in response to brackxm)
Hi Michael,
good point, that is important information for anyone using the JDBC store.
To be honest I don't know the exact size in bytes; looking at the source code of
org.infinispan.lucene.FileMetadata
you can see that the class is encoded as two longs and an int:
UnsignedNumeric.writeUnsignedLong(output, metadata.lastModified);
UnsignedNumeric.writeUnsignedLong(output, metadata.size);
UnsignedNumeric.writeUnsignedInt(output, metadata.bufferSize);
In addition to that, it should take at least another int to identify the type, but then again I'm not sure what kind of overhead the different cache loaders might add. In the case of the JDBC store, that should be it, but as I'm not sure I just opened ISPN-1125.
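To illustrate why the packed size depends on the actual values, here is a minimal sketch. It assumes that UnsignedNumeric uses a variable-length encoding with 7 payload bits per byte (this is not confirmed in the thread), and the class name, method names and sample values are made up for the example. It counts only the three fields, not the type identifier or any cache loader overhead.

```java
public class VarLengthSizeSketch {

    // Number of bytes needed for a non-negative value when each byte carries
    // 7 bits of payload and the high bit marks "more bytes follow".
    // ASSUMPTION: this mirrors the scheme used by UnsignedNumeric.
    static int encodedLength(long value) {
        int length = 1;
        while ((value & ~0x7FL) != 0) {
            length++;
            value >>>= 7;
        }
        return length;
    }

    // Sum of the three FileMetadata fields only (hypothetical helper).
    static int fieldsLength(long lastModified, long size, int bufferSize) {
        return encodedLength(lastModified)
                + encodedLength(size)
                + encodedLength(bufferSize);
    }

    public static void main(String[] args) {
        // Illustrative values: a millisecond timestamp, a small file, a 16 KB buffer.
        System.out.println(fieldsLength(1305800000000L, 3669L, 16384));
    }
}
```

Under this assumption small values take fewer bytes than a fixed 8-byte long, so the stored size is not constant and the column should be sized with some headroom.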
In addition, make sure you don't limit the size to the needs of FileMetadata only: by far the largest object stored in the metadata cache is the HashSet stored under the key FileListCacheKey. It needs to contain a list of all current filenames composing the index, so its maximum size is reached at maximum index fragmentation. I guess we will need further chunking, but I'd do this at the JDBC cache store level, as it's not a Lucene-specific problem but one that affects everyone using a database.
-
4. Re: lucene metadata value size
brackxm May 19, 2011 1:03 PM (in response to sannegrinovero)
A FileMetadata is 122 bytes.
My estimate for the file list would be 756 + 15 * (number of Lucene files) bytes.
The filenames Lucene uses do become longer, however.
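As a quick way to turn that estimate into a column size, here is a small sketch. The two constants are taken from the estimate in this post and were presumably measured with the serialization in use at the time, so treat the result as a rough lower bound; the class and method names are invented for the example.

```java
public class FileListSizeEstimate {

    // Figures quoted in the post above (bytes); not guaranteed for other versions.
    static final long BASE_BYTES = 756;
    static final long BYTES_PER_FILE = 15;

    // Estimated serialized size of the file list for a given number of Lucene files.
    static long estimateFileListBytes(int luceneFileCount) {
        return BASE_BYTES + BYTES_PER_FILE * luceneFileCount;
    }

    public static void main(String[] args) {
        // 200 files, close to the data point reported earlier in the thread.
        System.out.println(estimateFileListBytes(200));
    }
}
```

Since the filenames grow longer, the per-file figure will drift upwards, so it makes sense to leave a generous margin.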
-
5. lucene metadata value size
sannegrinovero May 19, 2011 1:56 PM (in response to brackxm)
Actually, it seems that because of how we pack longs and ints, a FileMetadata is 19 bytes; but OK, it's better to define the column a bit larger than what is strictly used in a small unit test.
The issue was fixed; you can either build it from source or wait for the CR3 release tomorrow. You'll need to enable trace-level logging
on org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore; it will show you the exact number of bytes being stored for each insert operation.
I'll propose to add a chunking capability to the JDBC cacheloader.
-
6. lucene metadata value size
sannegrinovero May 19, 2011 2:19 PM (in response to sannegrinovero)
Proposal:
http://infinispan.markmail.org/search/#query:+page:1+mid:xtubddwd6iye2vny+state:results
Feel free to join the mailing list if you want to discuss this.
-
7. lucene metadata value size
brackxm May 20, 2011 12:50 PM (in response to sannegrinovero)
"actually it seems that because of how we pack longs and ints a FileMetadata is 19 bytes; but ok better to define a bit larger than what is strictly used in a small unit test."
In my DB they are 122 bytes, so I guess there is something wrong with the serialization.
Any hints on how to check that?
-
8. Re: lucene metadata value size
sannegrinovero May 24, 2011 9:19 AM (in response to brackxm)
Hi Michael,
could you enable the trace log and see if there's a mismatch with the bytes in the database?
This commit made it the other day just before the CR3 release:
so you could enable tracing for this class to have effective buffer sizes logged.
-
9. Re: lucene metadata value size
brackxm May 25, 2011 10:38 AM (in response to sannegrinovero)
On 5.0.0.CR3 they are indeed 22 bytes.
-
10. Re: lucene metadata value size
sannegrinovero May 25, 2011 4:53 PM (in response to brackxm)
Right, in versions before 5 it was not using the custom externalizer, so the byte representation was not the same.