5 Replies Latest reply on Nov 18, 2015 10:15 AM by asafz

    Lucene over Infinispan index corruption

    asafz

      We are using Lucene over Infinispan with 2 active nodes; Infinispan persists its caches to files.

      Sometimes restarting one of the nodes corrupts the Lucene index. We see this exception in the logs when trying to read from the index:

       

      java.io.FileNotFoundException: Error loading metadata for index file: _78.si|M|skywareAccountsIndex
              at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:134)
              at org.infinispan.lucene.impl.DirectoryLuceneV4.openInput(DirectoryLuceneV4.java:101)
              at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
              at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
              at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:361)
              at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:57)
              at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:907)
              at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)
              at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:67)

       

      The Infinispan configuration is:

      <replicated-cache name="ACCOUNT_LUCENE_METADATA_CACHE" mode="SYNC">
        <expiration interval="-1" />
        <locking striping="false" />
        <persistence passivation="false">
          <file-store shared="false" preload="true" fetch-state="true" purge="false"
                      path="/var/skyfence/management/management_dbs/cache" />
        </persistence>
      </replicated-cache>

      <replicated-cache name="ACCOUNT_LUCENE_DATA_CACHE" mode="SYNC">
        <expiration interval="-1" />
        <locking striping="false" />
        <persistence passivation="false">
          <file-store shared="false" preload="false" fetch-state="true" purge="false"
                      path="/var/skyfence/management/management_dbs/cache" />
        </persistence>
      </replicated-cache>

       

      I understand from the Lucene documentation that Lucene over a file system should tolerate crashes. Is that also the case for Lucene over Infinispan?

      Any idea what we are doing wrong?

        • 1. Re: Lucene over Infinispan index corruption
          gustavonalle

          Hi, what kind of architecture does your system have? Is data stored in Infinispan or in a relational database?  Can you post the full indexing configuration and also which Infinispan version you are using?

          • 2. Re: Lucene over Infinispan index corruption
            asafz

            Hi,

            We are using Infinispan 7.2.1.Final.

            We have 2 servers (running on Tomcat) with Lucene over Infinispan. Infinispan persistence is configured to use files or a database, depending on the index.

             

            Here is the configuration:

            <?xml version="1.0" encoding="UTF-8"?>
            <infinispan>

              <jgroups>
                <stack-file name="tcpStack" path="infinispan-tcp-discovery.xml" />
              </jgroups>

              <cache-container default-cache="CM_SERVICE_TYPES">
                <jmx duplicate-domains="true" />
                <transport stack="tcpStack" cluster="sampleCluster" />

                <replicated-cache name="SKYWARE_SERVICE_LUCENE_METADATA_CACHE" mode="SYNC">
                  <expiration interval="-1" />
                  <locking striping="false" />
                  <persistence passivation="false">
                    <file-store shared="false" preload="true" fetch-state="true" purge="false"
                                path="/var/skyfence/management/management_dbs/cache" />
                  </persistence>
                </replicated-cache>

                <replicated-cache name="SKYWARE_SERVICE_LUCENE_DATA_CACHE" mode="SYNC">
                  <expiration interval="-1" />
                  <locking striping="false" />
                  <persistence passivation="false">
                    <file-store shared="false" preload="false" fetch-state="true" purge="false"
                                path="/var/skyfence/management/management_dbs/cache" />
                  </persistence>
                </replicated-cache>

                <replicated-cache name="SKYWARE_SERVICE_LUCENE_LOCKING_CACHE"
                                  mode="SYNC" start="EAGER" />

              </cache-container>

            </infinispan>

            • 3. Re: Lucene over Infinispan index corruption
              gustavonalle

              Thanks, could you also describe how you coordinate writes to the Lucene index between the two active nodes?

              • 4. Re: Lucene over Infinispan index corruption
                asafz

                Thanks, sure.

                We implemented a distributed lock over Postgres.

                Whenever an index is opened for writing, we lock it across the cluster, so two different nodes cannot write to it at the same time. Read operations are not locked.
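
The locking scheme described above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: `ClusterLock`, `InMemoryClusterLock`, `IndexWork`, and `writeExclusively` are hypothetical names, and an in-memory set stands in for the Postgres-backed lock so the example runs without a database. The one point it tries to show is that the lock is released in a `finally` block, and only after the index work (which is assumed to include committing and closing the writer) has finished.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SingleWriterSketch {

    /** Hypothetical cluster-wide lock; in the thread this is backed by Postgres. */
    public interface ClusterLock {
        boolean tryAcquire(String indexName);
        void release(String indexName);
    }

    /** In-memory stand-in (an assumption) so the sketch runs without a database. */
    public static class InMemoryClusterLock implements ClusterLock {
        private final Set<String> held = ConcurrentHashMap.newKeySet();

        @Override
        public boolean tryAcquire(String indexName) {
            return held.add(indexName); // false if some writer already holds it
        }

        @Override
        public void release(String indexName) {
            held.remove(indexName);
        }
    }

    /** Hypothetical unit of work: open the writer, add documents, commit, close. */
    public interface IndexWork {
        void run() throws Exception;
    }

    /** Runs the work only if the lock is free; always releases the lock afterwards. */
    public static boolean writeExclusively(ClusterLock lock, String indexName, IndexWork work)
            throws Exception {
        if (!lock.tryAcquire(indexName)) {
            return false; // another writer is active; the caller may retry later
        }
        try {
            work.run();
            return true;
        } finally {
            lock.release(indexName);
        }
    }

    public static void main(String[] args) throws Exception {
        ClusterLock lock = new InMemoryClusterLock();
        boolean first = writeExclusively(lock, "skywareAccountsIndex", () -> {
            // A nested attempt simulates a second node trying to write meanwhile.
            boolean second = writeExclusively(lock, "skywareAccountsIndex", () -> { });
            System.out.println("second writer admitted: " + second); // false
        });
        System.out.println("first writer admitted: " + first); // true
    }
}
```

Because reads are not locked in the described setup, a sketch like this says nothing about what a reader on the other node sees while a write is in progress.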

                • 5. Re: Lucene over Infinispan index corruption
                  asafz

                  After further investigation, we found that with an ASYNC Infinispan configuration the Lucene index is more likely to get corrupted, even though we are writing to the index from only one node:

                  <replicated-cache name="ACCOUNT_LUCENE_METADATA_CACHE" mode="ASYNC">
                    <expiration interval="-1" />
                    <locking striping="false" />
                    <persistence passivation="false">
                      <file-store shared="false" preload="true" fetch-state="true" purge="false"
                                  path="/var/skyfence/management/management_dbs/cache" />
                    </persistence>
                  </replicated-cache>

                  <replicated-cache name="ACCOUNT_LUCENE_DATA_CACHE" mode="ASYNC">
                    <expiration interval="-1" />
                    <locking striping="false" />
                    <persistence passivation="false">
                      <file-store shared="false" preload="false" fetch-state="true" purge="false"
                                  path="/var/skyfence/management/management_dbs/cache" />
                    </persistence>
                  </replicated-cache>

                  <replicated-cache name="ACCOUNT_LUCENE_LOCKING_CACHE"
                                    mode="SYNC" start="EAGER" />

                  We have not been able to reproduce the index corruption in SYNC mode, but we are still trying.

                  Is there an explanation for why this happens?
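
As a way to reason about the SYNC versus ASYNC difference, the toy model below may help. It is not Infinispan code and not a confirmed diagnosis of the exception in this thread: `ToyReplicatedCache` and its methods are hypothetical, and the model assumes only the general property that a synchronous put returns after the other node has applied the update, while an asynchronous put returns before that. In the asynchronous case there is a window in which the writer already considers an entry stored but the other node does not have it yet. The key used is the one from the stack trace at the top of the thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class AsyncReplicationSketch {

    /** Toy two-node replicated cache; a stand-in for illustration only. */
    public static class ToyReplicatedCache {
        private final Map<String, String> writerNode = new HashMap<>();
        private final Map<String, String> otherNode = new HashMap<>();
        private final Queue<String[]> inFlight = new ArrayDeque<>();
        private final boolean sync;

        public ToyReplicatedCache(boolean sync) {
            this.sync = sync;
        }

        public void put(String key, String value) {
            writerNode.put(key, value);
            if (sync) {
                otherNode.put(key, value);              // applied before put returns
            } else {
                inFlight.add(new String[] {key, value}); // applied some time later
            }
        }

        /** Delivers every queued update to the other node. */
        public void deliverInFlight() {
            String[] update;
            while ((update = inFlight.poll()) != null) {
                otherNode.put(update[0], update[1]);
            }
        }

        public boolean otherNodeHas(String key) {
            return otherNode.containsKey(key);
        }
    }

    public static void main(String[] args) {
        String key = "_78.si|M|skywareAccountsIndex";

        ToyReplicatedCache syncCache = new ToyReplicatedCache(true);
        syncCache.put(key, "file metadata");
        System.out.println("SYNC, right after put: " + syncCache.otherNodeHas(key));   // true

        ToyReplicatedCache asyncCache = new ToyReplicatedCache(false);
        asyncCache.put(key, "file metadata");
        System.out.println("ASYNC, right after put: " + asyncCache.otherNodeHas(key)); // false
        asyncCache.deliverInFlight();
        System.out.println("ASYNC, after delivery: " + asyncCache.otherNodeHas(key));  // true
    }
}
```

If a node is restarted, or a reader opens the index, inside that window, it could plausibly observe an index state that refers to entries it does not hold. Whether that is what happens here would need to be confirmed against the Infinispan Lucene Directory documentation and logs from both nodes.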