16 Replies · Latest reply: Nov 10, 2011 11:03 AM by Sanne Grinovero

Replication Timeout with Hibernate Search - Infinispan

pbratton Newbie

I'm working with a 10-node Infinispan cluster used as a Hibernate Search backend.  Our servers are running TC Server 2.5 (Tomcat 6.0.32) on Java 1.6.0_24.  We are using jGroups 2.12.1.3 for handling cluster cache writes from each node, and for multicast UDP transport.

 

When we launch 3+ nodes in our cluster, one of the nodes eventually begins to log replication timeouts.  We've observed the same result whether we configure Infinispan for replication or for distribution cache mode.  Although the rest of the cluster remains stable, the failing node becomes essentially unusable for search.

 

Our configuration:

 

Infinispan:

<?xml version="1.0" encoding="UTF-8"?>

<infinispan

    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

    xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"

    xmlns="urn:infinispan:config:5.0">

    <global>

        <globalJmxStatistics

            enabled="true"

            cacheManagerName="HibernateSearch"

            allowDuplicateDomains="true" />

        <transport

            clusterName="HibernateSearch-Infinispan-cluster-MT"

            distributedSyncTimeout="50000">

            <properties>

                <property name="configurationFile" value="infinispan-udp.cfg.xml" />

            </properties>

        </transport>

        <shutdown

            hookBehavior="DONT_REGISTER" />

    </global>

 

 

    <default>

        <locking

            lockAcquisitionTimeout="20000"

            writeSkewCheck="false"

            concurrencyLevel="5000"

            useLockStriping="false" />

        <storeAsBinary storeKeysAsBinary="false" storeValuesAsBinary="true"

            enabled="false" />

        <invocationBatching

            enabled="true" />

        <clustering

            mode="replication">

            <stateRetrieval

                timeout="60000"

                logFlushTimeout="65000"

                fetchInMemoryState="true"

                alwaysProvideInMemoryState="true" />

            <sync

                replTimeout="50000" />

            <l1 enabled="false" />

        </clustering>

        <jmxStatistics

            enabled="true" />

        <eviction

            maxEntries="-1"

            strategy="NONE" />

        <expiration

            maxIdle="-1" />

    </default>

 

 

    <namedCache

        name="LuceneIndexesMetadata">

        <clustering

            mode="replication">

            <stateRetrieval

                fetchInMemoryState="true"

                logFlushTimeout="30000" />

            <sync

                replTimeout="50000" />

            <l1 enabled="false" />

        </clustering>

        <locking

            lockAcquisitionTimeout="20000"

            writeSkewCheck="false"

            concurrencyLevel="5000"

            useLockStriping="false" />

        <loaders shared="true" preload="true">

            <loader class="org.infinispan.loaders.jdbm.JdbmCacheStore" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">

                <properties>

                    <property name="location" value="/usr/local/tc/.index/metadata" />

                </properties>

            </loader>

        </loaders>

    </namedCache>

    <namedCache

        name="LuceneIndexesData">

        <clustering

            mode="replication">

            <stateRetrieval

                fetchInMemoryState="true"

                logFlushTimeout="30000" />

            <sync

                replTimeout="50000" />

            <l1 enabled="false" />

        </clustering>

        <locking

            lockAcquisitionTimeout="20000"

            writeSkewCheck="false"

            concurrencyLevel="5000"

            useLockStriping="false" />

        <loaders shared="true" preload="true">

            <loader class="org.infinispan.loaders.jdbm.JdbmCacheStore" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">

                <properties>

                    <property name="location" value="/usr/local/tc/.index/data" />

                </properties>

            </loader>

        </loaders>

    </namedCache>

    <namedCache

        name="LuceneIndexesLocking">

        <clustering

            mode="replication">

            <stateRetrieval

                fetchInMemoryState="true"

                logFlushTimeout="30000" />

            <sync

                replTimeout="50000" />

            <l1 enabled="false" />

        </clustering>

        <locking

            lockAcquisitionTimeout="20000"

            writeSkewCheck="false"

            concurrencyLevel="5000"

            useLockStriping="false" />

    </namedCache>

</infinispan>


jGroups (UDP):

<config xmlns="urn:org:jgroups"

        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.12.xsd">

    <UDP

         mcast_addr="${jgroups.udp.mcast_addr:228.10.10.9}"

         mcast_port="${jgroups.udp.mcast_port:45599}"

         tos="8"

         ucast_recv_buf_size="20000000"

         ucast_send_buf_size="640000"

         mcast_recv_buf_size="25000000"

         mcast_send_buf_size="640000"

         loopback="true"

         discard_incompatible_packets="true"

         max_bundle_size="64000"

         max_bundle_timeout="30"

         ip_ttl="${jgroups.udp.ip_ttl:2}"

         enable_bundling="true"

         enable_diagnostics="false"

         thread_naming_pattern="pl"

         thread_pool.enabled="true"

         thread_pool.min_threads="2"

         thread_pool.max_threads="30"

         thread_pool.keep_alive_time="5000"

         thread_pool.queue_enabled="false"

         thread_pool.queue_max_size="100"

         thread_pool.rejection_policy="Discard"

         oob_thread_pool.enabled="true"

         oob_thread_pool.min_threads="2"

         oob_thread_pool.max_threads="30"

         oob_thread_pool.keep_alive_time="5000"

         oob_thread_pool.queue_enabled="false"

         oob_thread_pool.queue_max_size="100"

         oob_thread_pool.rejection_policy="Discard"

         />

 

   <PING timeout="3000" num_initial_members="10"/>

   <MERGE2 max_interval="30000" min_interval="10000"/>

   <FD_SOCK/>

   <FD/>

   <BARRIER />

   <pbcast.NAKACK use_stats_for_retransmission="false"

                   exponential_backoff="0"

                   use_mcast_xmit="true" gc_lag="0"

                   retransmit_timeout="300,600,1200"

                   discard_delivered_msgs="true"/>

   <UNICAST timeout="300,600,1200"/>

   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/>

   <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>

   <UFC max_credits="500000" min_threshold="0.20"/>

   <MFC max_credits="500000" min_threshold="0.20"/>

   <FRAG2 frag_size="60000"  />

   <pbcast.STREAMING_STATE_TRANSFER/>        

</config>

 

And the errors we observe:

10-31-2011 13:53:02 ERROR Hibernate Search: Directory writer-3 interceptors.InvocationContextInterceptor: ISPN000136: Execution error

org.infinispan.util.concurrent.TimeoutException: Replication timeout for tc-cluster-0105-21082

          at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:71)

          at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:452)

          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:132)

          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:156)

          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:265)

          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:252)

          at org.infinispan.remoting.rpc.RpcManagerImpl.broadcastRpcCommand(RpcManagerImpl.java:235)

          at org.infinispan.remoting.rpc.RpcManagerImpl.broadcastRpcCommand(RpcManagerImpl.java:228)

          at org.infinispan.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:116)

          at org.infinispan.interceptors.ReplicationInterceptor.visitPutKeyValueCommand(ReplicationInterceptor.java:79)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:119)

          at org.infinispan.interceptors.LockingInterceptor.visitPutKeyValueCommand(LockingInterceptor.java:294)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:119)

          at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:133)

          at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:60)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:119)

          at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:214)

          at org.infinispan.interceptors.TxInterceptor.visitPutKeyValueCommand(TxInterceptor.java:162)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:119)

          at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:114)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:119)

          at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:104)

          at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:64)

          at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:60)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:119)

          at org.infinispan.interceptors.BatchingInterceptor.handleDefault(BatchingInterceptor.java:77)

          at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:60)

          at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

          at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:274)

          at org.infinispan.CacheImpl.putIfAbsent(CacheImpl.java:524)

          at org.infinispan.CacheSupport.putIfAbsent(CacheSupport.java:74)

          at org.infinispan.lucene.locking.BaseLuceneLock.obtain(BaseLuceneLock.java:65)

          at org.apache.lucene.store.Lock.obtain(Lock.java:72)

          at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)

          at org.hibernate.search.backend.Workspace.createNewIndexWriter(Workspace.java:202)

          at org.hibernate.search.backend.Workspace.getIndexWriter(Workspace.java:180)

          at org.hibernate.search.backend.impl.lucene.PerDPQueueProcessor.run(PerDPQueueProcessor.java:103)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:662)

 

Because this error is so pervasive regardless of our topology or caching mode, we believe we must be misconfigured somewhere.  Can anyone recommend a fix?

  • 1. Re: Replication Timeout with Hibernate Search - Infinispan
    Sanne Grinovero Master

    Hi,

    do you have an estimate of the index size, and what is your Hibernate Search configuration? I'm especially interested in the indexing tuning: you should try to make sure the index segments are not too big. The timeouts you have configured look quite generous, but they still need to be high enough for your network to replicate changes to the other nodes.
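    For example, in Hibernate Search 3.x, segment growth can be capped through the IndexWriter tuning properties; the values below are illustrative starting points rather than tested recommendations:

    ```xml
    <!-- Illustrative IndexWriter tuning for segment size; adjust to your workload -->
    <property name="hibernate.search.default.indexwriter.transaction.max_merge_docs">10000</property>
    <property name="hibernate.search.default.indexwriter.transaction.merge_factor">10</property>
    <property name="hibernate.search.default.indexwriter.transaction.ram_buffer_size">64</property>
    ```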

     

    Make sure JGroups can take full advantage of your network; are you using Linux? JGroups will log some suggested settings for the network configuration when it starts.

     

    It would be useful if you could monitor your system to identify whether you are waiting on some lock, or just being inefficient at the network level with a big index; please also try disabling the CacheLoader, as it might slow down the other systems.

  • 2. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    Hi Sanne,

     

    I can't tell you how much I appreciate the fast response.  Here is the relevant portion of our hibernate.cfg.xml:

     

    <?xml version="1.0" encoding="UTF-8"?>

    <!DOCTYPE hibernate-configuration PUBLIC

    "-//Hibernate/Hibernate Configuration DTD 3.0//EN"

    "http://www.hibernate.org/dtd/hibernate-configuration-3.0.dtd">

    <hibernate-configuration>

        <session-factory>

            <property name="hibernate.search.default.directory_provider">infinispan</property>

            <property name="hibernate.search.infinispan.configuration_resourcename">infinispan.cfg.xml</property>

            <property name="hibernate.search.infinispan.chunk_size">4096</property>

            <property name="hibernate.search.worker.execution">async</property>

            <property name="hibernate.search.worker.thread_pool.size">10</property>

     

     

            <property name="hibernate.search.worker.backend">jgroupsMaster</property>

            <property name="hibernate.search.worker.backend.jgroups.configurationFile">udp.cfg.xml</property>

            <property name="hibernate.search.worker.backend.jgroups.clusterName">Hibernate-Search-Cluster-MT</property>

        </session-factory>

    </hibernate-configuration>

     

    And the jGroups configuration for the Hibernate Search cluster:

    <config xmlns="urn:org:jgroups"

            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

            xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.12.xsd">

        <UDP

             mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}"

             mcast_port="${jgroups.udp.mcast_port:45588}"

             tos="8"

             ucast_recv_buf_size="20000000"

             ucast_send_buf_size="640000"

             mcast_recv_buf_size="25000000"

             mcast_send_buf_size="640000"

             loopback="true"

             discard_incompatible_packets="true"

             max_bundle_size="64000"

             max_bundle_timeout="30"

             ip_ttl="${jgroups.udp.ip_ttl:2}"

             enable_bundling="true"

             enable_diagnostics="true"

     

             thread_naming_pattern="pl"

     

     

             thread_pool.enabled="true"

             thread_pool.min_threads="2"

             thread_pool.max_threads="8"

             thread_pool.keep_alive_time="5000"

             thread_pool.queue_enabled="false"

             thread_pool.queue_max_size="100"

             thread_pool.rejection_policy="Run"

     

     

             oob_thread_pool.enabled="true"

             oob_thread_pool.min_threads="1"

             oob_thread_pool.max_threads="8"

             oob_thread_pool.keep_alive_time="5000"

             oob_thread_pool.queue_enabled="false"

             oob_thread_pool.queue_max_size="100"

             oob_thread_pool.rejection_policy="Run"/>

     

     

        <PING timeout="1000" num_initial_members="3"/>

        <MERGE2 max_interval="30000" min_interval="10000"/>

        <FD_SOCK/>

        <FD/>

        <VERIFY_SUSPECT timeout="1500"/>

        <pbcast.NAKACK use_stats_for_retransmission="false"

                       exponential_backoff="150"

                       use_mcast_xmit="true" gc_lag="0"

                       retransmit_timeout="300,600,1200"

                       discard_delivered_msgs="false"/>

        <UNICAST timeout="300,600,1200"/>

        <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                       max_bytes="4m"/>  

        <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>

        <UFC max_credits="2M"

             min_threshold="0.4"/>

        <MFC max_credits="2M"

             min_threshold="0.4"/>

        <FRAG2 frag_size="60000"/>

        <pbcast.STREAMING_STATE_TRANSFER />

        <!-- <pbcast.STATE_TRANSFER/> -->

        <pbcast.FLUSH timeout="0"/>

    </config>

     

    I don't have a precise estimate of the index size; however, since we are starting a new system, the amount of data to be indexed is trivial at startup.  As for segment size, can you recommend a good way to find out?  We recently switched from a file-system based index (moving from that to a cluster), and the total index size was quite small.

     

    We are indeed on Linux (Red Hat EL 6.1).  We made some OS level changes based on the jGroups startup feedback, specifically setting the following:

    net.core.rmem_max=26214400

    net.core.wmem_max=640000

     

    in /etc/sysctl.conf.  One additional detail that may be of help is that the nodes are VMs in VMWare ESX, and there is a dedicated vswitch for the cluster traffic. 

     

    For monitoring, we've tried observing the jGroups and Infinispan logs for messages.  We don't immediately see anything that might indicate an issue.  Are there other places we can/should look?  Also, while disabling the CacheLoader may help, we would prefer to be able to bring the system up without a full re-index (in previous versions, indexing has taken 1+ hours with production data).  Can you recommend an alternative?

     

    Thanks again!

  • 3. Re: Replication Timeout with Hibernate Search - Infinispan
    Sanne Grinovero Master

    Any specific reason to have a different configuration than the one used by default in Infinispan? Infinispan includes a jgroups-udp.xml file in its main jar, which contains the configuration we use in our tests. It's of course often necessary to change the configuration, but it would be useful to understand why you made some changes, or whether you are using an outdated recommended configuration from an older Infinispan version.
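    For reference, the transport can point at the bundled defaults simply by referencing that file name (it is resolved from the classpath inside the infinispan-core jar); the transport attributes here are copied from your configuration above:

    ```xml
    <transport
        clusterName="HibernateSearch-Infinispan-cluster-MT"
        distributedSyncTimeout="50000">
        <properties>
            <!-- use the default JGroups config bundled in infinispan-core -->
            <property name="configurationFile" value="jgroups-udp.xml" />
        </properties>
    </transport>
    ```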

     

    The sysctl.conf settings look fine; just make sure by looking in /proc that they are applied. I don't know whether VMware affects the network; I'm not familiar with it.

     

    The Search configuration looks almost OK, but the chunk_size is very small. The default is deliberately small, as otherwise people get in trouble when they haven't set the sysctl.conf properties to accept larger packets, but in your case I'd suggest starting with around 10MB or slightly less: the bigger the chunks, the less index fragmentation, so bigger chunks give better overall performance until the packets get too big for your network and switches to handle efficiently.
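    Concretely, assuming chunk_size is expressed in bytes, a ~10MB setting in your hibernate.cfg.xml would look like:

    ```xml
    <!-- ~10MB chunks; illustrative starting point, tune for your network -->
    <property name="hibernate.search.infinispan.chunk_size">10485760</property>
    ```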

     

    What are the exact versions of Hibernate Search and Infinispan?

  • 4. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    We're using Hibernate Search 3.4.1.Final and Infinispan 5.0.0.Final.  We actually started with jgroups-udp.xml as a jumping-off point; I believe the only difference was that FD_ALL didn't work in our environment.  We also put the total size of the cluster into PING, which seemed to help with stability, although we eventually saw the same error (it just took longer).  Here's the complete diff against the original:

     

    0a1,22

    > <!--

    >   ~ JBoss, Home of Professional Open Source

    >   ~ Copyright 2010 Red Hat Inc. and/or its affiliates and other

    >   ~ contributors as indicated by the @author tags. All rights reserved.

    >   ~ See the copyright.txt in the distribution for a full listing of

    >   ~ individual contributors.

    >   ~

    >   ~ This is free software; you can redistribute it and/or modify it

    >   ~ under the terms of the GNU Lesser General Public License as

    >   ~ published by the Free Software Foundation; either version 2.1 of

    >   ~ the License, or (at your option) any later version.

    >   ~

    >   ~ This software is distributed in the hope that it will be useful,

    >   ~ but WITHOUT ANY WARRANTY; without even the implied warranty of

    >   ~ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU

    >   ~ Lesser General Public License for more details.

    >   ~

    >   ~ You should have received a copy of the GNU Lesser General Public

    >   ~ License along with this software; if not, write to the Free

    >   ~ Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA

    >   ~ 02110-1301 USA, or see the FSF site: http://www.fsf.org.

    >   -->

    3,6c25,28

    <         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.12.xsd">

    <     <UDP

    <          mcast_addr="${jgroups.udp.mcast_addr:228.10.10.9}"

    <          mcast_port="${jgroups.udp.mcast_port:45599}"

    ---

    >         xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">

    >    <UDP

    >          mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}"

    >          mcast_port="${jgroups.udp.mcast_port:46655}"

    39c61

    <    <PING timeout="3000" num_initial_members="10"/>

    ---

    >    <PING timeout="3000" num_initial_members="3"/>

    42c64

    <    <FD/>

    ---

    >    <FD_ALL/>

    51c73

    <    <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>

    ---

    >    <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>

    55c77

    <    <pbcast.STREAMING_STATE_TRANSFER/>        

    ---

    >    <pbcast.STREAMING_STATE_TRANSFER/>

     

    It's not clear from the docs (or the source javadocs, for that matter) what units the chunk_size parameter takes.  I presume the units are bytes.  Upping the value to 10485760 had no effect.  We do think the issue is in the cluster Infinispan relies on, not the jGroups cluster used for the master-slave Hibernate Search updates.  We'll keep that change in place; perhaps it's an optimization we would have eventually needed anyway.

  • 5. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    One more thing... we verified that the /etc/sysctl.conf settings did get applied under /proc.

  • 6. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    Also, our Hibernate version is 3.6.4.Final.

  • 7. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    We downgraded Infinispan to 4.2.1.Final; that did the trick.  It's interesting that later versions of Infinispan are not pluggable with Hibernate Search 3.4.1; unfortunately HS 4 requires Hibernate 4 (for which we are awaiting Spring support before upgrading).

     

    Thank you again, Sanne, for your assistance.

  • 8. Re: Replication Timeout with Hibernate Search - Infinispan
    Sanne Grinovero Master

    Hi, I'm glad you solved it, but not happy that there is such a compatibility problem with Infinispan 5.

     

    I know Hibernate Search 3.4.x depends on Infinispan 4.2.x, but even then there are only minor differences. If you happen to have more information about this, that would be awesome, as I'd need to make sure Hibernate Search 4.x works fine with it.

  • 9. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    It could perhaps have had something to do with the default configurations available in both jars.  On our first attempt with Infinispan 4.2.1, we used the same jGroups configuration listed above.  We ran into more errors, so we tried the default configuration packaged with Infinispan 4.2.1, and the problem seems to have been resolved (for now).

     

    We believe the timeouts were due to a deadlock, but we're at a loss to figure out where the deadlock happened.

  • 10. Re: Replication Timeout with Hibernate Search - Infinispan
    Manik Surtani Master

    Can you confirm that even with pre-final versions of HS4 and Infinispan 5, you don't see this issue?

  • 11. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    I'm sorry, Hibernate Search 4 isn't an option for us for the reasons outlined above.  Upgrading my code to Hibernate 4 isn't something I have the bandwidth for at the moment.  Until Spring 3.1 is officially released, there are many users in our position who must rely on Hibernate Search 3.4.1.

  • 12. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    This may be helpful.  We had to patch jGroups 2.12.1.3.Final to address this error:

     

    Exception in thread "Hibernate Search: backend queueing processor-1" java.lang.NoSuchMethodError: org.jgroups.Message.<init>(Lorg/jgroups/Address;Lorg/jgroups/Address;Ljava/io/Serializable;)V

              at org.hibernate.search.backend.impl.jgroups.JGroupsBackendQueueProcessor.run(JGroupsBackendQueueProcessor.java:88)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:662

     

    The constructor signature had been changed to Message(Address, Address, Object), and for some reason the Serializable passed as a message by Hibernate Search was not accepted.  I verified in the jGroups code that a Serializable is expected (it will actually throw a RuntimeException if the Object is not castable to Serializable), so adding a constructor was trivial.

     

    What's bizarre here is that the error is non-deterministic.  Some of our writes go through, and some do not.
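    The shape of the patch is simple; here is a minimal sketch of the idea, using stand-in classes rather than the real JGroups source:

    ```java
    import java.io.Serializable;

    // Stand-in for org.jgroups.Address; its contents are irrelevant to the shim.
    class Address { }

    // Minimal stand-in for org.jgroups.Message illustrating the restored overload.
    class Message {
        private final Object payload;

        // The 2.12-style constructor takes a plain Object.
        Message(Address dest, Address src, Object obj) {
            this.payload = obj;
        }

        // Re-added 2.11-style overload: callers compiled against
        // <init>(Address, Address, Serializable) link against this exact
        // signature, so restoring it fixes the NoSuchMethodError.
        Message(Address dest, Address src, Serializable obj) {
            this(dest, src, (Object) obj);
        }

        Object getObject() {
            return payload;
        }
    }

    public class Shim {
        public static void main(String[] args) {
            Message m = new Message(null, null, (Serializable) "index-update");
            System.out.println(m.getObject()); // prints "index-update"
        }
    }
    ```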

  • 13. Re: Replication Timeout with Hibernate Search - Infinispan
    pbratton Newbie

    We made some progress in recent days.  It seems that Hibernate Search is the culprit here, specifically the jgroupsSlave backend worker.  When configuring a jgroupsSlave with a filesystem backend, HS expects an older version of jGroups than the one distributed with Infinispan (2.12.x).  It's the same problem these poor guys had.   The issue is that a constructor signature changed for org.jgroups.Message between 2.11.x and 2.12.x of jGroups.  So Infinispan would run fine, while HS would either a) error out if the worker was synchronous, or b) fail silently if the worker was asynchronous.

     

    The following jars will work together nicely:

    Hibernate Search 3.4.1.Final

    Infinispan 4.2.1.Final

    jGroups 2.11.1.Final

     

    The good news is that 2.11.1.Final does not seem to adversely affect Infinispan in any way.  We're still testing, and will provide updates as we go.

  • 14. Re: Replication Timeout with Hibernate Search - Infinispan
    Sanne Grinovero Master

    Right! Thank you very much for reporting that back, and for notifying the other thread as well.

     

    In Hibernate Search 4.x, since we don't need to support Java 5 anymore, the JGroups version is uniquely defined, so such issues won't happen again.

    I'll open an issue for Search, but I'm not sure how to solve it other than with reflection; splitting it into more modules is going to break the API, which is not something we want to do to fix an older version.
