Unable to join hotrod cluster across regions
mackerman Mar 14, 2012 12:30 PM
I am finding it impossible to establish a hotrod cluster between certain nodes in our cluster. We have 2 nodes in North America that form a hotrod cluster without any problems. However, when we try to add a cluster member from either Asia Pacific or Europe, after a short while we get TimeoutExceptions, which then result in the new cluster member(s) being dropped. I have also synchronized the clocks on the nodes; they all use UTC.
Does anyone have any suggestions as to how to get this cluster to work?
We see the following errors in the logs:
Coordinator node (US):
Remote node joins:
2012-03-14 16:04:14,581 DEBUG (OOB-2,hotrod-dev,ip-10-81-0-227-56141) [org.infinispan.cacheviews.CacheViewsManagerImpl] ___hotRodTopologyCache: Node ip-10-81-208-97-19453 is joining
2012-03-14 16:04:14,584 DEBUG (CacheViewInstaller-2,ip-10-81-0-227-56141) [org.infinispan.cacheviews.CacheViewsManagerImpl] Installing new view CacheView{viewId=2, members=[ip-10-81-0-227-56141, ip-10-81-208-97-19453]} for cache ___hotRodTopologyCache
About 300000 ms later (the distributedSyncTimeout value):
2012-03-14 16:09:39,084 ERROR (CacheViewInstaller-2,ip-10-81-0-227-56141) [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000172: Failed to prepare view CacheView{viewId=2, members=[ip-10-81-0-227-56141, ip-10-81-208-97-19453]} for cache P, rolling back to view CacheView{viewId=1, members=[ip-10-81-0-227-56141]}
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:322)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:876)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2012-03-14 16:10:24,094 DEBUG (Timer-3,hotrod-dev,ip-10-81-0-227-56141) [org.jgroups.protocols.FD] sending are-you-alive msg to ip-10-81-208-97-19453 (own address=ip-10-81-0-227-56141)
2012-03-14 16:10:38,834 ERROR (CacheViewInstaller-3,ip-10-81-0-227-56141) [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000172: Failed to prepare view CacheView{viewId=2, members=[ip-10-81-0-227-56141, ip-10-81-208-97-19453]} for cache R, rolling back to view CacheView{viewId=1, members=[ip-10-81-0-227-56141]}
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:319)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:876)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
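The gap between the join being logged and the first ISPN000172 rollback is close to, but not exactly, the configured 300000 ms. A small JDK-only sketch (Java 8+ `java.time`; the class name `ViewInstallTiming` is made up for illustration and is not part of Infinispan) that computes the gap from the log timestamps above:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ViewInstallTiming {
    // Matches the log4j timestamp prefix in the logs above, e.g. "2012-03-14 16:04:14,584".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps. */
    static long elapsedMillis(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        long configuredTimeout = 300_000L; // distributedSyncTimeout in hotrod-config.xml
        // "Installing new view" line -> first "ISPN000172: Failed to prepare view" line (cache P)
        long waited = elapsedMillis("2012-03-14 16:04:14,584", "2012-03-14 16:09:39,084");
        System.out.println("waited " + waited + " ms, configured " + configuredTimeout + " ms");
        System.out.println("beyond the timeout: " + (waited - configuredTimeout) + " ms");
    }
}
```

Run as-is, it reports a 324500 ms gap, about 24.5 s beyond the 300000 ms timeout. Note that the view installation logged above is for ___hotRodTopologyCache while the first rollback is for cache P, so the two lines may belong to different prepare rounds.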
Asia node:
org.infinispan.CacheException: Unable to invoke method private void org.infinispan.statetransfer.BaseStateTransferManagerImpl.start() throws java.lang.Exception on object
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:236)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:875)
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:630)
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:619)
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:523)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:173)
at org.infinispan.CacheImpl.start(CacheImpl.java:496)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:624)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:514)
at org.infinispan.server.hotrod.HotRodServer$$anonfun$preStartCaches$1.apply(HotRodServer.scala:114)
at org.infinispan.server.hotrod.HotRodServer$$anonfun$preStartCaches$1.apply(HotRodServer.scala:112)
at scala.collection.Iterator$class.foreach(Iterator.scala:660)
at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
at org.infinispan.server.hotrod.HotRodServer.preStartCaches(HotRodServer.scala:112)
at org.infinispan.server.hotrod.HotRodServer.startTransport(HotRodServer.scala:101)
at org.infinispan.server.core.AbstractProtocolServer.start(AbstractProtocolServer.scala:100)
at org.infinispan.server.hotrod.HotRodServer.start(HotRodServer.scala:95)
at org.infinispan.server.core.Main$.boot(Main.scala:140)
at org.infinispan.server.core.Main$$anon$1.call(Main.scala:94)
at org.infinispan.server.core.Main$$anon$1.call(Main.scala:91)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:234)
... 26 more
Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 5 minutes waiting for a response from ip-10-81-0-227-56141
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:271)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:111)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:447)
at org.infinispan.cacheviews.CacheViewsManagerImpl.join(CacheViewsManagerImpl.java:214)
at org.infinispan.statetransfer.BaseStateTransferManagerImpl.start(BaseStateTransferManagerImpl.java:139)
... 31 more
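Both traces end in a bounded wait: on the coordinator, `FutureTask.get` throws `java.util.concurrent.TimeoutException` while preparing the view, and on the joining node the RPC to ip-10-81-0-227-56141 gives up after 5 minutes. If a reply never arrives, a longer timeout only delays the same failure. The following is a simplified model of that pattern in plain JDK code (not Infinispan's implementation; `BoundedWaitSketch` and `awaitReply` are hypothetical names):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {
    /**
     * Waits at most timeoutMillis for a simulated remote reply that takes replyDelayMillis.
     * Returns true if the reply arrived in time, false if the wait timed out.
     */
    static boolean awaitReply(long replyDelayMillis, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> reply = pool.submit(() -> {
                Thread.sleep(replyDelayMillis); // stands in for the remote node's response time
                return "prepared";
            });
            try {
                reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                reply.cancel(true); // give up, comparable to rolling back to the previous view
                return false;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(awaitReply(10, 2_000)); // reply arrives before the deadline
        System.out.println(awaitReply(2_000, 50)); // deadline passes first
    }
}
```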
I have increased timeout values, but this does not seem to have any effect.
Our configuration files are as follows:
hotrod-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
xmlns="urn:infinispan:config:5.1">
<global>
<transport clusterName="hotrod-dev" distributedSyncTimeout="300000">
<properties>
<property name="configurationFile" value="/opt/infinispan-5.1.1.FINAL/etc/gossip-router-config.xml"/>
</properties>
</transport>
<globalJmxStatistics enabled="true"/>
</global>
<default>
<jmxStatistics enabled="true"/>
<clustering mode="R">
<stateRetrieval timeout="300000"/>
</clustering>
</default>
<namedCache name="A"/>
<namedCache name="B"/>
<namedCache name="P"/>
<namedCache name="M"/>
</infinispan>
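Since raising the timeouts reportedly had no effect, it may be worth confirming which values each node actually loads from its own copy of hotrod-config.xml. A JDK-only sketch that reads the two timeout attributes shown above (the class and method names are hypothetical; it only parses the file and does not use Infinispan's own parser):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TimeoutConfigCheck {
    // Default namespace declared on the <infinispan> root element of hotrod-config.xml.
    static final String NS = "urn:infinispan:config:5.1";

    /** Value of attribute attr on the first <tag> element in the 5.1 namespace, or null if absent. */
    static String attribute(String xml, String tag, String attr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS to match
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element el = (Element) doc.getElementsByTagNameNS(NS, tag).item(0);
        return el == null ? null : el.getAttribute(attr);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TimeoutConfigCheck /opt/infinispan-5.1.1.FINAL/etc/hotrod-config.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("transport distributedSyncTimeout = "
                + attribute(xml, "transport", "distributedSyncTimeout"));
        System.out.println("default stateRetrieval timeout   = "
                + attribute(xml, "stateRetrieval", "timeout"));
    }
}
```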
gossip-router-config.xml (I removed FD_SOCK to see if that helped, but the behaviour is the same either way):
<?xml version="1.0" encoding="UTF-8"?>
<config>
<TCP bind_port="7900"/>
<TCPGOSSIP timeout="3000" initial_hosts="10.81.0.227[8800]" num_initial_members="3"/>
<MERGE2 max_interval="30000" min_interval="10000"/>
<!-- <FD_SOCK/> -->
<FD timeout="50000" max_tries="5"/>
<VERIFY_SUSPECT timeout="5000"/>
<pbcast.NAKACK use_mcast_xmit="false" retransmit_timeout="300,600,1200,2400,4800" discard_delivered_msgs="true"/>
<UNICAST timeout="300,600,1200,2400,3600"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>
<UFC max_credits="2000000" min_threshold="0.10"/>
<MFC max_credits="2000000" min_threshold="0.10"/>
<FRAG2 frag_size="60000"/>
</config>
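With FD_SOCK commented out, FD is the only failure detector left in this stack. As a rough estimate, assuming FD suspects a member after max_tries consecutive unanswered heartbeats (each waited for `timeout` ms) and VERIFY_SUSPECT then waits its own timeout before confirming, a silent member would be excluded after about timeout × max_tries + VERIFY_SUSPECT timeout. A tiny sketch of that arithmetic (the class name is hypothetical, and the formula is an approximation rather than JGroups' exact scheduling):

```java
public class FailureDetectionBudget {
    /** Approximate worst-case time, in ms, before FD plus VERIFY_SUSPECT exclude a silent member. */
    static long worstCaseMillis(long fdTimeout, int fdMaxTries, long verifySuspectTimeout) {
        return fdTimeout * fdMaxTries + verifySuspectTimeout;
    }

    public static void main(String[] args) {
        // FD timeout="50000" max_tries="5", VERIFY_SUSPECT timeout="5000" from the stack above
        System.out.println(worstCaseMillis(50_000, 5, 5_000) + " ms");
    }
}
```

With the values above this gives 255000 ms, so in this estimate an unresponsive member would still be in the view for over four minutes after it stopped answering heartbeats.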
Startup command:
/opt/infinispan-5.1.1.FINAL/bin/startServer.sh -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=10.81.208.97 --cache_config=/opt/infinispan-5.1.1.FINAL/etc/hotrod-config.xml --protocol=hotrod --host=10.81.208.97 -Dlog4j.configuration=file:///opt/infinispan-5.1.1.FINAL/etc/hotrod-log4j.xml
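The join request does reach the coordinator, but the view preparation never completes, so a first check could be plain TCP reachability between regions on the ports from gossip-router-config.xml: 8800 for the gossip router in TCPGOSSIP initial_hosts, and 7900 for the TCP transport, assuming each node binds bind_port 7900. A hypothetical JDK-only probe follows; the class name and default host/port are illustrative. A successful connect does not prove JGroups traffic flows in both directions, so it would need to be run from each node toward the others.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** TCP connect time in ms, or -1 if no connection could be made within timeoutMillis. */
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1; // refused, filtered, unresolvable or timed out
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe <host> <port>; defaults to the gossip router from the config above.
        String host = args.length > 0 ? args[0] : "10.81.0.227";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8800;
        long ms = connectMillis(host, port, 5_000);
        System.out.println(ms < 0 ? "unreachable" : "connected in " + ms + " ms");
    }
}
```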
Environments are:
US nodes (note that we have had no issues mixing Ubuntu and CentOS in the US):
Ubuntu, running
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
infinispan-5.1.1.FINAL
Asia & Europe
CentOS (5.7 & 6.2), running
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)
infinispan-5.1.1.FINAL
Thanks, Mitchell