Lock/Replication timeouts on a 3-node EC2 cluster
joe.planisky, Aug 9, 2011 6:11 PM
I'm trying to run a replicated cache on 3 nodes (and eventually more) using Infinispan 5.0.0.FINAL in the Amazon EC2 cloud, and I'm running into intermittent TimeoutExceptions. Most of the time I get "Unable to acquire lock after [10 seconds]...", but sometimes it's "Replication timeout..." (see the attached log file excerpt for details).
I've narrowed things down to a simple demo program (see attached file), the essence of which is this:
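```java
import java.util.Date;

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

// ... inside main(); myIp is set elsewhere in the full program
EmbeddedCacheManager mgr = new DefaultCacheManager("testconfig.xml");
Cache<String, String> cache = mgr.getCache("TestCache");
try {
    cache.put("c", "start");
} catch (Exception x) {
    // exception ignored in this excerpt
}
while (true) {
    System.out.println("*************");
    System.out.println("Before update: " + cache.get("c"));
    String d = myIp + " " + new Date().toString();
    try {
        cache.put("c", d);
    } catch (Exception x) {
        // exception ignored in this excerpt
    }
    System.out.println(" After update: " + cache.get("c"));
    System.out.println("*************");
    Thread.sleep(1000);
}
```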
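Because both `catch` blocks in the loop are empty, the program itself gives no direct indication of how long each `put` blocks or which exception it hit. Below is a minimal sketch of an instrumented put, assuming only that the cache can be handled as a `java.util.Map` (Infinispan's `Cache` extends `ConcurrentMap`). The `TimedPutDemo` class and `timedPut` helper are hypothetical names, and a `ConcurrentHashMap` stands in for the cache so the sketch runs without Infinispan on the classpath:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimedPutDemo {

    /**
     * Hypothetical helper: performs map.put(key, value), prints how long the
     * call took, and reports any exception instead of silently swallowing it.
     * Returns the elapsed time in milliseconds, or -1 if the put threw.
     */
    static long timedPut(Map<String, String> map, String key, String value) {
        long start = System.nanoTime();
        try {
            map.put(key, value);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println("put(" + key + ") took " + elapsedMs + " ms");
            return elapsedMs;
        } catch (RuntimeException e) {
            // Map.put declares no checked exceptions, so any failure is unchecked
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.err.println("put(" + key + ") failed after " + elapsedMs + " ms: " + e);
            return -1;
        }
    }

    public static void main(String[] args) {
        // Stand-in for Cache<String, String>; the real program would pass mgr.getCache("TestCache")
        Map<String, String> cache = new ConcurrentHashMap<String, String>();
        timedPut(cache, "c", "start");
        timedPut(cache, "c", "updated");
        System.out.println("c = " + cache.get("c"));
    }
}
```

Used in place of the bare `cache.put("c", d)` calls, this would print the duration of every put, so a roughly 10-second stall followed by a timeout shows up on the console of the node that hit it.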
I start this program on the 1st node and wait until it's up and running. Then I start the 2nd node. When the 2nd node is starting, both the 1st and 2nd nodes seem to freeze for about 15 seconds, but eventually they resume running and I see the expected console outputs on both. When I start the 3rd node, again I see all nodes freeze for about 15 seconds, but usually everything recovers and I see the expected outputs on all 3 nodes.
However, after a variable amount of time (anywhere from a few seconds to a minute or more), the nodes freeze again, and after about 10 seconds 2 of the nodes throw TimeoutExceptions while the 3rd simply continues where it paused.
In my JGroups configuration, I'm using TCP transport and S3_PING membership discovery. (I've also tried FILE_PING discovery with the same results, so I don't think it's an S3 issue.)
The significant portion of my Infinispan configuration is:
```xml
<namedCache name="TestCache">
   <deadlockDetection enabled="true"/>
   <unsafe unreliableReturnValues="false"/>
   <locking concurrencyLevel="1000" useLockStriping="false" lockAcquisitionTimeout="10000"/>
   <clustering mode="replication">
      <sync/>
      <stateRetrieval fetchInMemoryState="true"/>
   </clustering>
</namedCache>
```
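The 10 seconds in "Unable to acquire lock after [10 seconds]" corresponds to `lockAcquisitionTimeout="10000"` (milliseconds) in this configuration. As a rough, single-JVM analogy only (this is not how Infinispan implements its locking), `ReentrantLock.tryLock(long, TimeUnit)` shows the same behavior: a writer waits up to the timeout for a lock that another owner holds and gives up if it is not released in time. `LockTimeoutDemo`, `tryAcquire` and `contend` are hypothetical names used for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockTimeoutDemo {

    /** Waits up to timeoutMs for the lock; returns true if it was acquired. */
    static boolean tryAcquire(ReentrantLock lock, long timeoutMs) throws InterruptedException {
        if (lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            lock.unlock(); // release immediately; this demo only tests acquisition
            return true;
        }
        return false;
    }

    /**
     * Starts a thread that holds the lock for holdMs, then reports whether the
     * calling thread manages to acquire it within timeoutMs.
     */
    static boolean contend(long holdMs, long timeoutMs) throws InterruptedException {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch held = new CountDownLatch(1);
        final long hold = holdMs;
        Thread holder = new Thread(new Runnable() {
            public void run() {
                lock.lock();
                try {
                    held.countDown();
                    Thread.sleep(hold);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            }
        });
        holder.start();
        held.await(); // ensure the holder owns the lock before we try
        boolean acquired = tryAcquire(lock, timeoutMs);
        holder.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // Holder releases after 100 ms, waiter allows 1000 ms: acquisition succeeds
        System.out.println("short hold: " + contend(100, 1000));
        // Holder keeps the lock for 1000 ms, waiter gives up after 100 ms: acquisition fails
        System.out.println("long hold:  " + contend(1000, 100));
    }
}
```

Where this sketch returns `false`, Infinispan instead throws the TimeoutException described above.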
I've attached my complete Infinispan and jGroups configuration files.
I'm using:
- Infinispan 5.0.0.FINAL
- Ubuntu 10.04 (kernel 2.6.32-308-ec2)
- Java 1.6.0_20
Do I have a configuration problem? Am I not using Infinispan correctly? Any hints on how to fix or work around this issue?
--
Joe
Attachments:
- log.txt.zip (1.6 KB)
- testconfig.xml (910 bytes)
- testjgroups.xml (2.1 KB)
- InfinispanTest.java.zip (635 bytes)