Infinispan cache UDP
saranya.guna Jul 20, 2015 6:32 AM
Hi All,
We faced the following issue in our production environment.
When the application tried to acquire a lock on the cache, we got a replication timeout exception.
We have four nodes in a clustered setup (node1, node2, node3 and node4). Node3 went down and came back up within two minutes. When it went down, the message below was logged only on the coordinator node, i.e. node1.
16:56:24,584 DEBUG [org.jgroups.protocols.pbcast.NAKACK2] (Incoming-6,shared=udp) removed node3:server/test from xmit_table (not member anymore)
This message did not appear on node2 or node4. The coordinator then logged the following warning:
16:56:29,584 WARN [org.jgroups.protocols.pbcast.GMS] (ViewHandler,test,node1:server/test) node1:server/test: failed to collect all ACKs (expected=3) for view [node1:server/test|4] after 5000ms, missing ACKs from [node2:server/test, node4:server/test]
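For reference, here is a minimal sketch of where that 5000 ms wait could be set, assuming it comes from the `view_ack_collection_timeout` property of JGroups' pbcast.GMS and that the jgroups 1.1 subsystem accepts it as a nested `<property>` element. The value only mirrors the 5000 ms in the warning; it is not a recommendation.

```xml
<!-- Assumption: view_ack_collection_timeout is the GMS setting behind the
     "failed to collect all ACKs ... after 5000ms" warning -->
<protocol type="pbcast.GMS">
    <property name="view_ack_collection_timeout">5000</property>
</protocol>
```

Raising this value would only make the coordinator wait longer for the ACKs. It would not by itself cause node2 and node4 to acknowledge the new view.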
Then, when node3 came back up, the following messages appeared only on the coordinator:
16:57:57,642 DEBUG [org.jgroups.protocols.pbcast.GMS] (Incoming-12,shared=udp) node1:server/test installing view [node1:server/test|5] [node1:server/test,node2:server/test, node4:server/test, node3:server/test]
16:57:57,642 DEBUG [org.jgroups.protocols.FD_SOCK] (Incoming-12,shared=udp) VIEW_CHANGE received: [node1:server/test, node2:server/test, node4:server/test, node3:server/test]
Checking the logs on node2, I could see that FD had detected that node3 went down through missing heartbeat messages:
16:56:59,764 DEBUG [org.jgroups.protocols.FD] (Timer-3,shared=udp) node2:server/test: received no heartbeat from node3:server/test for 5 times (30000 milliseconds), suspecting it
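The "5 times (30000 milliseconds)" in that line looks like FD's `max_tries` multiplied by its per-heartbeat `timeout`, i.e. 5 × 6000 ms = 30000 ms. Assuming that reading is correct, the equivalent explicit setting for the `<protocol type="FD"/>` entry in the stack below would look like this sketch. The values are inferred from the log line, not copied from our configuration.

```xml
<!-- Values inferred from the FD log line:
     max_tries (5) x timeout (6000 ms) = 30000 ms before a member is suspected -->
<protocol type="FD">
    <property name="timeout">6000</property>
    <property name="max_tries">5</property>
</protocol>
```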
Below are the cache and JGroups configurations:
<cache-container name="test" aliases="test" default-cache="test">
    <transport lock-timeout="60000"/>
    <replicated-cache name="test" mode="SYNC" start="EAGER" batching="true">
        <transaction mode="NON_XA" locking="PESSIMISTIC"/>
        <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="600000"/>
    </replicated-cache>
</cache-container>
<subsystem xmlns="urn:jboss:domain:jgroups:1.1" default-stack="udp">
    <stack name="udp">
        <transport type="UDP" socket-binding="jgroups-udp"/>
        <protocol type="PING"/>
        <protocol type="MERGE3"/>
        <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
        <protocol type="FD"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK"/>
        <protocol type="UNICAST2"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="UFC"/>
        <protocol type="MFC"/>
        <protocol type="FRAG2"/>
        <protocol type="RSVP"/>
    </stack>
</subsystem>
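Since the exception is a replication timeout on a SYNC cache, the setting that bounds how long a replication call waits for responses from the other nodes is, as far as I know, the `remote-timeout` attribute on the clustered cache. Below is a sketch of where it would go, assuming the Infinispan subsystem version in use supports that attribute. The value 30000 is purely illustrative and is not part of our current configuration.

```xml
<cache-container name="test" aliases="test" default-cache="test">
    <transport lock-timeout="60000"/>
    <!-- remote-timeout (illustrative value): how long a SYNC replication call
         waits for responses from remote nodes before timing out -->
    <replicated-cache name="test" mode="SYNC" start="EAGER" batching="true" remote-timeout="30000">
        <transaction mode="NON_XA" locking="PESSIMISTIC"/>
        <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="600000"/>
    </replicated-cache>
</cache-container>
```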