compacting garbage collector pauses break the cluster in jboss 5.1
arminhaaf Apr 24, 2012 10:00 AM
We have a JBoss 5.1 cluster with 3 nodes, each with a 2 GB heap. The VMs run with "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC", which works without problems most of the time.
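For context, the relevant part of the command line looks roughly like this (a sketch, not our exact options: the occupancy and GC-logging flags are illustrative additions one would typically use to diagnose why CMS falls back to a stop-the-world compacting collection):

```
# Illustrative HotSpot options: the two CMS flags from above, plus
# GC logging and occupancy settings that can reduce the chance of a
# concurrent mode failure forcing a compacting full GC.
java -Xms2g -Xmx2g \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps ...
```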
However, sometimes a VM performs a compacting garbage collection, which means a stop-the-world pause of 50-80 seconds. During this time the node gets suspected by the other nodes.
After the node becomes responsive again, it logs:
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,385 WARN [org.jgroups.protocols.FD] [T:125798] I was suspected by 10.199.18.13:39310; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,386 WARN [org.jgroups.protocols.FD] [T:125798] I was suspected by 10.199.18.13:39310; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,386 WARN [org.jgroups.protocols.FD] [T:125798] I was suspected by 10.199.18.13:39310; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,387 DEBUG [org.jgroups.protocols.pbcast.FLUSH] [T:127] Received START_FLUSH at 10.199.18.11:45393 but I am not flush participant, not responding
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,387 DEBUG [org.jgroups.protocols.pbcast.FLUSH] [T:127] Received START_FLUSH at 10.199.18.11:45393 but I am not flush participant, not responding
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,387 DEBUG [org.jgroups.protocols.pbcast.FLUSH] [T:127] Received START_FLUSH at 10.199.18.11:45393 but I am not flush participant, not responding
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 DEBUG [org.jgroups.protocols.pbcast.GMS] [T:127] view=[10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310]
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 DEBUG [org.jgroups.protocols.pbcast.GMS] [T:127] [local_addr=10.199.18.11:45393] view is [10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310]
server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 WARN [org.jgroups.protocols.pbcast.GMS] [T:127] I (10.199.18.11:45393) am not a member of view [10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310], shunning myself and leaving the group (prev_members are [10.199.18.12:34166, 10.199.18.13:60923, 10.199.18.11:45393, 10.199.18.12:39800, 10.199.18.13:39310], current view is [10.199.18.11:45393|8] [10.199.18.11:45393, 10.199.18.12:39800, 10.199.18.13:39310])
After this the cluster is broken: at least the node that did the compacting GC must be restarted, and sometimes the whole cluster is broken and must be restarted.
Is there a configuration that avoids such problems?
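For reference, the knob that seems relevant is the failure-detection window in the JGroups stack (FD's timeout and max_tries). A sketch of the fragment, assuming the JGroups 2.x XML format shipped with JBoss 5.1; the values are illustrative, not recommendations:

```xml
<!-- If timeout * max_tries exceeds the worst GC pause
     (here 10000 ms * 12 = 120 s > 80 s), a paused node should
     not be suspected before it answers a heartbeat again. -->
<FD timeout="10000" max_tries="12" shun="true"/>
<VERIFY_SUSPECT timeout="1500"/>
```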