JGroupsFAQ

Q. Joining a cluster fails

 

A join can fail for several reasons, e.g. because the coordinator crashed just before the new member tried to join.

When FD is used, it takes some time to detect the failed coordinator. Until then, the joining client loops in the JOIN phase and only succeeds once a new coordinator has taken over. To troubleshoot this, it is useful to have:

  • a stack trace of the joiner and the coordinator (oldest member, who handles the JOIN)

  • logs: org.jgroups at the TRACE level, see JGroups Logging

  • the output of probe (see Probe protocol)

 

Q. What is this shunning thing, and should I turn it on or off?

 

Check the page on shunning for details.

 

Q. Which version of JGroups am I using?

 

Execute from the command line:

java -cp jgroups.jar org.jgroups.Version

or

java -jar jgroups.jar
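If more than one copy of jgroups.jar may be on the classpath, it can also help to check which one is actually loaded. The following is a minimal sketch using only standard Java reflection; JarLocator and locate are hypothetical helper names, not part of JGroups.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper (not part of JGroups): reports where a class was loaded from.
final class JarLocator {

    // Returns the jar or directory the named class was loaded from, or empty
    // if the class is missing or has no code source (e.g. JDK classes).
    static Optional<String> locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return Optional.empty();
            }
            return Optional.of(src.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // org.jgroups.Version is the class named in the commands above.
        System.out.println(locate("org.jgroups.Version")
                .orElse("org.jgroups.Version not found on the classpath"));
    }
}
```

Run it with the same classpath as the application, so that the reported location matches the jar the application really uses.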

 

Q. Can JGroups bind to 0.0.0.0?

 

A bind address of 0.0.0.0 works for other JBoss services, but not for JGroups: the address plus port constitutes the identity of a JGroups node, and the wildcard address 0.0.0.0 is not a valid address for that purpose.
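To illustrate why 0.0.0.0 cannot identify a node, the sketch below uses the standard java.net.InetAddress API to show that it is the wildcard ("any local") address rather than a concrete one. WildcardCheck and isWildcard are hypothetical names; this is not how JGroups itself validates addresses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustration only: distinguishes the wildcard address from a concrete one.
final class WildcardCheck {

    // True if the given IP literal is the wildcard ("any local") address.
    static boolean isWildcard(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).isAnyLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWildcard("0.0.0.0"));     // wildcard address
        System.out.println(isWildcard("192.168.5.2")); // a concrete address
    }
}
```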

 

Q. How do I bind TCP sockets to all interfaces?

 

In order to bind to all network interfaces:

  • remove bind_addr

  • add receive_interfaces (see example below) and

  • set -Dignore.bind.address=true, so that the values set in the XML files are used

 

Example:

<TCP receive_interfaces="192.168.5.2,192.168.0.2" start_port="7800"  ...></TCP>
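To find candidate values for receive_interfaces on a given host, the local addresses can be enumerated with the standard java.net.NetworkInterface API. This is a minimal sketch; InterfaceLister and its methods are hypothetical names, and restricting the list to IPv4 addresses of non-loopback interfaces is an assumption made for this example.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists local addresses in the comma-separated form shown above.
final class InterfaceLister {

    // Collects the IPv4 addresses of all interfaces that are up and not loopback.
    static List<InetAddress> localIpv4Addresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add(addr);
                    }
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("cannot enumerate network interfaces", e);
        }
        return result;
    }

    // Joins the addresses with commas, e.g. "192.168.5.2,192.168.0.2".
    static String toAttributeValue(List<InetAddress> addrs) {
        return addrs.stream().map(InetAddress::getHostAddress).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println("receive_interfaces=\"" + toAttributeValue(localIpv4Addresses()) + "\"");
    }
}
```

The output is only a starting point: review the list and keep just the addresses that cluster traffic should use.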

 

Q. How does a JGroups transport protocol decide which address to bind to?

 

There are two ways in which the bind address can be specified:

  • Using the bind.address system property

  • Specifying the bind_addr XML attribute in any of the transport protocols

The system property always overrides the XML attribute, unless you also set the system property -Dignore.bind.address=true (added in JGroups 2.2.8). In that case, the bind_addr value from the XML config file is used.
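The precedence rule can be written down as a small function. This is a sketch of the rule as described in this answer, not JGroups source code; BindAddressRule and resolve are hypothetical names, and the XML value 10.0.0.10 in main is just a sample.

```java
// Sketch of the precedence rule described above (not JGroups source code).
final class BindAddressRule {

    // The bind.address system property wins over the XML bind_addr attribute,
    // unless ignore.bind.address=true, in which case the XML value is used.
    static String resolve(String sysPropBindAddress, String xmlBindAddr, boolean ignoreSysProp) {
        if (!ignoreSysProp && sysPropBindAddress != null && !sysPropBindAddress.isEmpty()) {
            return sysPropBindAddress;
        }
        return xmlBindAddr; // may be null if the attribute was not set
    }

    public static void main(String[] args) {
        String sysProp = System.getProperty("bind.address");
        boolean ignore = Boolean.getBoolean("ignore.bind.address");
        // "10.0.0.10" stands in for a bind_addr value read from the XML config.
        System.out.println(resolve(sysProp, "10.0.0.10", ignore));
    }
}
```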

 

Q. How do I bind all JBoss services to the same IP address but have the JGroups traffic (i.e. clustering) go over a different network?

 

For JGroups 2.2.8 and later:

Use -Dignore.bind.address=true together with the bind_addr attribute in the protocol stack config. JGroups will then ignore the -b switch and use bind_addr.

 

For JGroups 2.2.7 and earlier:

In this case, you can't use ignore.bind.address because it was added in 2.2.8. Instead, start the AS with something like this:

 $ run.sh -b 192.168.1.10 -Dbind.address=10.0.0.10 

Note that for AS releases prior to 4.0.5.GA, the -Dbind.address part must come after the -b part. When the AS parses the command line arguments, the -b switch sets two system properties, jboss.bind.address and bind.address, and JGroups uses the latter. If you set bind.address after -b has been parsed, your value is preserved and JGroups uses it. For 4.0.5.GA and later, the order doesn't matter: if you set -Dbind.address, JGroups uses the value you pass.
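The ordering issue in releases prior to 4.0.5.GA can be modelled as arguments applied from left to right, where a later write replaces an earlier one. The sketch below is an illustrative model of that description only, not the AS argument parser; ArgOrderModel and apply are hypothetical names.

```java
import java.util.Properties;

// Illustrative model of the pre-4.0.5.GA behaviour described above.
final class ArgOrderModel {

    // Applies arguments left to right; a later write replaces an earlier one.
    static Properties apply(String... args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-b") && i + 1 < args.length) {
                // -b sets both properties to the same address
                String addr = args[++i];
                props.setProperty("jboss.bind.address", addr);
                props.setProperty("bind.address", addr);
            } else if (arg.startsWith("-D") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                props.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // -D after -b: the explicit value survives
        System.out.println(apply("-b", "192.168.1.10", "-Dbind.address=10.0.0.10")
                .getProperty("bind.address")); // prints 10.0.0.10
        // -D before -b: -b overwrites it
        System.out.println(apply("-Dbind.address=10.0.0.10", "-b", "192.168.1.10")
                .getProperty("bind.address")); // prints 192.168.1.10
    }
}
```

In both cases jboss.bind.address ends up as 192.168.1.10; only bind.address, the property JGroups reads, depends on the order.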

 

Q. Merging does not occur even though shunning is disabled. What could be the problem?

 

If you are using TCPPING, check the initial_hosts attribute as explained in MERGE2 protocol.

 

Q. I get a "java.net.BindException: Cannot assign requested address" exception

 

If you get an exception like the one below, switch to IPv4 with -Djava.net.preferIPv4Stack=true. The most likely cause is using IPv6 on Linux, where Sun's JDK has a bug that won't be fixed until Java 6. See IPv6.

Caused by: java.lang.Exception: problem creating sockets (bind_addr=/fe80:0:0:0:217:a4ff:fe10:3ee7%3, mcast_addr=null)
at org.jgroups.protocols.UDP.start(UDP.java:372)
at org.jgroups.stack.Protocol.handleSpecialDownEvent(Protocol.java:589)
... 1 more
Caused by: java.net.BindException: Cannot assign requested address
at java.net.PlainDatagramSocketImpl.bind0(Native Method)
at java.net.PlainDatagramSocketImpl.bind(PlainDatagramSocketImpl.java:82)
at java.net.DatagramSocket.bind(DatagramSocket.java:368)
at java.net.DatagramSocket.<init>(DatagramSocket.java:210)
at java.net.DatagramSocket.<init>(DatagramSocket.java:261)
at org.jgroups.protocols.UDP.createEphemeralDatagramSocket(UDP.java:572)
at org.jgroups.protocols.UDP.createSockets(UDP.java:436)
at org.jgroups.protocols.UDP.start(UDP.java:367)
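The bind_addr in the trace above is an IPv6 link-local address. As a quick diagnostic, the sketch below uses only the standard java.net API to check whether an address is IPv6 link-local and whether java.net.preferIPv4Stack is set in the running JVM. Ipv6Check and isIpv6LinkLocal are hypothetical names, and the scope suffix (%3) is left out of the sample address for simplicity.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Diagnostic sketch: is an address IPv6 link-local, and is IPv4 preferred?
final class Ipv6Check {

    // True if the literal is an IPv6 link-local address (fe80::/10).
    static boolean isIpv6LinkLocal(String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return addr instanceof Inet6Address && addr.isLinkLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // Reflects -Djava.net.preferIPv4Stack=true if it was passed to this JVM.
        System.out.println("preferIPv4Stack=" + Boolean.getBoolean("java.net.preferIPv4Stack"));
        System.out.println(isIpv6LinkLocal("fe80:0:0:0:217:a4ff:fe10:3ee7"));
    }
}
```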

 

Q. During a load test, I get "ERROR org.jgroups.blocks.GroupRequest both corr and transport are null, cannot send group request". How do I get around it?

 

 

This error message can appear under high load. Increasing the FD timeout setting should make it disappear. In fact, it is recommended to use a combined FD and FD_SOCK failure detection mechanism with a high FD timeout, as explained in FDVersusFD_SOCK.

 

 
