
    Using probe.sh/bat to discover all clusters in a network

     

    Every JGroups node by default listens on a multicast address/port for diagnostics messages, and replies with information about its state, e.g. the JGroups version in use, its address, the current membership and the protocol stack configuration.

    JGroups has a probe.sh/bat script in its ./bin directory; the script is also provided in the server's ./bin directory as of JBossAS 4.0.4.

     

    Probe can be invoked directly as follows:

    • Add jgroups-all.jar to the CLASSPATH

    • Invoke probe:

       
      bela@laptop /cygdrive/c
      $ java org.jgroups.tests.Probe -help
      Probe [-help] [-addr <addr>] [-bind_addr <addr>] [-port <port>] [-ttl <ttl>] [-timeout <timeout>] -query <query>
      

       

     

    Note: As of JGroups 2.4.4, 2.6.4 and 2.7.x, the diagnostics multicast address changed from 224.0.0.75 to 224.0.75.75. See JGRP-820 for more information.

     

    The options are as follows:

    • -addr: the multicast address, e.g. 224.0.75.75

    • -bind_addr: for multihomed hosts, the address of the interface to bind to, e.g. 192.168.5.1

    • -port: the multicast port of the group to which to send multicasts, default is 7500

    • -ttl: the time-to-live (in hops); the greater the value, the more hosts you will reach

    • -timeout: how long to wait for responses (in milliseconds, e.g. -timeout 500)

    • -query: a string that defines a subsystem to be listed, e.g. "jmx" or "props". Example: -query "jmx" -query "props"

     

    Probe multicasts a packet to 224.0.75.75:7500 and waits for responses until the timeout has elapsed.
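    The send-and-wait cycle described above can be sketched in plain Java. This is a minimal illustration, not the actual Probe source: the class name ProbeSketch, the helper buildRequest() and the request format (query keys separated by spaces) are assumptions made for the example; only the address 224.0.75.75 and the port 7500 come from the text.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ProbeSketch {
    static final String DIAG_ADDR = "224.0.75.75"; // diagnostics address from the text
    static final int DIAG_PORT = 7500;             // diagnostics port from the text

    /** Assumption: the request body is the list of query keys separated by spaces. */
    static byte[] buildRequest(String... keys) {
        return String.join(" ", keys).getBytes(StandardCharsets.UTF_8);
    }

    /** Multicasts one request, then collects replies until timeoutMs has elapsed. */
    static List<String> probe(InetAddress addr, int port, int ttl, long timeoutMs,
                              String... keys) throws IOException {
        List<String> replies = new ArrayList<>();
        try (MulticastSocket sock = new MulticastSocket()) {
            sock.setTimeToLive(ttl); // limits how many hops the request travels
            byte[] req = buildRequest(keys);
            sock.send(new DatagramPacket(req, req.length, addr, port));

            long deadline = System.currentTimeMillis() + timeoutMs;
            byte[] buf = new byte[65507]; // largest possible UDP payload
            while (true) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0)
                    break;
                sock.setSoTimeout((int) Math.min(remaining, Integer.MAX_VALUE));
                DatagramPacket rsp = new DatagramPacket(buf, buf.length);
                try {
                    sock.receive(rsp); // replies arrive as unicast datagrams
                } catch (SocketTimeoutException e) {
                    break; // timeout elapsed without a further reply
                }
                replies.add(rsp.getSocketAddress() + ": "
                    + new String(rsp.getData(), 0, rsp.getLength(), StandardCharsets.UTF_8));
            }
        }
        return replies;
    }

    public static void main(String[] args) throws IOException {
        InetAddress addr = InetAddress.getByName(DIAG_ADDR);
        for (String reply : probe(addr, DIAG_PORT, 2, 500, "jmx", "props"))
            System.out.println(reply);
    }
}
```

    The loop recomputes the remaining time before every receive, so a steady stream of replies cannot keep the sketch running past the timeout.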

     

    Since every node listens on 224.0.75.75:7500 for multicasts, Probe will discover all clusters in a network (depending on the TTL)! This can be disabled by setting enable_diagnostics in TP to false.

     

     

    In the example below, we had 2 Draw demos running on UDP in the same group, on the same multicast address, and 1 perf.Test running on TCP:

     

    bela@laptop /cygdrive/c
    $ java -cp JGroups/dist/jgroups-all.jar org.jgroups.tests.Probe -timeout 500
    
    -- send probe on /224.0.75.75:7500
    
    
    #1 (263 bytes): 192.168.5.1:2222 (DrawGroupDemo)
    local_addr=192.168.5.1:2222
    group_name=DrawGroupDemo
    Version=2.2.9 beta, cvs="$Id: Version.java,v 1.23 2005/09/01 12:08:44 belaban Exp $"
    view: [192.168.5.1:2222|1] [192.168.5.1:2222, 192.168.5.1:2226]
    group_addr=226.6.6.6:12345
    
    
    #2 (263 bytes): 192.168.5.1:2226 (DrawGroupDemo)
    local_addr=192.168.5.1:2226
    group_name=DrawGroupDemo
    Version=2.2.9 beta, cvs="$Id: Version.java,v 1.23 2005/09/01 12:08:44 belaban Exp $"
    view: [192.168.5.1:2222|1] [192.168.5.1:2222, 192.168.5.1:2226]
    group_addr=226.6.6.6:12345
    
    
    #3 (327 bytes): 192.168.5.1:7800 (DrawGroupDemo)
    local_addr=192.168.5.1:7800
    group_name=DrawGroupDemo
    Version=2.2.9 beta, cvs="$Id: Version.java,v 1.23 2005/09/01 12:08:44 belaban Exp $"
    view: [192.168.5.1:7800|0] [192.168.5.1:7800]
    connections: connections (1):
    key: 192.168.5.1:7800: <192.168.5.1:2235 --> 192.168.5.1:7800> (46 secs old)
    

     

     

    In the example, 3 members responded. The first response is from the member listening on port 2222 of 192.168.5.1, the second from the member on port 2226. The first and second responses come from members of the same cluster; the third response comes from a singleton member running a TCP-based protocol stack on host 192.168.5.1 at port 7800. Every member is running JGroups version 2.2.9 beta.

     

    Let's now query the JMX statistics and the protocol stacks of all 3 JGroups nodes:

     

    bela@laptop /cygdrive/c
    $ java -cp JGroups/dist/jgroups-all.jar org.jgroups.tests.Probe -timeout 500 -query jmx -query props
    
    -- send probe on /224.0.75.75:7500
    
    
    #1 (2601 bytes): 192.168.5.1:2226 (DrawGroupDemo)
    local_addr=192.168.5.1:2226
    group_name=DrawGroupDemo
    Version=2.2.9 beta, cvs="$Id: Version.java,v 1.23 2005/09/01 12:08:44 belaban Exp $"
    view: [192.168.5.1:2222|1] [192.168.5.1:2222, 192.168.5.1:2226]
    group_addr=226.6.6.6:12345
    stats:
    UNICAST={num_bytes_sent=0, num_xmit_requests_received=1, num_acks_sent=1, num_msgs_sent=2, num_acks_received=3, num_msgs_received=1, num_bytes_received=0}
    NAKACK={xmit_rsps_received=0, xmit_rsps_sent=0, missing_msgs_received=0, xmit_reqs_sent=0, sent_msgs=[50 - 55], received_msgs=192.168.5.1:2226: received_msgs: [], delivered_msgs: [null - null]
    192.168.5.1:2222: received_msgs: [], delivered_msgs: [null - null]
    , xmit_reqs_received=0}
    FC={num_blockings=0, num_replenishments=0, senders=192.168.5.1:2226: 1994950
    192.168.5.1:2222: 1994950
    , total_time_blocked=0, receivers=192.168.5.1:2222: 1995859
    192.168.5.1:2226: 1994950
    , avg_time_blocked=0.0}
    UDP={num_bytes_sent=10742, num_msgs_sent=23, num_msgs_received=119, num_bytes_received=9191}
    channel={received_bytes=9191, sent_msgs=50, received_msgs=91, sent_bytes=5050}
    
    props:
    <config>
      <UDP mcast_port="12345"
           discard_incompatible_packets="true"
           mcast_recv_buf_size="25000000"
           mcast_send_buf_size="640000"
           enable_bundling="true"
           max_bundle_size="64000"
           use_outgoing_packet_handler="true"
           down_thread="false"
           mcast_addr="226.6.6.6"
           use_incoming_packet_handler="true"
           loopback="false"
           up_thread="false"
           ucast_recv_buf_size="20000000"
           ucast_send_buf_size="640000"
           ip_ttl="2"
           max_bundle_timeout="30" ></UDP>
      <PING num_initial_members="3"
           up_thread="false"
           timeout="2000"
           down_thread="false" ></PING>
      <FD_SOCK up_thread="false"
           down_thread="false" ></FD_SOCK>
      <NAKACK max_xmit_size="60000"
           up_thread="false"
           retransmit_timeout="100,200,300,600,1200,2400,4800"
           use_mcast_xmit="false"
           discard_delivered_msgs="true"
           down_thread="false"
           gc_lag="1" ></NAKACK>
      <UNICAST up_thread="false"
           timeout="300,600,1200,2400,3600"
           down_thread="false" ></UNICAST>
      <STABLE max_bytes="1000000"
           up_thread="false"
           stability_delay="1000"
           desired_avg_gossip="50000"
           down_thread="false" ></STABLE>
      <GMS shun="true"
           print_local_addr="true"
           up_thread="false"
           join_timeout="3000"
           join_retry_timeout="2000"
           down_thread="false" ></GMS>
      <FC min_threshold="0.1"
           up_thread="false"
           down_thread="false"
           max_credits="2000000" ></FC>
    </config>
    
    #2 (2600 bytes): 192.168.5.1:2222 (DrawGroupDemo)
    local_addr=192.168.5.1:2222
    group_name=DrawGroupDemo
    Version=2.2.9 beta, cvs="$Id: Version.java,v 1.23 2005/09/01 12:08:44 belaban Exp $"
    view: [192.168.5.1:2222|1] [192.168.5.1:2222, 192.168.5.1:2226]
    group_addr=226.6.6.6:12345
    stats:
    UNICAST={num_bytes_sent=0, num_xmit_requests_received=0, num_acks_sent=4, num_msgs_sent=2, num_acks_received=2, num_msgs_received=4, num_bytes_received=0}
    NAKACK={xmit_rsps_received=0, xmit_rsps_sent=0, missing_msgs_received=0, xmit_reqs_sent=0, sent_msgs=[43 - 45], received_msgs=192.168.5.1:2226: received_msgs: [], delivered_msgs: [null - null]
    192.168.5.1:2222: received_msgs: [], delivered_msgs: [null - null]
    , xmit_reqs_received=0}
    FC={num_blockings=0, num_replenishments=0, senders=192.168.5.1:2226: 1995859
    192.168.5.1:2222: 1995859
    , total_time_blocked=0, receivers=192.168.5.1:2222: 1995859
    192.168.5.1:2226: 1994950
    , avg_time_blocked=0.0}
    UDP={num_bytes_sent=9366, num_msgs_sent=27, num_msgs_received=120, num_bytes_received=9191}
    channel={received_bytes=9191, sent_msgs=41, received_msgs=91, sent_bytes=4141}
    
    props:
    <config>
      <UDP mcast_port="12345"
           discard_incompatible_packets="true"
           mcast_recv_buf_size="25000000"
           mcast_send_buf_size="640000"
           enable_bundling="true"
           max_bundle_size="64000"
           use_outgoing_packet_handler="true"
           down_thread="false"
           mcast_addr="226.6.6.6"
           use_incoming_packet_handler="true"
           loopback="false"
           up_thread="false"
           ucast_recv_buf_size="20000000"
           ucast_send_buf_size="640000"
           ip_ttl="2"
           max_bundle_timeout="30" ></UDP>
      <PING num_initial_members="3"
           up_thread="false"
           timeout="2000"
           down_thread="false" ></PING>
      <FD_SOCK up_thread="false"
           down_thread="false" ></FD_SOCK>
      <NAKACK max_xmit_size="60000"
           up_thread="false"
           retransmit_timeout="100,200,300,600,1200,2400,4800"
           use_mcast_xmit="false"
           discard_delivered_msgs="true"
           down_thread="false"
           gc_lag="1" ></NAKACK>
      <UNICAST up_thread="false"
           timeout="300,600,1200,2400,3600"
           down_thread="false" ></UNICAST>
      <STABLE max_bytes="1000000"
           up_thread="false"
           stability_delay="1000"
           desired_avg_gossip="50000"
           down_thread="false" ></STABLE>
      <GMS shun="true"
           print_local_addr="true"
           up_thread="false"
           join_timeout="3000"
           join_retry_timeout="2000"
           down_thread="false" ></GMS>
      <FC min_threshold="0.1"
           up_thread="false"
           down_thread="false"
           max_credits="2000000" ></FC>
    </config>
    
    #3 (2596 bytes): 192.168.5.1:7800 (DrawGroupDemo)
    local_addr=192.168.5.1:7800
    group_name=DrawGroupDemo
    Version=2.2.9 beta, cvs="$Id: Version.java,v 1.23 2005/09/01 12:08:44 belaban Exp $"
    view: [192.168.5.1:7800|0] [192.168.5.1:7800]
    connections: connections (1):
    key: 192.168.5.1:7800: <192.168.5.1:2235 --> 192.168.5.1:7800> (6 secs old)
    
    
    stats:
    UNICAST={num_bytes_sent=0, num_xmit_requests_received=0, num_acks_sent=0, num_msgs_sent=0, num_acks_received=0, num_msgs_received=0, num_bytes_received=0}
    NAKACK={xmit_rsps_received=0, xmit_rsps_sent=0, missing_msgs_received=0, xmit_reqs_sent=0, sent_msgs=[53 - 57], received_msgs=192.168.5.1:7800: received_msgs: [], delivered_msgs: [null - null]
    , xmit_reqs_received=0}
    TCP={num_bytes_sent=9771, num_msgs_sent=21, num_msgs_received=59, num_bytes_received=4848}
    FC={num_blockings=0, num_replenishments=0, senders=192.168.5.1:7800: 1995152
    , total_time_blocked=0, receivers=192.168.5.1:7800: 1995152
    , avg_time_blocked=0.0}
    channel={received_bytes=4848, sent_msgs=48, received_msgs=48, sent_bytes=4848}
    
    props:
    <config>
      <TCP
           discard_incompatible_packets="true"
           sock_conn_timeout="500"
           enable_bundling="true"
           bind_addr="192.168.5.1"
           max_bundle_size="64000"
           use_outgoing_packet_handler="true"
           use_send_queues="false"
           down_thread="false"
           start_port="7800"
           recv_buf_size="25000000"
           send_buf_size="640000"
           use_incoming_packet_handler="true"
           loopback="false"
           up_thread="false"
           max_bundle_timeout="30" ></TCP>
      <MPING num_initial_members="3"
           up_thread="false"
           mcast_addr="226.6.6.6"
           timeout="2000"
           down_thread="false"
           mcast_port="7500" ></MPING>
      <FD_SOCK up_thread="false"
           down_thread="false" ></FD_SOCK>
      <NAKACK max_xmit_size="60000"
           up_thread="false"
           retransmit_timeout="100,200,300,600,1200,2400,4800"
           use_mcast_xmit="false"
           discard_delivered_msgs="true"
           down_thread="false"
           gc_lag="0" ></NAKACK>
      <UNICAST up_thread="false"
           timeout="300,600,1200,2400,3600"
           down_thread="false" ></UNICAST>
      <STABLE max_bytes="1000000"
           up_thread="false"
           stability_delay="1000"
           desired_avg_gossip="50000"
           down_thread="false" ></STABLE>
      <GMS shun="true"
           print_local_addr="true"
           up_thread="false"
           join_timeout="3000"
           join_retry_timeout="2000"
           down_thread="false" ></GMS>
      <FC min_threshold="0.1"
           up_thread="false"
           max_block_time="1000"
           down_thread="false"
           max_credits="2000000" ></FC>
    </config>
    
    bela@laptop /cygdrive/c
    
    
    

     

    This is of course a lot of information, so it makes sense to redirect the Probe output into a file.
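    Once the output has been saved to a file, it can be condensed with a few lines of Java. The sketch below is one possible approach, assuming the response header format shown in the listings above ("#1 (263 bytes): 192.168.5.1:2222 (DrawGroupDemo)"); the class name ProbeOutputSummary and its methods are made up for the example and are not part of JGroups.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProbeOutputSummary {
    // Matches headers such as "#1 (263 bytes): 192.168.5.1:2222 (DrawGroupDemo)":
    // group 3 is the responder's address, group 4 the cluster name.
    private static final Pattern HEADER =
        Pattern.compile("^#(\\d+) \\((\\d+) bytes\\): (\\S+) \\((.+)\\)$");

    /** Groups responder addresses by cluster name, in order of appearance. */
    static Map<String, List<String>> membersByCluster(String probeOutput) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : probeOutput.split("\\R")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.matches())
                result.computeIfAbsent(m.group(4), k -> new ArrayList<>()).add(m.group(3));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the file the Probe output was redirected to
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        membersByCluster(text).forEach(
            (cluster, members) -> System.out.println(cluster + " -> " + members));
    }
}
```

    Only the header lines are evaluated; the stats and props sections are skipped, which keeps the summary to one line per cluster.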

     

    Changes to probe in 2.6.x and higher

     

    In 2.6.x, probe.sh was changed to accept arbitrary keys and values, which are multicast to all nodes reachable via IP multicast. probe.sh is now invoked as follows:

    [mac] /Users/bela/JGroups/bin$ ./probe.sh -timeout 200 jmx=UDP

     

    -- send probe on /224.0.75.75:7500

     


    #1 (231 bytes):
    view=[192.168.1.3:50072|0] [192.168.1.3:50072]
    member=192.168.1.3:50072 (DrawGroupDemo)
    local_addr=192.168.1.3:50072
    version=2.7.0.ALPHA1, cvs="$Id: Version.java,v 1.63 2008/09/25 11:57:54 belaban Exp $"
    cluster=DrawGroupDemo

     

     

    [mac] /Users/bela/JGroups/bin$

     

    The above example shows that we can pass an arbitrary number of keys and values to probe.sh; here it is the key "jmx" with the value "UDP". In this specific case it means that we want to fetch JMX information about the UDP protocol from all nodes in the network. Had we specified only "jmx" without a value, we would have received JMX information about all protocols.
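    To make the key/value convention concrete, here is a small sketch of how arguments such as jmx=UDP could be split into a key and an optional value. How probe.sh and the nodes actually split these strings is not shown on this page, so the class ProbeArgs and the choice of an empty string for a missing value are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ProbeArgs {
    /** Splits "key=value" arguments; a bare "key" is mapped to an empty value. */
    static Map<String, String> parse(String... args) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String arg : args) {
            int idx = arg.indexOf('=');
            if (idx < 0)
                result.put(arg, "");                  // e.g. "jmx": all protocols
            else
                result.put(arg.substring(0, idx),     // e.g. "jmx"
                           arg.substring(idx + 1));   // e.g. "UDP"
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("jmx=UDP", "props"));
    }
}
```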

     

    On the server side, we now have ProbeHandlers which can register themselves with the transport. A ProbeHandler looks as follows:

    public interface ProbeHandler {
        /**
         * Handles a probe. For each key that is handled, the key and its result should be in the returned map.
         * @param keys
         * @return Map<String,String>. A map of keys and values. A null return value is permissible.
         */
        Map<String,String> handleProbe(String... keys);

        /** Returns a list of supported keys */
        String[] supportedKeys();
    }
    

    The handleProbe() method receives the keys (and values) given to probe.sh. Each handler needs to check whether it supports a given key and, if so, return a map with the key and the value(s) corresponding to the input key.

     

    Registering a ProbeHandler is simple:

    JChannel ch;          // an existing channel
    ProbeHandler handler; // your ProbeHandler implementation
    ch.getProtocolStack().getTransport().registerProbeHandler(handler);
    

     

    The advantage of this new approach is that anyone can now register a ProbeHandler under a given KEY and query the information for that KEY using probe.sh.

    For example, the JBoss Clustering code could decide to register a ProbeHandler which handles the key "SESSION-INFO". When it is invoked with that key, it could return (as values in the map) information regarding (a) active HTTP sessions, (b) the total number of session attributes, (c) the total number of sessions created and deleted, and so on.
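    A handler for such a key might look roughly like the following sketch. It is hypothetical: JBoss Clustering does not necessarily ship such a handler, the counters and the value format are invented, and the ProbeHandler interface is repeated locally only so that the example compiles without JGroups on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Local copy of the interface shown above, so the sketch compiles on its own. */
interface ProbeHandler {
    Map<String,String> handleProbe(String... keys);
    String[] supportedKeys();
}

/** Hypothetical handler that reports session counters under "SESSION-INFO". */
class SessionInfoProbeHandler implements ProbeHandler {
    static final String KEY = "SESSION-INFO";

    private final AtomicLong active = new AtomicLong();
    private final AtomicLong created = new AtomicLong();

    // Invented hooks that the session management code would call.
    void sessionCreated()   { created.incrementAndGet(); active.incrementAndGet(); }
    void sessionDestroyed() { active.decrementAndGet(); }

    @Override
    public Map<String,String> handleProbe(String... keys) {
        for (String key : keys) {
            if (KEY.equals(key)) {
                Map<String,String> result = new LinkedHashMap<>();
                result.put(KEY, "active=" + active.get() + ", created=" + created.get());
                return result;
            }
        }
        return null; // none of the keys is ours; a null return value is permissible
    }

    @Override
    public String[] supportedKeys() {
        return new String[] { KEY };
    }
}
```

    With the real JGroups interface, an instance of this class would be registered through registerProbeHandler() as shown earlier and could then be queried with probe.sh SESSION-INFO.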

     

    Caveat: currently all of this information is sent back to probe in a single UDP datagram packet, so the total number of bytes generated by all probe handlers cannot exceed roughly 65,000 bytes.
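    A handler that may return large values can guard against this limit before replying. The sketch below estimates the size of a result map; the wire format used by JGroups is not described here, so the "key=value" plus newline encoding in UTF-8 is an assumption, as are the class ProbeResponseSize and its methods. Remember that the limit applies to the combined output of all handlers, not to each one separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

class ProbeResponseSize {
    static final int MAX_BYTES = 65000; // limit quoted in the caveat above

    /** Assumption: every entry is sent as "key=value\n", encoded in UTF-8. */
    static int encodedSize(Map<String,String> entries) {
        int total = 0;
        for (Map.Entry<String,String> e : entries.entrySet()) {
            String line = e.getKey() + "=" + e.getValue() + "\n";
            total += line.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** True if the entries would fit into a single reply under that assumption. */
    static boolean fits(Map<String,String> entries) {
        return encodedSize(entries) <= MAX_BYTES;
    }

    public static void main(String[] args) {
        Map<String,String> reply = Collections.singletonMap("SESSION-INFO", "active=1, created=2");
        System.out.println(encodedSize(reply) + " bytes, fits=" + fits(reply));
    }
}
```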