
    Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)

    soupy989

      Hi All

       

      I have been trying to cluster an embedded Spring Boot Infinispan instance with a remote standalone Infinispan Server node, with no luck so far, and was hoping someone here could help.

       

      The error I'm seeing on the Infinispan Server side (NODE_1):

      17:12:26,408 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: Infinispan Server 9.4.4.Final (WildFly Core 6.0.2.Final) started in 10731ms - Started 274 of 322 services (163 services are lazy, passive or on-demand)

      17:19:49,480 INFO  [org.infinispan.CLUSTER] (jgroups-8,{NODE_1}) ISPN000094: Received new cluster view for channel cluster: [{NODE_1}|1] (2) [{NODE_1}, {NODE_2}-52218]

      17:19:49,483 INFO  [org.infinispan.CLUSTER] (jgroups-8,{NODE_1}) ISPN100000: Node {NODE_2}-52218 joined the cluster

      17:20:49,644 WARN  [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p4-t23) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1 from {NODE_2}-52218

              at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)

              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)

              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)

              at java.util.concurrent.FutureTask.run(FutureTask.java:266)

              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

              at java.lang.Thread.run(Thread.java:748)

              Suppressed: org.infinispan.util.logging.TraceException

              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)

              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)

              at java.util.concurrent.FutureTask.run(FutureTask.java:266)

              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

              at java.lang.Thread.run(Thread.java:748)

              Suppressed: org.infinispan.util.logging.TraceException

                      at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:75)

                      at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:525)

                      at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:508)

                      at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:321)

       

      The error seen on the Spring Boot side (NODE_2):

      17:19:49.171 [main] DEBUG o.i.manager.DefaultCacheManager - Starting cache manager null

      17:19:49.180 [main] INFO  o.i.f.GlobalComponentRegistry - ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2' 9.4.4.Final

      17:19:49.343 [main] INFO  o.i.r.t.jgroups.JGroupsTransport - ISPN000078: Starting JGroups channel cluster

      17:19:49.429 [main] DEBUG org.jgroups.stack.Configurator - set property TCP.diagnostics_addr to default value /224.0.75.75

      17:19:49.434 [main] DEBUG org.jgroups.protocols.TCP - thread pool min/max/keep-alive: 0/200/60000 use_fork_join=false, internal pool: 0/4/30000 (4 cores available)

      17:19:49.439 [main] DEBUG org.jgroups.protocols.MPING - bind_addr=/10.252.132.39, mcast_addr=/234.99.54.14, mcast_port=45700

      17:19:49.447 [main] DEBUG org.jgroups.protocols.MPING - receiver thread started

      17:19:49.452 [main] DEBUG org.jgroups.protocols.pbcast.GMS - address={NODE_2}-52218, cluster=cluster, physical address=10.252.132.39:47500

      17:19:49.471 [main] DEBUG org.jgroups.protocols.pbcast.GMS - {NODE_2}-52218: sending JOIN({NODE_2}-52218) to {NODE_1}

      17:19:49.652 [main] DEBUG o.jgroups.protocols.pbcast.NAKACK2 -

      [rdcdlimdg02-52218 setDigest()]

      existing digest:  []

      new digest:       {NODE_1}: [0 (0)], {NODE_2}-52218: [0 (0)]

      resulting digest: {NODE_1}: [0 (0)], {NODE_2}-52218: [0 (0)]

      17:19:49.652 [main] DEBUG org.jgroups.protocols.pbcast.GMS - {NODE_2}-52218: installing view [{NODE_1}|1] (2) [{NODE_1}, {NODE_2}-52218]

      17:19:49.665 [main] INFO  org.infinispan.CLUSTER - ISPN000094: Received new cluster view for channel cluster: [{NODE_1}|1] (2) [{NODE_1}, {NODE_2}-52218]

      17:19:49.666 [FD_SOCK pinger-11,{NODE_2}-52218] DEBUG org.jgroups.protocols.FD_SOCK - {NODE_2}-52218: pingable_mbrs=[{NODE_1}, {NODE_2}-52218], ping_dest={NODE_1}

      17:19:49.666 [main] DEBUG o.i.r.t.jgroups.JGroupsTransport - Joined: [{NODE_1}, {NODE_2}-52218], Left: []

      17:19:49.744 [main] INFO  o.i.r.t.jgroups.JGroupsTransport - ISPN000079: Channel cluster local address is {NODE_2}-52218, physical addresses are [10.252.132.39:47500]

      17:20:49.756 [main] DEBUG o.i.t.ClusterTopologyManagerImpl - Timed out waiting for rebalancing status from coordinator, trying again

       

      The JGroups XML configuration used on the Spring Boot side (pulled from the infinispan-core 9.4.4 jar):

      <config xmlns="urn:org:jgroups"

              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

              xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">

         <TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"

              bind_port="${jgroups.tcp.port:7800}"

              enable_diagnostics="false"

              thread_naming_pattern="pl"

              send_buf_size="640k"

              sock_conn_timeout="1000"

              bundler_type="no-bundler"

       

       

              thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"

              thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"

              thread_pool.keep_alive_time="60000"

         />

         <TCPPING initial_hosts="{HOST_1}[47500],{HOST_2}[47500]"/>

         <MPING bind_addr="${jgroups.tcp.address:127.0.0.1}"

          mcast_addr="234.99.54.14"

          mcast_port="45700"

          ip_ttl="${jgroups.udp.ip_ttl:2}"/>

         <MERGE3 min_interval="10000"

                 max_interval="30000"

         />

         <FD_SOCK start_port="57600"/>

         <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->

         <FD_ALL timeout="10000"

                 interval="2000"

                 timeout_check_interval="1000"

         />

         <VERIFY_SUSPECT timeout="1500"/>

         <pbcast.NAKACK2 use_mcast_xmit="false"

                         xmit_interval="100"

                         xmit_table_num_rows="50"

                         xmit_table_msgs_per_row="1024"

                         xmit_table_max_compaction_time="30000"

                         resend_last_seqno="true"

         />

         <UNICAST3 xmit_interval="100"

                   xmit_table_num_rows="50"

                   xmit_table_msgs_per_row="1024"

                   xmit_table_max_compaction_time="30000"

         />

         <pbcast.STABLE stability_delay="500"

                        desired_avg_gossip="5000"

                        max_bytes="1M"

         />

         <pbcast.GMS print_local_addr="false"

                     join_timeout="${jgroups.join_timeout:5000}"

         />

         <MFC max_credits="2m"

              min_threshold="0.40"

         />

         <FRAG3/>

      </config>

       

      Compared to the JGroups configuration used by the server:

      <stack name="tcp">

                          <transport type="TCP" socket-binding="jgroups-tcp"/>

                          <protocol type="TCPPING">

                              <property name="initial_hosts">

                                  {HOST_2}[47500],{HOST_1}[47500]

                              </property>

                          </protocol>

                          <protocol type="MPING" socket-binding="jgroups-mping"/>

                          <protocol type="MERGE3"/>

                          <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>

                          <protocol type="FD_ALL"/>

                          <protocol type="VERIFY_SUSPECT"/>

                          <protocol type="pbcast.NAKACK2">

                              <property name="use_mcast_xmit">

                                  false

                              </property>

                          </protocol>

                          <protocol type="UNICAST3"/>

                          <protocol type="pbcast.STABLE"/>

                          <protocol type="pbcast.GMS"/>

                          <protocol type="MFC_NB"/>

                          <protocol type="FRAG3"/>

                      </stack>

       

      Some background:

       

      jgroups.tcp.address and jgroups.tcp.port are set at startup through JVM system properties (e.g. -Djgroups.tcp.address=10.252.132.39 -Djgroups.tcp.port=47500, matching the addresses in the logs above).

       

      Infinispan Server version - 9.4.4

      infinispan-spring-boot-starter-embedded - 2.1.1

       

      The oddest part about this is that I am able to cluster two standalone Infinispan instances between NODE_1 and NODE_2 with no problem (using the TCP configuration above). I enabled trace logging for both org.jgroups and org.infinispan but found nothing conclusive. Watching the UNICAST traffic in the trace logs between the two, I don't see any request being dropped - every request from NODE_1 receives an ACK from NODE_2 in under 60 seconds.

       

       

      Any ideas on cause/resolution would be greatly appreciated.

       

      Thank you.

        • 1. Re: Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)
          soupy989

          I was also able to cluster two embedded Infinispan nodes (Spring Boot) across the two servers using the configs above, with no issue.

           

          So the issue seems specific to clustering an embedded node with a server node.

          • 2. Re: Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)
            nadirx

            The server JGroups stack has a FORK protocol, which is what is preventing the cluster from forming. FORK is used in the server because multiple cache-containers can be piggy-backed on the same JGroups transport.

            The solution is to add FORK at the end of the JGroups stack declaration in the embedded configuration. Also ensure that all nodes use the same cluster name.
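 
            For example (a sketch only), the end of the embedded stack would simply gain a FORK element after the last protocol:
 
               <FRAG3/>
 
               <FORK/>
 
            </config>
 
            The fork stack contents can then be supplied via FORK's config attribute or created programmatically.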

            • 3. Re: Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)
              soupy989

              Thanks for the info! I don't see a FORK element in the server's TCP stack; is it something that isn't explicitly declared?

               

               

               

              After adding a bare <FORK/> to the JGroups config, it seems additional configuration is required:

              "WARN  org.jgroups.protocols.FORK - {NODE_2}-39164: fork-stack for id=cluster not found; discarding message."

               

              I'm still not sure how to configure a fork-stack for id "cluster". I tried the following:

               

              <FORK config=".../forkConfig.xml"/>

               

              with forkConfig.xml:

              <?xml version="1.0"?>

              <fork-stacks xmlns="fork" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                           xsi:schemaLocation="fork fork-stacks.xsd">

                  <fork-stack id="cluster"/>

              </fork-stacks>

               

              But I'm still getting the same warning ("fork-stack for id=cluster not found; discarding message"). I'm guessing I'll have to configure the full fork stack for id="cluster" (exactly the same as in jgroups.xml); trying that now.

              • 4. Re: Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)
                soupy989

                Configuring the <fork-stack id="cluster"> with the full TCP stack from jgroups.xml got rid of the new error (fork-stack for id=cluster not found), but I'm running into the same issue seen before (the server side reports a timeout due to no response from the embedded node, and the embedded node reports a timeout on rebalance).

                • 5. Re: Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)
                  soupy989

                  I updated my embedded JGroups config to:

                  <config xmlns="urn:org:jgroups"

                          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                          xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">

                     <TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"

                          bind_port="${jgroups.tcp.port:7800}"

                          enable_diagnostics="false"

                          thread_naming_pattern="pl"

                          send_buf_size="640k"

                          sock_conn_timeout="1000"

                          bundler_type="no-bundler"

                   

                          thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"

                          thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"

                          thread_pool.keep_alive_time="60000"

                     />

                     <TCPPING initial_hosts="10.252.132.39[47500],10.252.132.38[47500]"/>

                     <MPING bind_addr="${jgroups.tcp.address:127.0.0.1}"

                      mcast_addr="234.99.54.14"

                      mcast_port="45700"

                      ip_ttl="${jgroups.udp.ip_ttl:2}"/>

                     <MERGE3 min_interval="10000"

                             max_interval="30000"

                     />

                     <FD_SOCK start_port="57600"/>

                     <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->

                     <FD_ALL timeout="10000"

                             interval="2000"

                             timeout_check_interval="1000"

                     />

                     <VERIFY_SUSPECT timeout="1500"/>

                     <pbcast.NAKACK2 use_mcast_xmit="false"

                                     xmit_interval="100"

                                     xmit_table_num_rows="50"

                                     xmit_table_msgs_per_row="1024"

                                     xmit_table_max_compaction_time="30000"

                                     resend_last_seqno="true"

                     />

                     <UNICAST3 xmit_interval="100"

                               xmit_table_num_rows="50"

                               xmit_table_msgs_per_row="1024"

                               xmit_table_max_compaction_time="30000"

                     />

                     <pbcast.STABLE stability_delay="500"

                                    desired_avg_gossip="5000"

                                    max_bytes="1M"

                     />

                     <pbcast.GMS print_local_addr="false"

                                 join_timeout="${jgroups.join_timeout:5000}"

                     />

                     <MFC max_credits="2m"

                          min_threshold="0.40"

                     />

                     <FRAG3/>

                     <FORK config="/DATAGRID/microservices/watp/forkConfig/forkConfig.xml"/>

                  </config>

                   

                  With forkConfig.xml as:

                   

                  <fork-stacks>

                      <fork-stack id="cluster">

                            <config/>

                      </fork-stack>

                  </fork-stacks>

                   

                  And I'm now receiving the following message in logs:

                  12:24:38.513 [jgroups-4,rdcdlimdg02-30019] WARN  org.jgroups.protocols.FORK - rdcdlimdg02-30019: fork-channel for id=clustered not found; discarding message

                   

                   

                  It seems the "clustered" fork channel (from <cache-container name="clustered" default-cache="default" statistics="true">) isn't being created off the embedded node's "cluster" channel as I'd expect. The server appears to create this fork channel automatically, named after the cache-container. I'm not sure how to fix this other than possibly switching to a programmatic declaration of JGroups/Infinispan and manually creating the fork channel. Any idea what could cause this?
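 
                  If the fork-stack id just needs to match the server's cache-container name, then presumably the next thing to try is a forkConfig.xml keyed to "clustered" instead of "cluster" (a sketch only, untested here; the id is taken from the cache-container name above):
 
                  <fork-stacks>
 
                      <fork-stack id="clustered">
 
                            <config/>
 
                      </fork-stack>
 
                  </fork-stacks>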

                   

                  Thanks,
                  Jonathan

                  • 6. Re: Issue clustering embedded infinispan (spring boot) with Infinispan Server (9.4.4)
                    nadirx