14 Replies Latest reply on Dec 14, 2012 10:32 AM by silver06

    Connection timeout to local port 9999 with clustered AS 7.1.1.Final

    jacklund

      I've been setting up a clustered instance of AS 7.1.1.Final on Fedora 17, with one master and one slave in the domain, following https://docs.jboss.org/author/display/AS71/AS7+Cluster+Howto and running Sun Java 1.7.0_05-b05 (I've tried OpenJDK 1.7 as well and still see the problem). There seems to be a race condition when the master instance connects to its own management port. This is what I get in the logs:

      [Server:server-two] 16:36:03,492 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-2) MSC00001: Failed to start service jboss.host.controller.client: org.jboss.msc.service.StartException in service jboss.host.controller.client: java.net.ConnectException: JBAS012144: Could not connect to remote://10.108.25.156:9999. The connection timed out

      [Server:server-two]           at org.jboss.as.server.mgmt.domain.HostControllerServerClient.start(HostControllerServerClient.java:161) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

      [Server:server-two]           at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.2.GA.jar:1.0.2.GA]

      [Server:server-two]           at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.2.GA.jar:1.0.2.GA]

      [Server:server-two]           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_05]

      [Server:server-two]           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_05]

      [Server:server-two]           at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_05]

      [Server:server-two] Caused by: java.net.ConnectException: JBAS012144: Could not connect to remote://nephele-beta-jboss1:9999. The connection timed out

      [Server:server-two]           at org.jboss.as.protocol.ProtocolChannelClient.connectSync(ProtocolChannelClient.java:155) [jboss-as-protocol-7.1.1.Final.jar:7.1.1.Final]

      [Server:server-two]           at org.jboss.as.server.mgmt.domain.HostControllerServerConnection.openChannel(HostControllerServerConnection.java:158) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

      [Server:server-two]           at org.jboss.as.server.mgmt.domain.HostControllerServerConnection.connect(HostControllerServerConnection.java:86) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

      [Server:server-two]           at org.jboss.as.server.mgmt.domain.HostControllerServerClient.start(HostControllerServerClient.java:135) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

      [Server:server-two]           ... 5 more

       

      I get this randomly for servers 1, 2 or 3, or a combination. Occasionally, they all come up without the error - but only occasionally! Once I get this for one of the server instances, I can't start it up from the management console - it says the request timed out (which makes sense since it can't connect).

       

      Here's my networking setup from the host.xml file:

       

          <domain-controller>
             <local/>
             <!-- Alternative remote domain controller configuration with a host and port -->
             <!-- <remote host="${jboss.domain.master.address:10.108.25.156}" port="${jboss.domain.master.port:9999}"/> -->
          </domain-controller>
      
      
          <interfaces>
              <interface name="management">
                  <inet-address value="${jboss.bind.address.management:10.108.25.156}"/>
              </interface>
              <interface name="public">
                 <inet-address value="${jboss.bind.address:10.108.25.156}"/>
              </interface>
              <interface name="unsecure">
                  <!-- Used for IIOP sockets in the standard configuration.
                       To secure JacORB you need to setup SSL -->
                  <inet-address value="${jboss.bind.address.unsecure:10.108.25.156}"/>
              </interface>
          </interfaces>
      

       

      I suspect that perhaps there is a race condition here where the various servers are trying to connect to the management interface before it's fully up.

       

      Is there anything I can do to fix this? I've searched around, but can't find any information on either someone else coming across this error, or how to increase the timeout on the management connection.

       

      Please help!

       

      Thanks.

       

      -Jack Lund

        • 1. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
          ctomc

          Hi,

          Can you try stopping the firewall on your Fedora machine? Run

          service iptables stop

          and try again.

          Tomaz

          • 2. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
            jacklund

            Tomas,

             

            Thanks for the prompt reply! Unfortunately, it didn't work. I had already set up a firewall rule for incoming connections on 9999, but even turning off iptables didn't change anything - still the same error when I restart jboss.

             

            -Jack

            • 3. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
              jaysensharma

              Hi Jack,

               

              From your configuration, it looks like you have hardcoded the management interface to listen only on "10.108.25.156", which means the domain controller will accept management connections only on that address.

               

                      <interface name="management">
                          <inet-address value="${jboss.bind.address.management:10.108.25.156}"/>
                      </interface>

               

               

              So you need to make sure that the host controller's "host.xml" also points to the same address for the domain controller, so that the two can communicate.

               

                  <domain-controller>
                     <remote host="${jboss.domain.master.address:10.108.25.156}" port="${jboss.domain.master.port:9999}"/>
                  </domain-controller>

               


              Or, if you are planning to start the domain controller and the host controller on the same machine, try starting them on "localhost" as follows:

                ./domain.sh  -Djboss.bind.address.management=localhost

               

              This is because your host controller is trying to reach the domain controller on remote://nephele-beta-jboss1:9999 (which seems to be a local address).

               

               

              Additionally, if your host controller is running on a remote machine, you will need to specify the security realm as well, in the following tag:

              <domain-controller>
                   <remote host="${jboss.domain.master.address:10.108.25.156}" port="${jboss.domain.master.port:9999}" security-realm="ManagementRealm"/>
              </domain-controller>
              

               

              Refer to http://middlewaremagic.com/jboss/?p=1900 to see how to pass the <server-identity>. Its absence during remote communication may also cause the connection timeout.

              • 4. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                jacklund

                Hi, Jay.

                 

                Thanks for the response - I really appreciate it! I am running the host and domain controllers on the same box (not really by design, but because the instructions referenced above set it up that way). I went ahead and tried your first suggestion anyway, and got an error where the host controller couldn't connect. When I tried your second suggestion, I got my same old error. Here's the ps output:

                 

                jboss    10797 10794  0 19:50 ?        00:00:00 /bin/sh /usr/local/jboss/bin/domain.sh -Djboss.bind.address.management=localhost -c domain.xml

                jboss    10847 10797  0 19:50 ?        00:00:03 java -D[Process Controller] -server -Xms64m -Xmx512m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Djboss.domain.default.config=domain.xml -Djboss.host.default.config=host.xml -Dorg.jboss.boot.log.file=/usr/local/jboss/domain/log/process-controller.log -Dlogging.configuration=file:/usr/local/jboss/domain/configuration/logging.properties -jar /usr/local/jboss/jboss-modules.jar -mp /usr/local/jboss/modules org.jboss.as.process-controller -jboss-home /usr/local/jboss -jvm java -mp /usr/local/jboss/modules -- -Dorg.jboss.boot.log.file=/usr/local/jboss/domain/log/host-controller.log -Dlogging.configuration=file:/usr/local/jboss/domain/configuration/logging.properties -server -Xms64m -Xmx512m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Djboss.domain.default.config=domain.xml -Djboss.host.default.config=host.xml -- -default-jvm java -Djboss.bind.address.management=localhost -c domain.xml

                jboss    10863 10847  0 19:50 ?        00:00:19 java -D[Host Controller] -Dorg.jboss.boot.log.file=/usr/local/jboss/domain/log/host-controller.log -Dlogging.configuration=file:/usr/local/jboss/domain/configuration/logging.properties -server -Xms64m -Xmx512m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Djboss.domain.default.config=domain.xml -Djboss.host.default.config=host.xml -jar /usr/local/jboss/jboss-modules.jar -mp /usr/local/jboss/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.host-controller -mp /usr/local/jboss/modules --pc-address localhost.localdomain --pc-port 42201 -default-jvm java -Djboss.bind.address.management=localhost -c domain.xml -Djboss.home.dir=/usr/local/jboss

                jboss    10939 10847  1 19:50 ?        00:00:59 /usr/java/jdk1.7.0_05/jre/bin/java -D[Server:server-one] -XX:PermSize=256m -XX:MaxPermSize=256m -Xms64m -Xmx512m -server -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true -Djboss.host.default.config=host.xml -Djboss.modules.system.pkgs=org.jboss.byteman -Djboss.domain.default.config=domain.xml -Djava.net.preferIPv4Stack=true -D[Host Controller]=true -Djboss.bind.address.management=localhost -Dcom.m2mci.config.dir=/usr/local/etc/Nephele -Djboss.home.dir=/usr/local/jboss -Djboss.server.log.dir=/usr/local/jboss/domain/servers/server-one/log -Djboss.server.temp.dir=/usr/local/jboss/domain/servers/server-one/tmp -Djboss.server.data.dir=/usr/local/jboss/domain/servers/server-one/data -Dorg.jboss.boot.log.file=/usr/local/jboss/domain/servers/server-one/log/boot.log -Dlogging.configuration=file:/usr/local/jboss/domain/configuration/logging.properties -jar /usr/local/jboss/jboss-modules.jar -mp /usr/local/jboss/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.server

                jboss    10944 10847  1 19:50 ?        00:00:58 /usr/java/jdk1.7.0_05/jre/bin/java -D[Server:server-two] -XX:PermSize=256m -XX:MaxPermSize=256m -Xms64m -Xmx512m -server -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true -Djboss.host.default.config=host.xml -Djboss.modules.system.pkgs=org.jboss.byteman -Djboss.domain.default.config=domain.xml -Djava.net.preferIPv4Stack=true -D[Host Controller]=true -Djboss.bind.address.management=localhost -Dcom.m2mci.config.dir=/usr/local/etc/Nephele -Djboss.home.dir=/usr/local/jboss -Djboss.server.log.dir=/usr/local/jboss/domain/servers/server-two/log -Djboss.server.temp.dir=/usr/local/jboss/domain/servers/server-two/tmp -Djboss.server.data.dir=/usr/local/jboss/domain/servers/server-two/data -Dorg.jboss.boot.log.file=/usr/local/jboss/domain/servers/server-two/log/boot.log -Dlogging.configuration=file:/usr/local/jboss/domain/configuration/logging.properties -jar /usr/local/jboss/jboss-modules.jar -mp /usr/local/jboss/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.server

                jboss    10964 10847  0 19:50 ?        00:00:16 /usr/java/jdk1.7.0_05/jre/bin/java -D[Server:server-three] -XX:PermSize=256m -XX:MaxPermSize=256m -Xms64m -Xmx512m -server -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true -Djboss.host.default.config=host.xml -Djboss.modules.system.pkgs=org.jboss.byteman -Djboss.domain.default.config=domain.xml -Djava.net.preferIPv4Stack=true -D[Host Controller]=true -Djboss.bind.address.management=localhost -Dcom.m2mci.config.dir=/usr/local/etc/Nephele -Djboss.home.dir=/usr/local/jboss -Djboss.server.log.dir=/usr/local/jboss/domain/servers/server-three/log -Djboss.server.temp.dir=/usr/local/jboss/domain/servers/server-three/tmp -Djboss.server.data.dir=/usr/local/jboss/domain/servers/server-three/data -Dorg.jboss.boot.log.file=/usr/local/jboss/domain/servers/server-three/log/boot.log -Dlogging.configuration=file:/usr/local/jboss/domain/configuration/logging.properties -jar /usr/local/jboss/jboss-modules.jar -mp /usr/local/jboss/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.server

                 

                And the output:

                19:50:51,057 INFO  [org.jboss.as.process.Server:server-three.status] (ProcessController-threads - 4) JBAS012017: Starting process 'Server:server-three'

                [Server:server-three] 19:50:56,840 INFO  [org.jboss.modules] (main) JBoss Modules version 1.1.1.GA

                [Server:server-one] 19:50:57,937 INFO  [org.jboss.modules] (main) JBoss Modules version 1.1.1.GA

                [Server:server-two] 19:50:59,726 INFO  [org.jboss.modules] (main) JBoss Modules version 1.1.1.GA

                [Server:server-three] 19:51:01,130 INFO  [org.jboss.msc] (main) JBoss MSC version 1.0.2.GA

                [Server:server-three] 19:51:02,297 INFO  [org.jboss.as] (MSC service thread 1-1) JBAS015899: JBoss AS 7.1.1.Final "Brontes" starting

                [Server:server-one] 19:51:02,614 INFO  [org.jboss.msc] (main) JBoss MSC version 1.0.2.GA

                [Server:server-one] 19:51:03,856 INFO  [org.jboss.as] (MSC service thread 1-2) JBAS015899: JBoss AS 7.1.1.Final "Brontes" starting

                [Server:server-three] 19:51:04,178 INFO  [org.xnio] (MSC service thread 1-1) XNIO Version 3.0.3.GA

                [Server:server-three] 19:51:04,371 INFO  [org.xnio.nio] (MSC service thread 1-1) XNIO NIO Implementation Version 3.0.3.GA

                [Server:server-three] 19:51:04,507 INFO  [org.jboss.remoting] (MSC service thread 1-1) JBoss Remoting version 3.2.3.GA

                [Server:server-two] 19:51:05,395 INFO  [org.jboss.msc] (main) JBoss MSC version 1.0.2.GA

                [Server:server-one] 19:51:07,105 INFO  [org.xnio] (MSC service thread 1-1) XNIO Version 3.0.3.GA

                [Server:server-two] 19:51:07,486 INFO  [org.jboss.as] (MSC service thread 1-1) JBAS015899: JBoss AS 7.1.1.Final "Brontes" starting

                [Server:server-one] 19:51:07,927 INFO  [org.xnio.nio] (MSC service thread 1-1) XNIO NIO Implementation Version 3.0.3.GA

                [Server:server-one] 19:51:09,378 INFO  [org.jboss.remoting] (MSC service thread 1-1) JBoss Remoting version 3.2.3.GA

                [Server:server-two] 19:51:10,967 INFO  [org.xnio] (MSC service thread 1-1) XNIO Version 3.0.3.GA

                [Server:server-two] 19:51:11,208 INFO  [org.xnio.nio] (MSC service thread 1-1) XNIO NIO Implementation Version 3.0.3.GA

                [Server:server-two] 19:51:11,356 INFO  [org.jboss.remoting] (MSC service thread 1-1) JBoss Remoting version 3.2.3.GA

                [Server:server-three] 19:51:26,014 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1) MSC00001: Failed to start service jboss.host.controller.client: org.jboss.msc.service.StartException in service jboss.host.controller.client: java.net.ConnectException: JBAS012144: Could not connect to remote://localhost:9999. The connection timed out

                [Server:server-three]           at org.jboss.as.server.mgmt.domain.HostControllerServerClient.start(HostControllerServerClient.java:161) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

                [Server:server-three]           at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.2.GA.jar:1.0.2.GA]

                [Server:server-three]           at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.2.GA.jar:1.0.2.GA]

                [Server:server-three]           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_05]

                [Server:server-three]           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_05]

                [Server:server-three]           at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_05]

                [Server:server-three] Caused by: java.net.ConnectException: JBAS012144: Could not connect to remote://localhost:9999. The connection timed out

                [Server:server-three]           at org.jboss.as.protocol.ProtocolChannelClient.connectSync(ProtocolChannelClient.java:155) [jboss-as-protocol-7.1.1.Final.jar:7.1.1.Final]

                [Server:server-three]           at org.jboss.as.server.mgmt.domain.HostControllerServerConnection.openChannel(HostControllerServerConnection.java:158) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

                [Server:server-three]           at org.jboss.as.server.mgmt.domain.HostControllerServerConnection.connect(HostControllerServerConnection.java:86) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

                [Server:server-three]           at org.jboss.as.server.mgmt.domain.HostControllerServerClient.start(HostControllerServerClient.java:135) [jboss-as-server-7.1.1.Final.jar:7.1.1.Final]

                [Server:server-three]           ... 5 more

                [Server:server-three]

                 

                And, like before, if I restart it, it errors out pretty much at random on servers one, two, or three (or several of them). I usually have to restart it 10 or 15 times to get them all to come up correctly.

                 

                I don't think it's anything like having the wrong interface configured, because it would probably never connect at all, rather than connecting randomly. This really feels like a race condition of some sort.
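
                One way to check the race-condition theory might be to poll the management port while the servers start and see when it begins accepting connections. The sketch below is only a diagnostic under stated assumptions: probe_port is a hypothetical helper (not part of JBoss), and it needs bash (for /dev/tcp) and the coreutils timeout command. A successful TCP connect shows only that something is listening on the port, not that the management handshake itself would complete in time.

```shell
# probe_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: returns 0 as soon as a TCP connect to HOST:PORT succeeds,
# or 1 if every attempt fails. The body runs in a subshell so its variables
# do not leak into the calling shell.
probe_port() (
  host=$1
  port=$2
  attempts=${3:-10}
  delay=${4:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT;
    # timeout caps each attempt at 3 seconds.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "attempt $i: $host:$port is accepting connections"
      return 0
    fi
    echo "attempt $i: $host:$port is not reachable"
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
)

# Example (address and port taken from the logs above): poll once per second.
# probe_port 10.108.25.156 9999 30 1
```

                If the port only starts answering after the servers have already given up, that would be consistent with a timeout that is too short rather than with a wrong interface.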

                 

                I don't know if this matters as well, but I'm running this on an Amazon EC2 instance running Fedora 17. UDP multicast doesn't work there, but I got the JGroups stuff working using a GossipRouter, and that all seems to work correctly. The EC2 instance I'm running on is pretty low-powered (we're just testing this configuration for now, we'll probably beef it up later).

                 

                I'm going ahead and posting my domain.xml and host.xml files, hopefully there's something glaringly dopey I'm doing in there to cause this.

                 

                Thanks again.

                 

                -Jack

                • 5. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                  sfcoy

                  Hi there,

                   

                  As a "left field" suggestion, can you show us the result of:

                   

                  arp -a

                   

                  to see if there are duplicate IP addresses around.

                  • 6. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                    jacklund

                    Sure.

                    $ sudo arp -a

                    ip-10-108-25-1.ec2.internal (10.108.25.1) at fe:ff:ff:ff:ff:ff [ether] on eth0

                     

                    Thanks.

                     

                    -Jack

                    • 7. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                      jacklund

                      Since it seems like this problem has people stumped, would anyone object strongly to my submitting it as a bug?

                       

                      -Jack

                      • 8. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                        emuckenhuber

                        You can create a JIRA issue and assign it to me. It seems like the timeout for the connection is too short when starting a server. I also need to double-check that there is not an additional issue causing it to time out.

                        • 9. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                          jacklund

                          Will do, thanks! Just FYI, this may be kind of a corner case, because we're running this currently on the equivalent of a single-core machine, so it's possible that it would work fine on most "normal" machines with the current timeout. If there was a configuration setting to set the timeout, that might do the job because we could set it higher for our peculiar setup. Just a thought.

                           

                          Thanks again!

                           

                          -Jack

                          • 10. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                            jacklund

                            I created the JIRA here: https://issues.jboss.org/browse/AS7-5132, but it didn't allow me to assign it to anyone (that I could see).

                             

                            -Jack

                            • 11. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                              tknyziak

                              I've stumbled upon the same issue with JBoss AS 7.1.1.Final, with a domain controller / master node on CentOS 6.2 and a slave node on Windows 7 Pro 64. It seems to me that the slave node registers successfully with the domain controller, but the actual problem is that the server instance running on the slave node cannot connect to its *host* controller: it is simply handed a random IP address of the slave machine, although I've pointed the management, public and unsecure JBoss interfaces at only one of them.

                               

                              The workaround I've found is to set the management interface on the slave instance to 0.0.0.0 (I'm doing it via the -Djboss.bind.address.management switch, but I believe you can edit the host.xml file as well). This way, the slave host controller listens on all the IP addresses and the slave server instance starts up successfully in the domain.

                               

                              I know it's neither secure nor elegant, yet it works.
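
                              For illustration, the workaround might look like this on a Linux host, reusing the switch form shown earlier in the thread. It assumes the command is run from the bin directory of the AS 7 installation; on the Windows slave the equivalent script would be domain.bat.

```shell
# Assumption: the current directory is the bin directory of the AS 7 installation.
# 0.0.0.0 makes the host controller's management interface listen on every
# local address, so it is reachable whichever address the servers are given.
./domain.sh -Djboss.bind.address.management=0.0.0.0
```

                              Because this exposes the management port on every interface, it is probably best limited to test setups or hosts protected by a firewall.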

                               

                              IMHO, the server instances should be handed the address of the JBoss management interface, not just a random IP address of the machine.

                               

                              Kind regards

                               

                              Tomasz

                              • 12. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                                tknyziak

                                Just a quick update - I've checked out and compiled AS 7.1.2.Final and it seems not to suffer from this problem anymore (although it may be by chance).

                                 

                                Regards!

                                 

                                Tomasz

                                • 13. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                                  jacklund

                                  Good to know - thanks!

                                   

                                  -Jack

                                  • 14. Re: Connection timeout to local port 9999 with clustered AS 7.1.1.Final
                                    silver06

                                    It's by chance, because I'm using EAP 6 (JBoss AS 7.1.2 internally) and the problem is still there.
                                    Apparently it's fixed in 7.1.3.Final.