10 Replies Latest reply on Jan 9, 2012 3:07 AM by jfclere

    Worker in error state after a request timeout

    asyomichev

      Hello,

       

      I am trying to understand routing behavior of mod_cluster 1.1.0.Final (Tomcat 6.0.20 with AJP connector, Apache/2.2.15) in the presence of requests that are taking a very long time.

       

      I am seeing that requests time out after 5 minutes of no response. I am not sure where this timeout is set (found no references to a 300 second default anywhere in the docs), but it is not a big deal for me, as 5 minutes is a reasonable (even generous) boundary in my opinion.

       

      What seems to happen after the request timeout is more curious: the whole worker becomes unavailable for subsequent request routing for 20-30 seconds. This is easy to demonstrate in a configuration with a single worker: when it starts, up to 5 idle AJP connections are created to the worker. As I submit a long running request, and while it is pending, the worker responds to short requests normally (after all, only one AJP connection is tied up). As I continue to ping the application with short requests, they succeed up until the long request times out. From that point on and for a period of 20-30 seconds (time varies quite a bit, in some experiments from 5 up to 60 seconds), every request to this context returns with "503 Service Temporarily Unavailable" as if the worker was down.

       

      Questions:

      - At the point of timeout, the worker is still up and there are 4 remaining AJP connections from httpd to tomcat that could continue serving requests. What is the rationale behind marking the whole worker in "error" state?

      - What is the recovery mechanism in this case? Why is it taking tens of seconds?

      - I see no related messages in error_log at "debug" level. Are there any other ways to track this behavior?

       

      I am attaching a trivial war file with a "sleep" jsp and a shell script to trigger it; hopefully it will help to reproduce the condition easily. The client script submits a long wait and, after that returns, prints a timestamp and submits a series of small pings until the first successful one. What I would hope to see is that the very first ping succeeds very shortly (tens of milliseconds) after the long request have timed out with a 500. What I see instead is that a number of subsequent pings return 503s one after another, and once a ping succeeds, the final timestamp is about 30 seconds after the first 500 response.

       

      Many thanks for help.

      --Alexey

        • 1. Worker in error state after a request timeout
          jfclere

          the 5 minutes comes from httpd proxy configuration you set your own value using nodeTimeout (See http://docs.jboss.org/mod_cluster/1.0.0/html/javaproperties.html#d0e1131).

           

          The long request causes an error, then worker is marked broken and the next STATUS message from the node mark is marked to OK.

          The STATUS messages are sent every 30 seconds.

           

          If you see PING that failed I think that is an issue on the node and it would be nice to have the trace to confirm that.  MODCLUSTER-133 was a related problem but it is fixed in 1.1.0.

          • 2. Worker in error state after a request timeout
            asyomichev

            Thanks for the "nodeTimeout" pointer! There may be a documentation issue here: the default is documented at -1 (none), but in practice it seems to default to 300 seconds.

             

            I don't think there is ever a ping error, at least I see nothing indicating that in error_log, even at "debug" level. The log snippet below suggests that there are successfull pings to the same instance between the timeout and the next STATUS (PINGs are performed over the remaining AJP connections that are still open):

             

            [Fri Mar 18 11:14:44 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header

            [Fri Mar 18 11:14:44 2011] [error] ajp_read_header: ajp_ilink_receive failed

            [Fri Mar 18 11:14:44 2011] [error] (120006)APR does not understand this error code: proxy: read response failed from 169.70.169.12:33832 (169.70.169.12)

            [Fri Mar 18 11:14:44 2011] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state

            ...

            [Fri Mar 18 11:14:44 2011] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state

            [Fri Mar 18 11:14:47 2011] [debug] proxy_util.c(2011): proxy: ajp: has acquired connection for (169.70.169.12)

            [Fri Mar 18 11:14:47 2011] [debug] proxy_util.c(2067): proxy: connecting ajp://169.70.169.12:43530/ to 169.70.169.12:43530

            [Fri Mar 18 11:14:47 2011] [debug] proxy_util.c(2193): proxy: connected ajp://169.70.169.12:43530/ to 169.70.169.12:43530

            [Fri Mar 18 11:14:47 2011] [debug] mod_proxy_cluster.c(703): ajp_cping_cpong: Done

            [Fri Mar 18 11:14:44 2011] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state

            ...

             

            I think what happens is that the worker is marked bad on the basis of a single connection timeout, even though other connections are still good and pings are still succeeding. Is there a configuration option to treat ping/pong error as the only reason to mark the worker bad, essentially ignoring individual connection timeouts?

             

            Thank you,

            --Alexey

            • 3. Worker in error state after a request timeout
              asyomichev

              After some more digging I understand how the worker is getting marked bad: the logic is in proxy_handler (mod_proxy.c):

               

                    else if (access_status == HTTP_INTERNAL_SERVER_ERROR) {

                          /* Unrecoverable server error.

                           * We can not failover to another worker.

                           * Mark the worker as unusable if member of load balancer

                           */

                          if (balancer) {

                              worker->s->status |= PROXY_WORKER_IN_ERROR;

                              worker->s->error_time = apr_time_now();

                          }

                          break;

                      }

               

              As long as AJP timeout is bubbled up as HTTP_INTERNAL_SERVER_ERROR, the worker is marked bad, and the request is reported as a 500 failure to the client. A more graceful way of handling the AJP header read timeout would be to report it as something less severe, for example HTTP_GATEWAY_TIME_OUT, which is still reported as a failure to the client, but has not effect on worker status. To try it out, I made a change in ap_proxy_ajp_request (mod_proxy_ajp.c):

               

                  /* read the response */

                  conn->data = NULL;

                  status = ajp_read_header(conn->sock, r, maxsize,

                                           (ajp_msg_t **)&(conn->data));

                  if (status != APR_SUCCESS) {

                      /* We had a failure: Close connection to backend */

                      conn->close++;

                      apr_brigade_destroy(input_brigade);

                      ap_log_error(APLOG_MARK, APLOG_ERR, status, r->server,

                                   "proxy: read response failed from %pI (%s)",

                                   conn->worker->cp->addr,

                                   conn->worker->hostname);

                      /*

                       * This is only non fatal when we have not sent (parts) of a possible

                       * request body so far (we do not store it and thus cannot sent it

                       * again) and the method is idempotent. In this case we can dare to

                       * retry it with a different worker if we are a balancer member.

                       */

                      if (!send_body && (is_idempotent(r) == METHOD_IDEMPOTENT)) {

                          return HTTP_SERVICE_UNAVAILABLE;

                      }

               

                      return HTTP_GATEWAY_TIME_OUT;

                  }

               

              This seems to do the trick - a request longer than "nodeTimeout" is reported as a 504, but the worker can continue serve other requests normally in parallel to the long request and after it times out, with no interruptions. My understanding is that a true worker hang would be detected regardless of this change, by the ping/pong probe.

               

              Are there any hidden implications of such change I am missing?

               

               

              Thank you,
              --Alexey

              • 4. Worker in error state after a request timeout
                jfclere

                It needs a more complex patch which should go in Apache httpd, for the moment I think that apr_wait_for_io_or_timeout() returns APR_TIMEUP and only that cause shoud cause HTTP_GATEWAY_TIME_OUT.

                • 5. Worker in error state after a request timeout
                  asyomichev

                  Agreed. The full patch that I made in the end touches a couple more files:

                   

                  ajp.h  at line 115:


                  #define AJP_ELOGFAIL            (APR_OS_START_USERERR + 9)

                  /** Bad requestmethod */

                  #defineAJP_EBAD_METHOD        (APR_OS_START_USERERR + 10)

                  +/** Send/receive timeout */

                  +#define AJP_ETIMEOUT            (APR_OS_START_USERERR + 10)

                   

                   

                  ajp_link.c  at line 95:

                   

                       if (status !=APR_SUCCESS) {

                          ap_log_error(APLOG_MARK, APLOG_ERR, status, NULL,

                                       "ajp_ilink_receive() can't receive header");

                  -        returnAJP_ENO_HEADER;

                  +        return(status == APR_TIMEUP ? AJP_ETIMEOUT : AJP_ENO_HEADER);

                       }

                   

                   

                   

                  mod_proxy_ajp.c at line 343 :

                   

                           if(!send_body && (is_idempotent(r) == METHOD_IDEMPOTENT)) {

                               returnHTTP_SERVICE_UNAVAILABLE;

                           }

                  +        if (status ==AJP_ETIMEOUT)

                  +        {

                  +               /*Connection may time out due to a long-running request, which

                  +                *means the worker deserves a second chance. Returning a 500 here

                  +                * wouldbring the worker to error status, a mild 504 will keep it

                  +                *usable for subsequent requests.

                  +                */

                  +            return HTTP_GATEWAY_TIME_OUT;

                  +        }

                           return HTTP_INTERNAL_SERVER_ERROR;

                       }

                   

                  Do you think this is sufficiently selective? It definitely made a marked improvement in my case, so hopefully you could make a change like this in one of next releases of mod_cluster.

                   

                  Thank you,
                  --Alexey

                  • 6. Worker in error state after a request timeout
                    jfclere

                    Could you please attach a patch to the JIRA MODCLUSTER-226?

                    • 7. Re: Worker in error state after a request timeout
                      hareeshram

                      Hi,

                       

                      So right now, if nodeTimeout is not set explicitly, which value  and from which file will be considered as reason for Gateway timeouts?

                       

                      [Thu Jan 05 10:30:02 2012] [error] ajp_read_header: ajp_ilink_receive failed

                      [Thu Jan 05 10:30:02 2012] [error] (70007)The timeout specified has expired: proxy: read response failed from x.x.x.x:8009 (x.x.x.x)

                      [Thu Jan 05 11:33:05 2012] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header

                      Write failed: Broken pipe] [error] ajp_read_header: ajp_ilink_receive failed

                       

                      How about setting these values for some requests which may take around 60 to 90 seconds..

                      • 8. Re: Worker in error state after a request timeout
                        jfclere

                        the ProxyTimeout timeout will be used it default to the Timeout server timeout that default to 300 seconds.

                        • 9. Re: Worker in error state after a request timeout
                          hareeshram

                          Hi Jean, Thanks for your support on mod_cluster. Well appreciated!

                           

                          Q1) Where can I configure proxytimeout for mod cluster running in a non clustered mode?

                           

                          Just wondering on the following.

                           

                          In my case , I have nodeTimeout set to default ( it is -1) and we are able to observe the time out within just two minutes.

                          Also one more important observation that I  saw is.

                           

                          In the file, mod-cluster-jboss-beans.xml, there are possibilities for both clustered and non clustered config.

                          I am using the non clustered one means <bean name="ModClusterService" class="org.jboss.modcluster.ModClusterService" mode="On Demand">

                           

                          I see that this bean, by default is not injected with ModClusterConfig where as if see HAModClusterService, it has like below

                          ---------------------------------------------

                          <!-- The core HAModClusterService, for use in clustered environments -->

                            <bean name="HAModClusterService" class="org.jboss.modcluster.ha.HAModClusterService" mode="On Demand">

                               <annotation>@org.jboss.aop.microcontainer.aspects.jmx.JMX(name="jboss.web:service=ModCluster",exposedInterface=org.jboss.modcluster.ha.HAModClusterServiceMBean.class)</annotation>

                              <constructor>

                                <parameter><inject bean="HAPartition"/></parameter>

                                <parameter><inject bean="HAModClusterConfig"/></parameter>

                                <parameter><inject bean="DynamicLoadBalanceFactorProvider"/></parameter>

                          --------------------------------------------- but for ModClusterService, it is

                           

                          <!-- The core ModClusterService, for use in non-clustered environments -->

                            <bean name="ModClusterService" class="org.jboss.modcluster.ModClusterService" mode="On Demand">

                               <annotation>@org.jboss.aop.microcontainer.aspects.jmx.JMX(name="jboss.web:service=ModCluster",exposedInterface=org.jboss.modcluster.ModClusterServiceMBean.class)</annotation>

                              <constructor>

                                <parameter><inject bean="DynamicLoadBalanceFactorProvider"/></parameter>

                           

                           

                          ----------------------------------------------

                           

                          and I see nodeTimeout is part ModClusterConfig.

                           

                          Q2) Should we inject this to set nodeTimeOut?

                          Q3) Is it somehow related to https://issues.jboss.org/browse/MODCLUSTER-180?

                           

                          Thanks for your time.

                          • 10. Re: Worker in error state after a request timeout
                            jfclere

                            Q1) Via:

                            <property name="nodeTimeout">120</property>

                            Q2) You have to inject it in the right mbean (HAModClusterConfig or ModClusterConfig depending on your config).

                            Q3) I don't think so.