Re: Question on HTTP 408 from 陈智昌 on 2014-06-04 (ietf-http-wg@w3.org from April to June 2014)

From: 陈智昌 <willchan@chromium.org>
Date: Wed, 4 Jun 2014 12:50:59 -0700
To: HTTP Working Group <ietf-http-wg@w3.org>, Willy Tarreau <w@1wt.eu>, Matt Menke <mmenke@chromium.org>
Message-ID: <CAA4WUYhy-c48qFiUQFx57B8xi39k=Z+Z-eS6_L90v0zdPndfHQ@mail.gmail.com>
Thanks everyone for chiming in. At this point, I think I should make some
clarifying points and inject / re-emphasize some personal opinion here. You
can probably ignore this email unless you're interested in all the details.

Clarifications:
1) There is fundamentally a race between the client sending a request and
the server closing its end of the connection, due to the network latency.
2) If the client does not read from the socket before/during sending a
request, then the time gap for the "race" may include not only the network
latency, but also the time until the client transmits the request and
starts reading from the socket.
3) When the client receives FIN or a socket error when reading from the
socket, it usually cannot tell the root cause (server timing out the
connection, server application crash, etc).
4) Clients have different heuristics for retrying on such errors. Here's
Chromium's (
https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_network_transaction.cc&q=http_network_transaction.cc&sq=package:chromium&l=1479
):

bool HttpNetworkTransaction::ShouldResendRequest() const {  bool
connection_is_proven = stream_->IsConnectionReused();  bool
has_received_headers = GetResponseHeaders() != NULL;  // NOTE: we
resend a request only if we reused a keep-alive connection.  // This
automatically prevents an infinite resend loop because we'll run  //
out of the cached keep-alive connections eventually.  if
(connection_is_proven && !has_received_headers)    return true;
return false;}


As one can see, we opt not to retry on an unused connection
(HttpStream::IsConnectionReused() is actually misnamed, it also includes
sockets that were ever idle, like our preconnected sockets). This is
because the HTTP request may have caused a server error (like the server
application to crash) and it would not be nice to re-send that request.
However, for idle HTTP connections, there's a reasonable likelihood due to
the idleness that we hit one of those races, so we will retry the
transaction on a new connection. I guess I should show the situation that
leads us to call this function...which is on relevant socket errors (
https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_network_transaction.cc&q=http_network_transaction.cc&sq=package:chromium&l=1411
):

int HttpNetworkTransaction::HandleIOError(int error) {  // Because the
peer may request renegotiation with client authentication at  // any
time, check and handle client authentication errors.
HandleClientAuthError(error);  switch (error) {    // If we try to
reuse a connection that the server is in the process of    // closing,
we may end up successfully writing out our request (or a    // portion
of our request) only to find a connection error when we try to    //
read from (or finish writing to) the socket.    case
ERR_CONNECTION_RESET:    case ERR_CONNECTION_CLOSED:    case
ERR_CONNECTION_ABORTED:    // There can be a race between the socket
pool checking checking whether a    // socket is still connected,
receiving the FIN, and sending/reading data    // on a reused socket.
If we receive the FIN between the connectedness    // check and
writing/reading from the socket, we may first learn the socket    //
is disconnected when we get a ERR_SOCKET_NOT_CONNECTED.  This will
most    // likely happen when trying to retrieve its IP address.    //
See http://crbug.com/105824 for more details.    case
ERR_SOCKET_NOT_CONNECTED:    // If a socket is closed on its initial
request, HttpStreamParser returns    // ERR_EMPTY_RESPONSE. This may
still be close/reuse race if the socket was    // preconnected but
failed to be used before the server timed it out.    case
ERR_EMPTY_RESPONSE:      if (ShouldResendRequest()) {
net_log_.AddEventWithNetErrorCode(
NetLog::TYPE_HTTP_TRANSACTION_RESTART_AFTER_ERROR, error);
ResetConnectionAndRequestForResend();        error = OK;      }
break;


5) HTTP is typically request/response, so Chromium makes the assumption
that it is a server bug if there is data in the socket before we try to use
it, and will discard the socket and try to find another to use. However,
recently we discovered that for our preconnected sockets and SPDY / HTTP/2,
the server always wrote the SETTINGS frame, therefore we weren't leveraging
preconnect properly for SPDY / HTTP/2. So we *recently* decided to allow
sockets with readable data if the socket hadn't been used yet (
https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.cc&q=client_socket_pool_base.cc&sq=package:chromium&l=651
):

bool ClientSocketPoolBaseHelper
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=156&gsn=ClientSocketPoolBaseHelper>::IdleSocket
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=339&gsn=IdleSocket>::IsUsable
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.cc&ct=xref_usages&gs=cpp:net::internal::class-ClientSocketPoolBaseHelper::class-IdleSocket::IsUsable()-const@chromium/../../net/socket/client_socket_pool_base.cc%257Cdef&l=651&gsn=IsUsable>()
const {  if (socket
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=357&gsn=socket>->WasEverUsed
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/stream_socket.h&cl=GROK&ct=xref_jump_to_def&l=76&gsn=WasEverUsed>())
   return socket
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=357&gsn=socket>->IsConnectedAndIdle
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/stream_socket.h&cl=GROK&ct=xref_jump_to_def&l=54&gsn=IsConnectedAndIdle>();
 return socket <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=357&gsn=socket>->IsConnected
<https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/stream_socket.h&cl=GROK&ct=xref_jump_to_def&l=49&gsn=IsConnected>();}


Previously, we always checked IsConnectedAndIdle(), but now we just check
IsConnected() for unused sockets. One could imagine tightening this check
so that we only allowed readable data if the socket was detected to be a
SPDY / HTTP/2 connection.

6) Armed with the previous knowledge, we can see what happens when Chromium
did *not* support 408s, and when 408 responses are sent by the server
*before* any request is sent by the client. With the recent change to allow
using unused sockets that have readable data, we passed on these sockets
(with 408 response data) to the HTTP transaction code. Since Chromium did
not support 408s, this led to 408s being treated as transaction errors, not
socket IO errors, so we did not retry these transactions. Due to the recent
change as described in (5), this meant that users saw more errors when
browsing websites with servers that sent these 408 responses on idle
connection timeouts. Chromium (still unreleased in Google Chrome release
channels) is supporting 408s now (
https://src.chromium.org/viewvc/chrome?revision=274760&view=revision).

Personal opinion:
1) Sending unprompted 408 responses when gracefully terminating a
connection clearly does provide a signal for clients to explicitly know
it's safe to retry. Explicit knowledge of retry safety is a good thing.
2) Sending unprompted HTTP responses is...very weird. This is non-obvious
to most HTTP implementers. Judging from the previous responses so far on
this thread, I think many people agree with this analysis.

That said, I personally think (1) is more important than (2) here. Yes, (2)
is ugly. But (1) makes life better for users, so deal with it? In my
opinion, the spec should explicitly acknowledge this case for 408s. I don't
really care to argue what the original intent of 408 was, I care about
coming to a consensus on how we should treat it and making it clear in the
spec.

Cheers.


On Mon, Jun 2, 2014 at 5:16 PM, William Chan (陈智昌) <willchan@chromium.org>
wrote:

> Hello folks,
>
> Here's the relevant text from p2 (
> http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-6.5.7
> ):
>
>
> 6.5.7 <http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-6.5.7>.  408 Request Timeout
>
>    The 408 (Request Timeout) status code indicates that the server did
>    not receive a complete request message within the time that it was
>    prepared to wait.  A server SHOULD send the close connection option
>    (Section 6.1 of [Part1 <http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#ref-Part1>]) in the response, since 408 implies that the
>    server has decided to close the connection rather than continue
>    waiting.  If the client has an outstanding request in transit, the
>    client MAY repeat that request on a new connection.
>
>
> Server operators using HAProxy have filed a bug report with Chromium due
> to our lack of support for 408 (
> https://code.google.com/p/chromium/issues/detail?id=377581). We're going
> to fix that. But HAProxy uses 408s in an intriguing way that violates some
> of our architectural assumptions. Namely, HAProxy wants to shut down an
> idle connection, it sends a 408 irrespective of whether or not there was a
> HTTP request.
>
> Currently, Chromium's connection management code treats this as a server
> bug, since when we bind a HTTP transaction to an idle connection, we expect
> that the connection does not have any buffered read data (since the
> previous transaction, if any, should have completed successfully). However,
> there's a race condition here, since the server may have terminated that
> idle connection with a 408 HTTP response followed by a FIN.
>
> There's fundamentally a race here, which is crappy and we fix in HTTP/2
> with GOAWAY on the connection. My question here is, is it kosher for
> HAProxy to be doing this? In retrospect, I feel like it's a pragmatic
> decision for them to do so. That said, it feels incredibly ugly for an
> HTTP/1.X server to generate responses without a request. It's not
> completely clear to me what the intent is in the spec here, because it says
> that the 408 means the server did not receive a complete request. If it
> only receives a partial request, then that makes sense it could send a
> timeout. But if there's no request whatsoever, it's not immediately obvious
> to me that the spec says that that is allowed.
>
> I'm curious if any other implementers have thoughts here. Would love to
> hear them.
>
Received on Wednesday, 4 June 2014 19:51:28 UTC