- From: 陈智昌 <willchan@chromium.org>
- Date: Wed, 4 Jun 2014 12:50:59 -0700
- To: HTTP Working Group <ietf-http-wg@w3.org>, Willy Tarreau <w@1wt.eu>, Matt Menke <mmenke@chromium.org>
- Message-ID: <CAA4WUYhy-c48qFiUQFx57B8xi39k=Z+Z-eS6_L90v0zdPndfHQ@mail.gmail.com>
Thanks everyone for chiming in. At this point, I think I should make some clarifying points and inject / re-emphasize some personal opinion here. You can probably ignore this email unless you're interested in all the details. Clarifications: 1) There is fundamentally a race between the client sending a request and the server closing its end of the connection, due to the network latency. 2) If the client does not read from the socket before/during sending a request, then the time gap for the "race" may include not only the network latency, but also the time until the client transmits the request and starts reading from the socket. 3) When the client receives FIN or a socket error when reading from the socket, it usually cannot tell the root cause (server timing out the connection, server application crash, etc). 4) Clients have different heuristics for retrying on such errors. Here's Chromium's ( https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_network_transaction.cc&q=http_network_transaction.cc&sq=package:chromium&l=1479 ): bool HttpNetworkTransaction::ShouldResendRequest() const { bool connection_is_proven = stream_->IsConnectionReused(); bool has_received_headers = GetResponseHeaders() != NULL; // NOTE: we resend a request only if we reused a keep-alive connection. // This automatically prevents an infinite resend loop because we'll run // out of the cached keep-alive connections eventually. if (connection_is_proven && !has_received_headers) return true; return false;} As one can see, we opt not to retry on an unused connection (HttpStream::IsConnectionReused() is actually misnamed, it also includes sockets that were ever idle, like our preconnected sockets). This is because the HTTP request may have caused a server error (like the server application to crash) and it would not be nice to re-send that request. However, for idle HTTP connections, there's a reasonable likelihood due to the idleness that we hit one of those races, so we will retry the transaction on a new connection. I guess I should show the situation that leads us to call this function...which is on relevant socket errors ( https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_network_transaction.cc&q=http_network_transaction.cc&sq=package:chromium&l=1411 ): int HttpNetworkTransaction::HandleIOError(int error) { // Because the peer may request renegotiation with client authentication at // any time, check and handle client authentication errors. HandleClientAuthError(error); switch (error) { // If we try to reuse a connection that the server is in the process of // closing, we may end up successfully writing out our request (or a // portion of our request) only to find a connection error when we try to // read from (or finish writing to) the socket. case ERR_CONNECTION_RESET: case ERR_CONNECTION_CLOSED: case ERR_CONNECTION_ABORTED: // There can be a race between the socket pool checking checking whether a // socket is still connected, receiving the FIN, and sending/reading data // on a reused socket. If we receive the FIN between the connectedness // check and writing/reading from the socket, we may first learn the socket // is disconnected when we get a ERR_SOCKET_NOT_CONNECTED. This will most // likely happen when trying to retrieve its IP address. // See http://crbug.com/105824 for more details. case ERR_SOCKET_NOT_CONNECTED: // If a socket is closed on its initial request, HttpStreamParser returns // ERR_EMPTY_RESPONSE. This may still be close/reuse race if the socket was // preconnected but failed to be used before the server timed it out. case ERR_EMPTY_RESPONSE: if (ShouldResendRequest()) { net_log_.AddEventWithNetErrorCode( NetLog::TYPE_HTTP_TRANSACTION_RESTART_AFTER_ERROR, error); ResetConnectionAndRequestForResend(); error = OK; } break; 5) HTTP is typically request/response, so Chromium makes the assumption that it is a server bug if there is data in the socket before we try to use it, and will discard the socket and try to find another to use. However, recently we discovered that for our preconnected sockets and SPDY / HTTP/2, the server always wrote the SETTINGS frame, therefore we weren't leveraging preconnect properly for SPDY / HTTP/2. So we *recently* decided to allow sockets with readable data if the socket hadn't been used yet ( https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.cc&q=client_socket_pool_base.cc&sq=package:chromium&l=651 ): bool ClientSocketPoolBaseHelper <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=156&gsn=ClientSocketPoolBaseHelper>::IdleSocket <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=339&gsn=IdleSocket>::IsUsable <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.cc&ct=xref_usages&gs=cpp:net::internal::class-ClientSocketPoolBaseHelper::class-IdleSocket::IsUsable()-const@chromium/../../net/socket/client_socket_pool_base.cc%257Cdef&l=651&gsn=IsUsable>() const { if (socket <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=357&gsn=socket>->WasEverUsed <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/stream_socket.h&cl=GROK&ct=xref_jump_to_def&l=76&gsn=WasEverUsed>()) return socket <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=357&gsn=socket>->IsConnectedAndIdle <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/stream_socket.h&cl=GROK&ct=xref_jump_to_def&l=54&gsn=IsConnectedAndIdle>(); return socket <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/client_socket_pool_base.h&cl=GROK&ct=xref_jump_to_def&l=357&gsn=socket>->IsConnected <https://code.google.com/p/chromium/codesearch#chromium/src/net/socket/stream_socket.h&cl=GROK&ct=xref_jump_to_def&l=49&gsn=IsConnected>();} Previously, we always checked IsConnectedAndIdle(), but now we just check IsConnected() for unused sockets. One could imagine tightening this check so that we only allowed readable data if the socket was detected to be a SPDY / HTTP/2 connection. 6) Armed with the previous knowledge, we can see what happens when Chromium did *not* support 408s, and when 408 responses are sent by the server *before* any request is sent by the client. With the recent change to allow using unused sockets that have readable data, we passed on these sockets (with 408 response data) to the HTTP transaction code. Since Chromium did not support 408s, this led to 408s being treated as transaction errors, not socket IO errors, so we did not retry these transactions. Due to the recent change as described in (5), this meant that users saw more errors when browsing websites with servers that sent these 408 responses on idle connection timeouts. Chromium (still unreleased in Google Chrome release channels) is supporting 408s now ( https://src.chromium.org/viewvc/chrome?revision=274760&view=revision). Personal opinion: 1) Sending unprompted 408 responses when gracefully terminating a connection clearly does provide a signal for clients to explicitly know it's safe to retry. Explicit knowledge of retry safety is a good thing. 2) Sending unprompted HTTP responses is...very weird. This is non-obvious to most HTTP implementers. Judging from the previous responses so far on this thread, I think many people agree with this analysis. That said, I personally think (1) is more important than (2) here. Yes, (2) is ugly. But (1) makes life better for users, so deal with it? In my opinion, the spec should explicitly acknowledge this case for 408s. I don't really care to argue what the original intent of 408 was, I care about coming to a consensus on how we should treat it and making it clear in the spec. Cheers. On Mon, Jun 2, 2014 at 5:16 PM, William Chan (陈智昌) <willchan@chromium.org> wrote: > Hello folks, > > Here's the relevant text from p2 ( > http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-6.5.7 > ): > > > 6.5.7 <http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-6.5.7>. 408 Request Timeout > > The 408 (Request Timeout) status code indicates that the server did > not receive a complete request message within the time that it was > prepared to wait. A server SHOULD send the close connection option > (Section 6.1 of [Part1 <http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#ref-Part1>]) in the response, since 408 implies that the > server has decided to close the connection rather than continue > waiting. If the client has an outstanding request in transit, the > client MAY repeat that request on a new connection. > > > Server operators using HAProxy have filed a bug report with Chromium due > to our lack of support for 408 ( > https://code.google.com/p/chromium/issues/detail?id=377581). We're going > to fix that. But HAProxy uses 408s in an intriguing way that violates some > of our architectural assumptions. Namely, HAProxy wants to shut down an > idle connection, it sends a 408 irrespective of whether or not there was a > HTTP request. > > Currently, Chromium's connection management code treats this as a server > bug, since when we bind a HTTP transaction to an idle connection, we expect > that the connection does not have any buffered read data (since the > previous transaction, if any, should have completed successfully). However, > there's a race condition here, since the server may have terminated that > idle connection with a 408 HTTP response followed by a FIN. > > There's fundamentally a race here, which is crappy and we fix in HTTP/2 > with GOAWAY on the connection. My question here is, is it kosher for > HAProxy to be doing this? In retrospect, I feel like it's a pragmatic > decision for them to do so. That said, it feels incredibly ugly for an > HTTP/1.X server to generate responses without a request. It's not > completely clear to me what the intent is in the spec here, because it says > that the 408 means the server did not receive a complete request. If it > only receives a partial request, then that makes sense it could send a > timeout. But if there's no request whatsoever, it's not immediately obvious > to me that the spec says that that is allowed. > > I'm curious if any other implementers have thoughts here. Would love to > hear them. >
Received on Wednesday, 4 June 2014 19:51:28 UTC