- From: Willy Tarreau <w@1wt.eu>
- Date: Thu, 26 Jan 2012 16:56:37 +0100
- To: ietf-http-wg@w3.org
Hi, I haven't finished reading p1 but I already have some comments, so I'm sending them here and will proceed with what remains. 2.1. Client/Server Messaging, page 11 > Note that 1xx responses (Section 7.1 of [Part2]) are not final; > therefore, a server can send zero or more 1xx responses, followed by > exactly one final response (with any other status code). This parts falls here quite out of context in my opinion. Neither responses nor status core nor messaging has been defined yet and all of a sudden we get this. I suggest we move this to P2 7.1 and replace it with a small note such as : Note that sometimes a server may send multiple responses, see Section 7.1 of [Part2] for more details about interim responses. 2.4. Intermediaries, page 13 Context : > > > > > > UA =========== A =========== B =========== C =========== O > < < < < ... > For example, B might be receiving > requests from many clients other than A, and/or forwarding requests > to servers other than C, at the same time that it is handling A's > request. I'd underline that there is no single path between a UA and an intermediary, and that sometimes direct and indirect communications are possible. It helps remind people that rewriting URLs along the path is not always a good idea. I'd suggest this then : For example, B might be receiving requests from many clients other than A including UA/C/O, and/or forwarding requests to servers other than C, at the same time that it is handling A's request. Later : > An HTTP-to-HTTP proxy is called a "transforming proxy" if it is > designed or configured to modify request or response messages in a > semantically meaningful way (i.e., modifications, beyond those > required by normal HTTP processing, that change the message in a way > that would be significant to the original sender or potentially > significant to downstream recipients). It is not totally clear to me if a compressing proxy is a transforming proxy, nor if one that rewrites Location headers to normalize them is a transforming proxy. 2.7.1. http URI scheme > If the host identifier is provided as an IP literal or IPv4 address, I did not find a clear definition of the term "IP literal". Also, does it cover the bracketed format of IPv6 ? 3.3. Message Body > The length of the message-body is determined by one of the following > (in order of precedence): > > 1. Any response to a HEAD request and any response with a status > code of 100-199, 204, or 304 is always terminated by the first > empty line after the header fields, regardless of the header > fields present in the message, and thus cannot contain a message- > body. Now that we've included the CONNECT method in the spec, I think it makes sense to define whether it has a body or not in case of success. I've found myself sometimes adding "Content-length: 0" as well as huge values in the past on some CONNECT requests to help interoperability with broken proxies, as well as "Connection: close" on these similar requests. Obviously the implementations were faulty but a faulty implementation often results from ambiguous specs. Could we suggest that as a first rule, a 200 response to a CONNECT request implies an infinite content length (I don't like that very much since it's false), or that it has no message body and that the connection is immediately switched to a tunnel ? 0. Any response with a status code of 200 to a CONNECT request does not contain any message-body and immediately switches to a tunnel (Section 6.9 of [Part2]). Also, since I've seen some implementations send "Content-length: 0" on CONNECT requests (which I happened to mimmick once), I'm realizing that it's not always obvious what to send on responses where no content is expected. Would it make sense to insist on the fact that it is not necessary to send "Content-length: 0" on messages which do not have a body by the rules above ? 3.5. Message Parsing Robustness > Likewise, although the line terminator for the start-line and header > fields is the sequence CRLF, we recommend that recipients recognize a > single LF as a line terminator and ignore any CR. Does this mean that CR CR CR CR CR CR LF should be interpreted as a single LF ? It kinds of scares me on the risk of smuggling attacks. I'd rather suggest : ... we recommend that recipients recognize a single LF as a line terminator and ignore the optional preceeding CR. Messages containing a CR not followed by an LF MUST be rejected. > When a server listening only for HTTP request messages, or processing > what appears from the start-line to be an HTTP request message, > receives a sequence of octets that does not match the HTTP-message Wouldn't "does not *exactly* match" be better ? I'm used to find crappy requests in my logs which are blocked but which some not-so-lazy implementations would let pass (eg: multiple SP). > grammar aside from the robustness exceptions listed above, the server > MUST respond with an HTTP/1.1 400 (Bad Request) response. I would also suggest that clients and proxies protect themselves against malformed response messages, which are problematic in shared hosting environments. This could be summarized like this : In general, any agent which receives a malformed message MUST NOT try to fix it if there is any possibility that any other implementation along the chain understands it differently. In such conditions, the message MUST be rejected. 4.1. Types of Request Target > Note: The "no rewrite" rule prevents the proxy from changing the I did not find reference to this "no rewrite" rule. 4.2. The Resource Identified by a Request > 1. If request-target is an absolute-URI, the host is part of the > request-target. Any Host header field value in the request MUST > be ignored. > > 2. If the request-target is not an absolute-URI, and the request > includes a Host header field, the host is determined by the Host > header field value. > > 3. If the host as determined by rule 1 or 2 is not a valid host on > the server, the response MUST be a 400 (Bad Request) error > message. Rule 3 might be difficult to apply in massively hosted environments, as I easily imagine that there could be a large "vhosts" directory with all the hosts roots presented by their names there. The server would then simply try to "cd $host" to check for the host's validity, which might seem appropriate at first. But using a host of ".." or a host containing a slash would have dramatic effects. I don't know what recommendation we could add here because we can't add boring long sentences, but avoiding such simple traps would be nice. Maybe we should just add : For instance, a host should never be ".." nor contain a slash. 7.4. Use of HTTP by other protocols It would make sense to list WebSocket here too since it's the first large scale user of the Upgrade mechanism. 8.4. TE > The presence of the keyword "trailers" indicates that the client is > willing to accept trailer fields in a chunked transfer-coding, as Is it only limited to the client ? Nowhere it's said that a server cannot advertise "TE: trailers" in responses so that a client knows it can emit chunked-encoded messages with trailers in further requests (eg: backups with SHA1 at the end). Replace "client" with "sender" maybe ? 8.5. Trailer > If no Trailer header field is present, the trailer SHOULD NOT include > any header fields. See Section 5.1.1 for restrictions on the use of > trailer fields in a "chunked" transfer-coding. in 5.1.1 p37, it's said : > A server using chunked transfer-coding in a response MUST NOT use the > trailer for any header fields unless at least one of the following is > true: Is the SHOULD NOT vs MUST NOT difference on purpose ? It seems to indicate that there should be a tolerance when parsing unadvertised trailers (I'm fine with this, just checking whether the wording is expected). A.1.2 Keep-Alive Connections > Clients are also encouraged to consider the use of Connection: keep- > alive in requests carefully; while they can enable persistent > connections with HTTP/1.0 servers, clients using them need will need > to monitor the connection for "hung" requests (which indicate that > the client ought stop sending the header), I know a number of people who use the term "the header" to designate all the headers section. I must say that when I read this sentence, it was unclear to me upon first reading that the intent was in fact to stop sending "Connection: keep-alive" in subsequent requests, as it can also be understood as "stop sending the headers as long as the connection hangs" (which does not make sense). I'd suggest the following change : - the client ought stop sending the header), + the client ought stop using this header in further communications with + the server), At a number of places it is suggested to "close the connection". I think we could add an annex such as the following one, with references everywhere we suggest closing the connection, as well as one pointer in "6.1.2.2 Pipelining" : A.x.x Closing a Connection When a server needs to close a connection, it must ensure that doing so will not risk prematurely terminate any previous response. When TCP segments are still in flight during a socket close, operating systems generally turn the socket to orphaned state, during which lingering data will still be emitted but any received data would cause an immediate connection abort. The connection may also be aborted when the system is getting low on orphaned sockets. This means that a close before all lingering data are acknowledged by the client might result in a loss of unacknowledged data. This is a very common issue when performing a redirect upon a POST request before all the client's body has been read. While this is not always an issue when a server wants to abort a current request, it becomes a real issue when the client tries to pipeline requests, because aborting the current request may also result in destroying previous unacknowledged response too, possibly causing a client to retry already processed requests that it believes were ignored. The proper way for a server to close a connection without risking issues described above is the following : 1) shutdown the transmit channel, usually using the shutdown() system call. 2) drain any incoming data and (if possible) check for any lingering data in the transmit queue. 3) when the receive channel reports a shutdown, or when all transmitted data have been acknowledged, or when enough time has elapsed, perform the close() on the socket. Operating systems do not always easily report the amount of lingering data and will not always wake up when the queue is empty. A tradeoff has to be found between keeping connections alive for too long a time and risking closing too early and having some clients get truncated or empty responses. That's all for me now, I'll probably have other comments later. Regards, Willy
Received on Thursday, 26 January 2012 15:57:14 UTC