Re: Misc review notes for draft-18 p1 from Mark Nottingham on 2012-02-07 (ietf-http-wg@w3.org from January to March 2012)

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 7 Feb 2012 14:18:31 +1100
To: Willy Tarreau <w@1wt.eu>
Cc: "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-Id: <7F83A41A-2E06-4743-969C-E27967A1ACD5@mnot.net>
On 27/01/2012, at 2:56 AM, Willy Tarreau wrote:

> Hi,
> 
> I haven't finished reading p1 but I already have some comments, so
> I'm sending them here and will proceed with what remains.

Hi Willy,

Thanks for that. I'll add a few comments below.

> 2.1. Client/Server Messaging, page 11
> 
>>  Note that 1xx responses (Section 7.1 of [Part2]) are not final;
>>  therefore, a server can send zero or more 1xx responses, followed by
>>  exactly one final response (with any other status code).
> 
> This part falls here quite out of context in my opinion. Neither
> responses nor status codes nor messaging have been defined yet and all
> of a sudden we get this. I suggest we move this to P2 7.1 and replace
> it with a small note such as :
> 
>  Note that sometimes a server may send multiple responses, see Section
>  7.1 of [Part2] for more details about interim responses.

I'll leave it to the editors to take this as input.


> 2.4. Intermediaries, page 13
> 
> Context :
>>           >             >             >             >
>>      UA =========== A =========== B =========== C =========== O
>>                 <             <             <             <
> ...
> 
>>  For example, B might be receiving
>>  requests from many clients other than A, and/or forwarding requests
>>  to servers other than C, at the same time that it is handling A's
>>  request.
> 
> I'd underline that there is no single path between a UA and an intermediary,
> and that sometimes direct and indirect communications are possible. It helps
> remind people that rewriting URLs along the path is not always a good idea.
> I'd suggest this then :
> 
>    For example, B might be receiving requests from many clients other than A
>    including UA/C/O, and/or forwarding requests to servers other than C, at
>    the same time that it is handling A's request.

To the editors (generally, I agree).


> Later :
> 
>>  An HTTP-to-HTTP proxy is called a "transforming proxy" if it is
>>  designed or configured to modify request or response messages in a
>>  semantically meaningful way (i.e., modifications, beyond those
>>  required by normal HTTP processing, that change the message in a way
>>  that would be significant to the original sender or potentially
>>  significant to downstream recipients).
> 
> It is not totally clear to me if a compressing proxy is a transforming
> proxy, nor if one that rewrites Location headers to normalize them is
> a transforming proxy.

There's always going to be a fuzzy line here, I think.


> 2.7.1. http URI scheme
> 
>>   If the host identifier is provided as an IP literal or IPv4 address,
> 
> I did not find a clear definition of the term "IP literal". Also, does it
> cover the bracketed format of IPv6 ?
> 
> 
> 3.3. Message Body
> 
>>  The length of the message-body is determined by one of the following
>>  (in order of precedence):
>> 
>>  1.  Any response to a HEAD request and any response with a status
>>      code of 100-199, 204, or 304 is always terminated by the first
>>      empty line after the header fields, regardless of the header
>>      fields present in the message, and thus cannot contain a message-
>>      body.
> 
> Now that we've included the CONNECT method in the spec, I think it makes
> sense to define whether it has a body or not in case of success. I've
> found myself sometimes adding "Content-length: 0" as well as huge values
> in the past on some CONNECT requests to help interoperability with broken
> proxies, as well as "Connection: close" on these similar requests. Obviously
> the implementations were faulty but a faulty implementation often results
> from ambiguous specs.
> 
> Could we suggest that as a first rule, a 200 response to a CONNECT request
> implies an infinite content length (I don't like that very much since it's
> false), or that it has no message body and that the connection is immediately
> switched to a tunnel ?
> 
>   0. Any response with a status code of 200 to a CONNECT request does not
>      contain any message-body and immediately switches to a tunnel (Section
>      6.9 of [Part2]).
> 
> Also, since I've seen some implementations send "Content-length: 0" on
> CONNECT requests (which I happened to mimic once), I'm realizing that
> it's not always obvious what to send on responses where no content is
> expected. Would it make sense to insist on the fact that it is not
> necessary to send "Content-length: 0" on messages which do not have a
> body by the rules above ?

This should be covered by <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/250>.
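
For illustration only, a minimal sketch of how the quoted rule 1 and Willy's proposed rule 0 could be checked together. The function name `response_has_body` is hypothetical, and rule 0 is a proposal from this thread, not text from the draft:

```python
def response_has_body(method: str, status: int) -> bool:
    """Return False when the response cannot carry a message-body.

    `method` is the request method the response answers; `status` is the
    response status code. Both names are illustrative assumptions.
    """
    # Proposed rule 0: a 200 response to CONNECT has no message-body and
    # the connection immediately becomes a tunnel.
    if method == "CONNECT" and status == 200:
        return False
    # Rule 1: responses to HEAD, and 1xx, 204 and 304 responses, end at the
    # first empty line after the header fields, whatever the headers say.
    if method == "HEAD" or 100 <= status <= 199 or status in (204, 304):
        return False
    # Otherwise the later rules (Transfer-Encoding, Content-Length, ...)
    # decide the length; they are out of scope for this sketch.
    return True
```

With this ordering, a Content-Length header on a successful CONNECT response is never consulted, which is the behaviour the proposal asks for.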


> 3.5. Message Parsing Robustness
> 
>>  Likewise, although the line terminator for the start-line and header
>>  fields is the sequence CRLF, we recommend that recipients recognize a
>>  single LF as a line terminator and ignore any CR.
> 
> Does this mean that CR CR CR CR CR CR LF should be interpreted as a single
> LF ? It kind of scares me because of the risk of smuggling attacks. I'd
> rather suggest :
> 
>    ... we recommend that recipients recognize a single LF as a line
>    terminator and ignore the optional preceding CR. Messages containing
>    a CR not followed by an LF MUST be rejected.

I've created <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/340>.
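
As a sketch of the stricter behaviour Willy proposes (not of what the draft currently says), a line splitter could accept LF with at most one preceding CR and reject any other CR. `split_lines` and `MalformedLine` are hypothetical names:

```python
class MalformedLine(ValueError):
    """Raised when a CR appears anywhere other than directly before LF."""


def split_lines(data: bytes) -> list[bytes]:
    """Split a block of complete lines on LF, tolerating one optional CR.

    Assumes `data` holds only complete lines, so it must end with LF.
    """
    if not data.endswith(b"\n"):
        raise MalformedLine("incomplete final line")
    lines = []
    for raw in data[:-1].split(b"\n"):
        # Accept a bare LF, or LF preceded by exactly one CR.
        if raw.endswith(b"\r"):
            raw = raw[:-1]
        # Any CR still present was not immediately followed by LF.
        if b"\r" in raw:
            raise MalformedLine("CR not followed by LF")
        lines.append(raw)
    return lines
```

Under this sketch, `CR CR CR LF` is rejected rather than being read as a single line terminator.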


>>  When a server listening only for HTTP request messages, or processing
>>  what appears from the start-line to be an HTTP request message,
>>  receives a sequence of octets that does not match the HTTP-message
> 
> Wouldn't "does not *exactly* match" be better ? I'm used to finding
> crappy requests in my logs which are blocked but which some not-so-lazy
> implementations would let pass (eg: multiple SP).

I don't think that adding a word here is going to change those implementations' behaviour, unfortunately.


>>  grammar aside from the robustness exceptions listed above, the server
>>  MUST respond with an HTTP/1.1 400 (Bad Request) response.
> 
> I would also suggest that clients and proxies protect themselves against
> malformed response messages, which are problematic in shared hosting
> environments. This could be summarized like this :
> 
>    In general, any agent which receives a malformed message MUST NOT try
>    to fix it if there is any possibility that any other implementation
>    along the chain understands it differently. In such conditions, the
>    message MUST be rejected.

This will make too many implementations non-conformant.


> 4.1. Types of Request Target
> 
>> Note: The "no rewrite" rule prevents the proxy from changing the
> 
> I did not find reference to this "no rewrite" rule.
> 
> 
> 4.2. The Resource Identified by a Request
> 
>>  1.  If request-target is an absolute-URI, the host is part of the
>>      request-target.  Any Host header field value in the request MUST
>>      be ignored.
>> 
>>  2.  If the request-target is not an absolute-URI, and the request
>>      includes a Host header field, the host is determined by the Host
>>      header field value.
>> 
>>  3.  If the host as determined by rule 1 or 2 is not a valid host on
>>      the server, the response MUST be a 400 (Bad Request) error
>>      message.
> 
> Rule 3 might be difficult to apply in massively hosted environments, as
> I easily imagine that there could be a large "vhosts" directory with
> all the hosts roots presented by their names there. The server would
> then simply try to "cd $host" to check for the host's validity, which
> might seem appropriate at first. But using a host of ".." or a host
> containing a slash would have dramatic effects.
> 
> I don't know what recommendation we could add here because we can't
> add boring long sentences, but avoiding such simple traps would be
> nice. Maybe we should just add :
> 
>    For instance, a host should never be ".." nor contain a slash.

That's an implementation-specific security concern, not a general HTTP one.
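
To make the trap concrete, here is a minimal sketch of a lookup for the "vhosts directory" layout Willy describes. The function `vhost_root`, its parameters, and the assumption that any port has already been stripped from the host are all illustrative, not from the draft:

```python
import os


def vhost_root(base_dir, host):
    """Map a request host to its directory under base_dir, or None.

    Returning None corresponds to rule 3: the host is not valid on this
    server, so the response would be 400 (Bad Request).
    """
    name = host.lower()  # host names compare case-insensitively
    # Refuse names that would escape base_dir instead of selecting a
    # directory inside it.
    if name in ("", ".", "..") or any(c in name for c in "/\\\0"):
        return None
    path = os.path.join(base_dir, name)
    return path if os.path.isdir(path) else None
```

The point of the check is that the host is validated as a name before it is ever used as a path component.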


> 7.4. Use of HTTP by other protocols
> 
> It would make sense to list WebSocket here too since it's the first large
> scale user of the Upgrade mechanism.
> 
> 
> 8.4. TE
> 
>>  The presence of the keyword "trailers" indicates that the client is
>>  willing to accept trailer fields in a chunked transfer-coding, as
> 
> Is it only limited to the client ? Nowhere is it said that a server cannot
> advertise "TE: trailers" in responses so that a client knows it can emit
> chunked-encoded messages with trailers in further requests (eg: backups
> with SHA1 at the end). Replace "client" with "sender" maybe ?

The current wording doesn't prohibit other uses (or users) in the future... 

That said, Julian is right, we need to clarify this.



> 8.5. Trailer
> 
>>  If no Trailer header field is present, the trailer SHOULD NOT include
>>  any header fields.  See Section 5.1.1 for restrictions on the use of
>>  trailer fields in a "chunked" transfer-coding.
> 
> in 5.1.1 p37, it's said :
> 
>>  A server using chunked transfer-coding in a response MUST NOT use the
>>  trailer for any header fields unless at least one of the following is
>>  true:
> 
> Is the SHOULD NOT vs MUST NOT difference on purpose ? It seems to indicate
> that there should be a tolerance when parsing unadvertised trailers (I'm
> fine with this, just checking whether the wording is expected).

I think so.


> A.1.2 Keep-Alive Connections
> 
>>  Clients are also encouraged to consider the use of Connection: keep-
>>  alive in requests carefully; while they can enable persistent
>>  connections with HTTP/1.0 servers, clients using them need will need
>>  to monitor the connection for "hung" requests (which indicate that
>>  the client ought stop sending the header),
> 
> I know a number of people who use the term "the header" to designate all
> the headers section. I must say that when I read this sentence, it was
> unclear to me upon first reading that the intent was in fact to stop
> sending "Connection: keep-alive" in subsequent requests, as it can also
> be understood as "stop sending the headers as long as the connection
> hangs" (which does not make sense).
> 
> I'd suggest the following change :
> 
> -   the client ought stop sending the header),
> +   the client ought stop using this header in further communications with
> +   the server),

... or just "in future requests."

To the editors.


> At a number of places it is suggested to "close the connection". I
> think we could add an annex such as the following one, with references
> everywhere we suggest closing the connection, as well as one pointer
> in "6.1.2.2 Pipelining" :
> 
>    A.x.x Closing a Connection
> 
>    When a server needs to close a connection, it must ensure that doing so
>    will not risk prematurely terminating any previous response. When TCP
>    segments are still in flight during a socket close, operating systems
>    generally turn the socket to orphaned state, during which lingering data
>    will still be emitted but any received data would cause an immediate
>    connection abort. The connection may also be aborted when the system
>    is getting low on orphaned sockets. This means that a close before all
>    lingering data are acknowledged by the client might result in a loss of
>    unacknowledged data. This is a very common issue when performing a
>    redirect upon a POST request before all the client's body has been read.
>    While this is not always an issue when a server wants to abort a current
>    request, it becomes a real issue when the client tries to pipeline requests,
>    because aborting the current request may also result in destroying previous
>    unacknowledged responses too, possibly causing a client to retry already
>    processed requests that it believes were ignored.
> 
>    The proper way for a server to close a connection without risking issues
>    described above is the following :
> 
>       1) shutdown the transmit channel, usually using the shutdown() system
>          call.
>       2) drain any incoming data and (if possible) check for any lingering
>          data in the transmit queue.
>       3) when the receive channel reports a shutdown, or when all transmitted
>          data have been acknowledged, or when enough time has elapsed, perform
>          the close() on the socket.
> 
>    Operating systems do not always easily report the amount of lingering data
>    and will not always wake up when the queue is empty. A tradeoff has to be
>    found between keeping connections alive for too long a time and risking
>    closing too early and having some clients get truncated or empty responses.


My .02 - this seems more like implementation-specific advice; there are cases where it will not apply. What do others think?
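
For reference, the three steps in the proposed annex could be sketched as follows. This is an assumption-laden illustration: `lingering_close` and its `timeout` default are hypothetical, and the optional check of the transmit queue in step 2 is omitted because there is no portable way to do it:

```python
import socket
import time


def lingering_close(conn: socket.socket, timeout: float = 2.0) -> None:
    """Close conn without discarding responses that are still in flight."""
    deadline = time.monotonic() + timeout
    try:
        # Step 1: shut down the transmit channel only; queued data is
        # still delivered and the peer sees end-of-stream.
        conn.shutdown(socket.SHUT_WR)
        # Step 2: drain incoming data so a late request body does not
        # trigger a connection abort.
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # step 3: enough time has elapsed
            conn.settimeout(remaining)
            if not conn.recv(4096):
                break  # step 3: the receive channel reports a shutdown
    except OSError:
        pass  # timeout, reset, or peer already gone: just close
    finally:
        # Step 3: perform the close() on the socket.
        conn.close()
```

The timeout is the tradeoff the annex mentions: too long keeps connections around, too short risks truncated responses.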


--
Mark Nottingham   http://www.mnot.net/
Received on Tuesday, 7 February 2012 03:24:04 UTC