Re: Misc review notes for draft-18 p1 from Amos Jeffries on 2012-02-07 (ietf-http-wg@w3.org from January to March 2012)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Tue, 07 Feb 2012 17:58:58 +1300
To: ietf-http-wg@w3.org
Message-ID: <4F30AF92.4000700@treenet.co.nz>
On 7/02/2012 4:18 p.m., Mark Nottingham wrote:
> On 27/01/2012, at 2:56 AM, Willy Tarreau wrote:
>
>> Hi,
>>
>> I haven't finished reading p1 but I already have some comments, so
>> I'm sending them here and will proceed with what remains.
> Hi Willy,
>
> Thanks for that. I'll add a few comments below.
>
>> 2.1. Client/Server Messaging, page 11
>>
>>>   Note that 1xx responses (Section 7.1 of [Part2]) are not final;
>>>   therefore, a server can send zero or more 1xx responses, followed by
>>>   exactly one final response (with any other status code).
>> This part falls quite out of context here, in my opinion. Neither
>> responses, status codes, nor messaging have been defined yet, and all
>> of a sudden we get this. I suggest we move this to P2 7.1 and replace
>> it with a small note such as:
>>
>>   Note that a server may sometimes send multiple responses; see Section
>>   7.1 of [Part2] for more details about interim responses.
> I'll leave it to the editors to take this as input.
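A minimal sketch of the rule quoted above (zero or more 1xx responses, then exactly one final response), as a client might apply it. It assumes the status codes have already been parsed from successive status-lines on the connection; `read_final_status` is a hypothetical helper, not anything defined by the draft.

```python
from typing import Iterable, List, Tuple


def read_final_status(statuses: Iterable[int]) -> Tuple[List[int], int]:
    """Split the status codes received for one request into the interim
    (1xx) responses and the single final response."""
    interim: List[int] = []
    for status in statuses:
        if 100 <= status <= 199:
            # Interim response: not final, so keep reading.
            # Caveat: 101 Switching Protocols also ends HTTP/1.1 framing on
            # the connection; a real client would stop there (not handled).
            interim.append(status)
            continue
        # Any other status code is the one final response.
        return interim, status
    raise ValueError("connection ended before a final response arrived")


print(read_final_status([100, 102, 200]))  # ([100, 102], 200)
```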
>
>
>> 2.4. Intermediaries, page 13
>>
>> Context:
>>>       UA =========== A =========== B =========== C =========== O
>>>                  <              <              <              <
>> ...
>>
>>>   For example, B might be receiving
>>>   requests from many clients other than A, and/or forwarding requests
>>>   to servers other than C, at the same time that it is handling A's
>>>   request.
>> I'd emphasize that there is not always a single path between a UA and an
>> intermediary, and that direct and indirect communication are sometimes both
>> possible. This helps remind people that rewriting URLs along the path is not
>> always a good idea. I'd then suggest:
>>
>>     For example, B might be receiving requests from many clients other than A,
>>     including UA, C, and O, and/or forwarding requests to servers other than C,
>>     at the same time that it is handling A's request.
> To the editors (generally, I agree).
>
>
>> Later:
>>
>>>   An HTTP-to-HTTP proxy is called a "transforming proxy" if it is
>>>   designed or configured to modify request or response messages in a
>>>   semantically meaningful way (i.e., modifications, beyond those
>>>   required by normal HTTP processing, that change the message in a way
>>>   that would be significant to the original sender or potentially
>>>   significant to downstream recipients).
>> It is not totally clear to me whether a compressing proxy is a transforming
>> proxy, nor whether one that rewrites Location headers to normalize them is.
> There's always going to be a fuzzy line here, I think.

The emphasis on *semantic* changes is clear. The fuzziness only arises
when HTTP intricacies make you lose sight of what is semantic versus
transport or syntax detail.

I'm inclined to frame it in terms of whether the change is limited
in its effects:
  * if the client were to base its next request on this response and go
straight to the origin, would it matter?
  * if the client were also passing this request via another path, would
it matter?

So normalizing and fixing headers to conform to the spec is (usually)
non-transforming: the upstream was required to have produced conforming
headers in the first place.

Compressing the body changes the representation, with effects cascading
down into the ETag: conditional and range requests to the origin will not
succeed as intended. So it is transforming.

Compression applied as a Transfer-Encoding should not alter the end-to-end
representation, so it should be considered non-transforming.
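A small Python sketch of why body compression cascades into validators and ranges. Assumptions: the proxy applies the compression itself (zlib format, which is what HTTP's "deflate" coding uses), and the origin derives ETags by hashing the body, a hypothetical scheme since origins choose their own validators.

```python
import hashlib
import zlib


def strong_etag(body: bytes) -> str:
    # Hypothetical validator scheme: hash of the representation bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


original = b"Hello, HTTP world! " * 50    # representation at the origin
compressed = zlib.compress(original)      # what the compressing proxy emits

# The representation bytes differ, so a validator derived from them differs.
print(strong_etag(original) != strong_etag(compressed))  # True

# A byte range taken from the compressed copy selects different bytes than
# the same range requested from the origin.
print(original[0:10] == compressed[0:10])  # False
```

A Transfer-Encoding would instead be removed hop by hop, leaving `original` intact for the recipient, which is why it does not affect validators or ranges.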

>> 2.7.1. http URI scheme
>>
>>>    If the host identifier is provided as an IP literal or IPv4 address,
>> I did not find a clear definition of the term "IP literal". Also, does it
>> cover the bracketed format of IPv6?
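For what it's worth, RFC 3986 (Section 3.2.2) defines IP-literal as an IPv6address or IPvFuture enclosed in square brackets, so the bracketed IPv6 form is exactly what that term covers. A minimal Python sketch of the host rule follows; `classify_host` is a hypothetical helper, and it is simplified (reg-name characters and IPvFuture contents are not validated).

```python
import ipaddress


def classify_host(host: str) -> str:
    """Classify a URI host using the RFC 3986 host rule (simplified)."""
    if host.startswith("[") and host.endswith("]"):
        inner = host[1:-1]
        if inner[:1] in ("v", "V"):
            return "IPvFuture literal"  # contents not validated here
        ipaddress.IPv6Address(inner)  # raises ValueError if malformed
        return "IPv6 literal"
    try:
        ipaddress.IPv4Address(host)
        return "IPv4 address"
    except ValueError:
        # Anything else falls back to reg-name (characters not validated).
        return "reg-name"


print(classify_host("[::1]"))      # IPv6 literal
print(classify_host("192.0.2.1"))  # IPv4 address
```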
>>
>>
>> 3.3. Message Body
>>
>>>   The length of the message-body is determined by one of the following
>>>   (in order of precedence):
>>>
>>>   1.  Any response to a HEAD request and any response with a status
>>>       code of 100-199, 204, or 304 is always terminated by the first
>>>       empty line after the header fields, regardless of the header
>>>       fields present in the message, and thus cannot contain a message-
>>>       body.
>> Now that we've included the CONNECT method in the spec, I think it makes
>> sense to define whether a successful response to it has a body or not. In
>> the past I've sometimes found myself adding "Content-length: 0", as well as
>> huge values, on some CONNECT requests to help interoperability with broken
>> proxies, and "Connection: close" on similar requests. Obviously those
>> implementations were faulty, but faulty implementations often result from
>> ambiguous specs.
>>
>> Could we add, as a first rule, that a 200 response to a CONNECT request
>> either implies an infinite content length (I don't like that very much,
>> since it's false) or has no message body, with the connection immediately
>> switched to a tunnel?
>>
>>    0. Any response with a status code of 200 to a CONNECT request contains
>>       no message-body; the connection immediately switches to a tunnel
>>       (Section 6.9 of [Part2]).
>>
>> Also, since I've seen some implementations send "Content-length: 0" on
>> CONNECT requests (which I happened to mimic once), I'm realizing that
>> it's not always obvious what to send on responses where no content is
>> expected. Would it make sense to stress that it is not necessary to send
>> "Content-length: 0" on messages that have no body under the rules above?
> This should be covered by <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/250>.

FWIW: +1 on Dan's wording. My special use-case fits very cleanly with
that Content-Length/Transfer-Encoding exception.
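The first precedence rule, plus Willy's proposed rule 0, fits in a few lines. A hedged sketch: `response_has_body` is a hypothetical helper, the CONNECT branch encodes the proposal rather than draft text, and the later rules (Transfer-Encoding, Content-Length, close-delimited) are not modelled.

```python
def response_has_body(request_method: str, status: int) -> bool:
    """Return False when the message-length rules say a response is
    terminated by the empty line after its header fields."""
    # Proposed rule 0 (not draft text): a 200 response to CONNECT
    # carries no message-body and the connection becomes a tunnel.
    if request_method == "CONNECT" and status == 200:
        return False
    # Rule 1: responses to HEAD never carry a body.
    if request_method == "HEAD":
        return False
    # Rule 1: 1xx, 204 and 304 responses never carry a body.
    if 100 <= status <= 199 or status in (204, 304):
        return False
    return True


print(response_has_body("CONNECT", 200))  # False
print(response_has_body("GET", 200))      # True
```

Because rule 1 applies regardless of the header fields present, adding "Content-length: 0" to such a response changes nothing.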

>
>
>> 3.5. Message Parsing Robustness
>>
>>>   Likewise, although the line terminator for the start-line and header
>>>   fields is the sequence CRLF, we recommend that recipients recognize a
>>>   single LF as a line terminator and ignore any CR.
>> Does this mean that CR CR CR CR CR CR LF should be interpreted as a single
>> LF? It kind of scares me because of the risk of smuggling attacks. I'd
>> rather suggest:
>>
>>     ... we recommend that recipients recognize a single LF as a line
>>     terminator and ignore the optional preceding CR. Messages containing
>>     a CR not followed by an LF MUST be rejected.
> I've created <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/340>.
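Willy's proposed rule can be sketched as a line splitter that tolerates a bare LF but refuses any other CR, so a run such as CR CR CR LF is rejected rather than collapsed into one terminator. `split_header_lines` is a hypothetical name; the sketch expects a complete header block and ignores obs-fold continuation lines.

```python
from typing import List


def split_header_lines(data: bytes) -> List[bytes]:
    """Split a header block into lines, accepting CRLF or a bare LF and
    rejecting any CR that is not immediately followed by LF."""
    *terminated, rest = data.split(b"\n")
    if rest:
        # Trailing bytes with no LF (including a lone trailing CR).
        raise ValueError("incomplete line: no terminating LF")
    lines: List[bytes] = []
    for raw in terminated:
        if raw.endswith(b"\r"):
            raw = raw[:-1]  # drop the one optional CR preceding LF
        if b"\r" in raw:
            raise ValueError("CR not followed by LF: reject message")
        lines.append(raw)
    return lines


print(split_header_lines(b"Host: a\r\nX: b\n\r\n"))  # [b'Host: a', b'X: b', b'']
```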
>

>> In a number of places the spec suggests that implementations "close the
>> connection". I think we could add an annex such as the following one,
>> referenced everywhere we suggest closing the connection, with one more
>> pointer in "6.1.2.2 Pipelining":
>>
>>     A.x.x Closing a Connection
>>
>>     When a server needs to close a connection, it must ensure that doing so
>>     will not risk prematurely terminating any previous response. When TCP
>>     segments are still in flight during a socket close, operating systems
>>     generally put the socket into an orphaned state, during which lingering
>>     data will still be emitted but any received data would cause an
>>     immediate connection abort. The connection may also be aborted when the
>>     system is running low on resources for orphaned sockets. This means
>>     that closing before all lingering data have been acknowledged by the
>>     client might result in the loss of unacknowledged data. This is a very
>>     common issue when performing a redirect in response to a POST request
>>     before the client's entire body has been read. While this is not always
>>     an issue when a server wants to abort the current request, it becomes a
>>     real issue when the client pipelines requests, because aborting the
>>     current request may also destroy previous, still unacknowledged
>>     responses, possibly causing the client to retry already processed
>>     requests that it believes were ignored.
>>
>>     The proper way for a server to close a connection without risking the
>>     issues described above is as follows:
>>
>>        1) shut down the transmit channel, usually using the shutdown()
>>           system call.
>>        2) drain any incoming data and (if possible) check for any lingering
>>           data in the transmit queue.
>>        3) when the receive channel reports a shutdown, or when all transmitted
>>           data have been acknowledged, or when enough time has elapsed, call
>>           close() on the socket.
>>
>>     Operating systems do not always easily report the amount of lingering data
>>     and will not always wake up when the queue is empty. A tradeoff has to be
>>     found between keeping connections open for too long and closing too
>>     early, which risks some clients getting truncated or empty responses.
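The three steps of the proposed annex, sketched with Python's standard socket module. This is an illustration under assumptions, not a recommended implementation: `graceful_close` and its `timeout` default are hypothetical, and the transmit-queue check from step 2 is omitted because it needs platform-specific ioctls (SIOCOUTQ on Linux, for example).

```python
import socket
import time


def graceful_close(sock: socket.socket, timeout: float = 2.0) -> None:
    """Close a connected stream socket without discarding queued output."""
    # 1) Shut down the transmit channel: queued data is still sent,
    #    followed by a FIN, but nothing more will be written.
    try:
        sock.shutdown(socket.SHUT_WR)
    except OSError:
        sock.close()  # peer already gone; nothing left to protect
        return
    deadline = time.monotonic() + timeout
    try:
        # 2) Drain incoming data so that unread bytes in the receive buffer
        #    do not make close() reset the connection on many TCP stacks.
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # 3) enough time has elapsed
            sock.settimeout(remaining)
            if not sock.recv(4096):
                break  # 3) the receive channel reported a shutdown
    except OSError:
        pass  # timeout (socket.timeout is an OSError) or reset by peer
    finally:
        sock.close()  # 3) only now release the socket
```

With a socketpair standing in for the client, a queued response still reaches the peer even though the peer's unread request body was drained and discarded.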
>
> My $.02: this seems more like implementation-specific advice; there are cases where it will not apply. What do others think?

Methinks this would fall under the editorial changes for the proposal to
move the transport details, in particular the TCP-specific ones, into a
clearly defined area somewhere. Pipelining, keep-alive, and closing
connections are all transport concepts specific to TCP or similar
transports with stream/sequence properties.

AYJ
Received on Tuesday, 7 February 2012 05:02:14 UTC