Re: WGLC: p1 MUSTs from Willy Tarreau on 2013-04-30 (ietf-http-wg@w3.org from April to June 2013)

From: Willy Tarreau <w@1wt.eu>
Date: Tue, 30 Apr 2013 21:40:16 +0200
To: Alex Rousskov <rousskov@measurement-factory.com>
Cc: IETF HTTP WG <ietf-http-wg@w3.org>
Message-ID: <20130430194016.GM22605@1wt.eu>

Hello Alex,

That was quite a long mail. Next time I think it would be more efficient to
split it into multiple parts so that people can respond to individual parts
only. I've read it all; for a few points I had no opinion, but I gave mine
where possible.

On Tue, Apr 30, 2013 at 12:54:54PM -0600, Alex Rousskov wrote:
> Hello,
> 
>     Summary: The specs have improved considerably since 2012. Thank you
> for not giving up on them despite HTTP/2.0 excitement!
> 
> Due to the lack of time, I have to focus on MUST-level requirements
> only. These comments are based on the "latest" snapshot dated Mon 29 Apr
> 2013 03:13:05 PM MDT at
> https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p1-messaging.html
> 
> I hope these comments can be addressed by editors alone, but I apologize
> in advance if some are found too controversial and should have been sent
> separately.
> 
> 
> > A sender MUST NOT generate protocol elements that do not match the
> > grammar defined by the ABNF rules for those protocol elements that
> > are applicable to the sender's role.
> 
> The "for those protocol elements..." part should be dropped IMO. A
> sender MUST NOT generate invalid protocol elements even if they are not
> applicable to the sender's role. Note that we are talking about
> _generation_ and not forwarding here.

The notion of "forwarding" is something very recent here. Since 2616, we
only had receivers and senders. And even here as you can see we're clearly
talking about a sender. In most places, a sender is also a forwarder, which
can cause confusion. The fact that you're specifically talking about
forwarding here means that the text is clear on the subject, so probably
we should leave it in order to avoid confusion.

> > If a received protocol element is processed, the recipient MUST be
> > able to parse any value that would match the ABNF rules
> 
> "processed" seems too broad because simply buffering a header may be
> called "processing". "Interpreted" may be better. Or did I miss the
> definition of "process" that clarifies this?

I think you got it right. Or maybe "used" would fit better for you?

> > If a received protocol element is processed, the recipient MUST be
> > able to parse any value that would match the ABNF rules for that
> > protocol element, excluding only those rules not applicable to the
> > recipient's role.
> 
> The "excluding only those rules not applicable..." part seems to
> contradict the "processed" verb. Why would a recipient want to process
> something inapplicable? Perhaps this is related to the "process" versus
> "interpreted" issue mentioned above.

Interesting. I think "processed" covers "forwarded" here. Typically
a gateway would not necessarily know how to validate certain header
field values that it does not use but still has to forward.

> > the recipient MUST be
> > able to parse any value that would match the ABNF rules for that
> > protocol element, excluding only those rules not applicable to the
> > recipient's role.
> 
> Please rephrase to avoid double negation in "excluding not applicable".
> For example: "the recipient MUST be able to parse any value matching the
> corresponding ABNF protocol element rules applicable to the recipient's
> role"

I agree this looks better.

(...)
> > A server MUST be prepared to receive URIs of unbounded length
> 
> This MUST may be demoted to "ought" because "be prepared" is too vague
> (but see below for a related missing MUST).

Be prepared means that it's their responsibility to handle this correctly,
including rejecting the request. However they "must be prepared" to face
URIs too large for them. Some servers used to crash when dealing with too
large URIs you know...

> > A server MUST be prepared to receive URIs of unbounded length and
> > respond with the 414
> 
> Please insert a second MUST after "and": "and MUST respond".

Indeed.
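
For illustration, here is a minimal sketch of what "being prepared" could
look like on the server side: the request-target is measured against a
local limit and an oversized one is rejected with a 414 rather than being
allowed to break anything. The limit value and the function name are
assumptions made for this example, not something the draft specifies.

```python
from typing import Optional

MAX_TARGET_LEN = 8192  # hypothetical limit; each server picks its own


def check_request_line(line: bytes) -> Optional[int]:
    """Return an error status code for a raw request line, or None if acceptable."""
    parts = line.rstrip(b"\r\n").split(b" ")
    if len(parts) != 3:
        return 400  # malformed request line
    _method, target, _version = parts
    if len(target) > MAX_TARGET_LEN:
        return 414  # request-target too long: reject cleanly instead of failing
    return None
```

A real server would also have to bound how much it buffers while looking
for the end of the request line; that part is left out here.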

> > Multiple header fields with the same field name can be combined into
> > one "field-name: field-value" pair
> 
> Should this be a MAY as in "The recipient MAY combine multiple ...". As
> worded now, it is not clear whether a proxy is allowed to combine
> headers when forwarding them. Note that this affects extension and other
> headers that a proxy may not understand (but may still want to combine
> if allowed to do so).

Indeed we should have a MAY here in my opinion to improve clarity.
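
As a rough sketch of what such a MAY would permit, the following combines
repeated field names into a single comma-separated value while keeping the
order in which the values were received. The function name is hypothetical,
and the sketch deliberately ignores special cases such as Set-Cookie, which
cannot safely be combined this way.

```python
def combine_fields(fields):
    """Merge repeated header fields into one "name: v1, v2" pair each.

    `fields` is a list of (name, value) tuples in received order.
    Field names are compared case-insensitively; the first spelling is kept.
    """
    combined = {}  # lowercase name -> combined value (insertion-ordered)
    spelling = {}  # lowercase name -> first spelling seen
    for name, value in fields:
        key = name.lower()
        if key in combined:
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
            spelling[key] = name
    return [(spelling[key], value) for key, value in combined.items()]
```

For example, two Accept fields with values "text/html" and "text/plain"
would come out as a single Accept field with value "text/html, text/plain".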

> > A server MUST be prepared to receive request header fields of
> > unbounded length and respond
> 
> Consider removing the above MUST but please add MUST after "and": A
> server ought to be prepared to receive ... and MUST respond ...
> See above for discussion of a similar MUST that applies to URIs of
> unbounded length.

Same as above for me, and for the same reasons.

> > A client MUST be prepared to receive response header fields of unbounded length.
> 
> Same here, except no new MUST is needed.
> 
> 
> > If chunked is applied to a payload body, the sender MUST NOT apply
> > chunked more than once
> 
> The precondition is bogus: If chunked is NOT [yet?] applied to a payload
> body, the sender still MUST NOT apply chunked more than once!

No, I think it's the wording which is wrong. The intent was to ensure that
we never chunk more than once. As always, it's the passive form which causes
trouble, because you never know if it applies to what you see or what you do.
I'd suggest something like this:

  If a sender applies chunked encoding to a payload body, it MUST NOT apply
  it more than once.

> > the sender MUST NOT apply chunked more than once
> 
> This needs to be rephrased to make it clear that proxies are not
> responsible for dechunking multiple chunked encodings to make the
> forwarded message comply with this MUST. For example, we could say: "the
> sender MUST NOT generate messages with multiple chunked encodings".
> 
> Please note that both the proposed "multiple chunked encodings" and the
> existing "more than once" wordings imply that foo,chunked,bar,chunked
> combination is also not allowed.

Maybe this could be moved to the part that says that chunked is always
the last encoding, and summarized into a single sentence. E.g.:

  Chunked encoding MUST NOT be applied more than once to a message payload
  and must always be last.
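
To make the combined rule concrete, here is a small sketch that checks a
Transfer-Encoding field value against it: "chunked" may appear at most
once, and if it appears it has to be the final coding. The function name is
made up for this example, and transfer-coding parameters are simply
stripped rather than validated.

```python
def chunked_usage_ok(transfer_encoding: str) -> bool:
    """Check that "chunked" is used at most once and only as the last coding."""
    codings = [
        item.split(";", 1)[0].strip().lower()  # drop any parameters
        for item in transfer_encoding.split(",")
        if item.strip()
    ]
    count = codings.count("chunked")
    if count == 0:
        return True  # the rule only constrains messages that use chunked
    return count == 1 and codings[-1] == "chunked"
```

With this check, "gzip, chunked" passes, while "chunked, gzip" and
"foo, chunked, bar, chunked" are both rejected.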

> > A server MUST send an empty trailer with the chunked transfer coding
> > unless at least one of the following is true:
> 
> This should be relaxed to "A server MUST generate ..." because a proxy,
> in general, does not know whether bullet #2 ("the trailer fields consist
> entirely of optional metadata...") is true. Even though chunking is a
> hop-by-hop mechanism, proxies ought to forward Trailers whenever
> possible, right?

I have a doubt about this one because, in my opinion, trailers are always
optional since we never know whether they'll be correctly interpreted.
However I agree that a proxy or gateway should always forward them.

> > a client MUST send only the absolute path and query components of the
> > target URI as the request-target
> 
> > To allow for transition to the absolute-form for all requests in some
> > future version of HTTP, HTTP/1.1 servers MUST accept the
> > absolute-form in requests
> 
> Should the first "MUST send" be relaxed to "MUST generate" so that the
> proxies do not block the apparently anticipated "transition to the
> absolute-form for all requests" by stripping URIs as they forward them?

Yes, probably.

(...)
> > A client that does not support persistent connections MUST send the
> > "close" connection option in every request message.
> 
> Including a CONNECT request message?

Yes, and indeed I've seen at least one proxy do that in the past. It's not
a problem to have a "close" with a CONNECT, since CONNECT simply transfers
messages of infinite length in both directions, which are only terminated
by a close. Also, when sending a CONNECT to an authenticating proxy, you'd
rather use "Connection: close" if you think you'll have to reauthenticate
using a new connection when facing a 407.

> > A client that pipelines requests MUST be prepared to retry those requests
> 
> MUST be prepared to retry but does not have to retry? Or MUST retry?

Same as above, "must be prepared" here means that it must make valid
choices even before facing the failure (e.g., not sending non-idempotent
requests).

> > A client that pipelines requests MUST be prepared to retry those
> > requests if the connection closes before it receives all of the
> > corresponding responses.
> 
> Please clarify that the client MUST retry unanswered requests and not
> all "those requests" it pipelined.

Good point.
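
A sketch of that distinction, under some assumptions: responses arrive in
request order, so the number of complete responses received tells the
client which pipelined requests are still unanswered. The function name is
hypothetical, and stopping at the first non-idempotent request is a design
choice of this example, not something the draft mandates.

```python
# Methods commonly treated as idempotent in HTTP/1.1.
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def requests_to_retry(pipelined, responses_received):
    """Return the unanswered requests that can be retried automatically.

    `pipelined` is a list of (method, target) tuples in the order sent;
    `responses_received` is the count of complete responses seen before
    the connection closed.
    """
    unanswered = pipelined[responses_received:]
    retry = []
    for method, target in unanswered:
        if method.upper() not in IDEMPOTENT:
            break  # keep ordering: do not retry past an unsafe request
        retry.append((method, target))
    return retry
```

Requests that were already answered are never resent; only the tail of the
pipeline is considered.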

> > MUST NOT pipeline on a retry connection until it knows the connection
> > is persistent.
> 
> Is it really possible to know that a connection _is_ persistent?

Well, it is persistent by definition until a "Connection: close" response is seen.
Persistent does not mean forever, so if no "close" is seen and the
connection is closed by the server just after the response, it was a
persistent one for too short a time for you to reuse it.
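
In code terms, the best a client can do is something like the following
sketch, which only says whether the connection appears reusable after a
given response. The function name is an assumption for this example; the
HTTP/1.0 branch reflects the usual "keep-alive" convention. A True result
is a hint, not a guarantee, since the server may still close at any time.

```python
def seems_persistent(version: str, connection_header: str) -> bool:
    """Guess from one response whether the connection can be reused."""
    options = {
        opt.strip().lower()
        for opt in connection_header.split(",")
        if opt.strip()
    }
    if "close" in options:
        return False  # the peer announced it will close
    if version == "HTTP/1.1":
        return True  # persistent by default unless "close" is seen
    return "keep-alive" in options  # HTTP/1.0 needs an explicit opt-in
```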

> And
> what does that really mean for a connection to be persistent "now"? I
> think all it means is that the connection seems to be "open" -- that the
> sender has not received an error or connection close notification [yet].
> 
> It is possible to know that the connection was persistent (i.e., handled
> multiple messages), but since the server may close an idle (from the
> server point of view) connection at any time, I do not think it is
> possible to know that the next message will reach the server, unless the
> client and the server are coordinating out of band somehow.

Which is the reason for not sending non-idempotent requests on such connections.

> What are we trying to achieve with this requirement? Avoid multiple and
> possibly endless retries? Minimize the chances that a retry will have to
> be retried? Perhaps the MUST can be relaxed or reworded to reflect the
> true intent?

I don't know.

> And here is a list of MUST-level requirements that are missing an
> explicit actor on which the requirement is placed. Most of these should
> be easy to rephrase to place the requirement on the intended actor
> (e.g., "A proxy MUST" instead of "header field MUST"):
> 
> > An unrecognized header field received by a proxy MUST be forwarded
> > downstream

Here the only actor is the proxy.

> > The host MUST NOT be empty; if an "http" URI is received with an
> > empty host, then it MUST be rejected as invalid.

I think this one is not ambiguous as it describes a protocol violation
which must end up with a 400.

> > the TCP connection MUST be secured,
> 
> > These special characters MUST be in a quoted string

OK for these ones.

> > the message framing is invalid and MUST be treated as an error

"MUST be treated" indicates the actor is the message receiver.

> > a response message received by a user agent, it MUST be treated as an
> > error

Here again.

> > The trailer MUST NOT contain fields
>
> > the Host field-value MUST be identical
> 
> > the Host header field MUST be sent with an empty field-value.

I don't have the context in mind, but all three above again indicate how to
detect protocol violations.

> > The "Via" header field MUST be sent by a proxy

"MUST ... by" so the actor is the proxy.

> > the connection MUST be closed after the current request/response is
> > complete

Possible doubt here indeed.

> > all messages on a connection MUST have a self-defined message length

Ambiguous as well.

> > the first action after changing the protocol MUST be a response

Here I think the intent is on the server (it's about Upgrade+101 I guess).

> Please be careful with "send" and "generate" when fixing the above
> actorless rules so that the proxies do not accidentally become
> responsible for policing traffic where unnecessary.

In my opinion, "send" includes "forward" and "generate", which is the
reason why there are a number of "except" or "unless" in the wording.
Note that some intermediaries just blindly pass data blocks, which is
clearly "forwarding", but other ones parse, build structures and rebuild
requests from these structures, so it's more "recv+send" than "forward",
even though they don't necessarily understand what is there.

Maybe some additional definitions are needed at the beginning to clarify
this?

Thanks,
Willy
Received on Tuesday, 30 April 2013 19:40:50 UTC