Re: LWS around header names from Alex Rousskov on 2004-03-15 (ietf-http-wg@w3.org from January to March 2004)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Mon, 15 Mar 2004 14:40:30 -0700 (MST)
To: Jamie Lokier <jamie@shareable.org>
Cc: ietf-http-wg@w3.org
Message-ID: <Pine.BSF.4.58.0403151333580.17944@measurement-factory.com>

On Mon, 15 Mar 2004, Jamie Lokier wrote:

> Apache was recently changed to skip LWS (more specifically, SP and
> HT) characters between the field name and colon in a HTTP header.

Disclaimer: I am biased because it looks like it was our tool that
generated the Apache bug report that prompted the change.

I believe the Apache team did the right thing: skipping whitespace
characters before the colon is desirable and correct, and the bug had
security implications, albeit remote ones.

>     1. Is LWS permitted between the field-name and colon?
>
>        The grammar of RFC 2616 suggests that it is, because ":" is a
>        separator character, and thus the rule for implied LWS between
>        a token and a separator applies

Yes.
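
To make that reading of the grammar concrete, here is a minimal Python
sketch that builds a matcher from the RFC 2616 token and LWS rules,
with implied LWS allowed on either side of the colon. The names TOKEN,
LWS, HEADER, and parse_field are invented for this illustration and are
not taken from any server.

```python
import re

# RFC 2616 section 2.2: token = 1*<any CHAR except CTLs or separators>
SEPARATORS = '()<>@,;:\\"/[]?={} \t'
TOKEN_CHARS = "".join(chr(c) for c in range(32, 127) if chr(c) not in SEPARATORS)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# LWS = [CRLF] 1*( SP | HT )
LWS = r"(?:\r\n)?[ \t]+"

# field-name, implied *LWS, ":", implied *LWS, then the rest as the value
HEADER = re.compile(
    "(" + TOKEN + ")(?:" + LWS + ")*:(?:" + LWS + ")*(.*)", re.DOTALL
)

def parse_field(line):
    """Return (name, value), or None if the line does not fit the grammar."""
    m = HEADER.fullmatch(line)
    if m is None:
        return None
    return m.group(1), m.group(2).rstrip(" \t")
```

Under this sketch, "Host : example.com" and "Host: example.com" yield
the same pair, while "Ho st: x" is not a field at all, because SP cannot
appear inside a token.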

>        The wording suggests otherwise, although it is not explicit:
>
>           Each header field consists of a name followed by a colon
>           (":") and the field value. Field names are
>           case-insensitive. The field value MAY be preceded by any
>           amount of LWS, though a single SP is preferred.
>
>        The wording explicitly states LWS is permitted after the colon,
>        suggesting that the intention is that it's not permitted before
>        the colon.

I believe the wording does not suggest anything beyond what it
explicitly says. "MAY X" does not imply "MUST NOT Y". I hate cases
where a formal grammar is "explained" in semi-formal language, causing
doubt and contradiction. In this particular case, however, a [less
formal] MAY rule does not really contradict the [more formal] grammar.
The fact that implementations vary does not prove that this wording
implies anything; there are other, more important, reasons for
implementations to vary on the subject.

>        Many authors have taken that interpretation, resulting in most of
>        the servers I looked at not accepting LWS before the colon.

Yes, most HTTP servers use ad hoc parsers instead of generating
correct parsers from the RFC 2616 grammar. You can argue that the
implementations in the field are more important than the RFC. If you
do that, and if you are consistent, many grammar
simplifications/changes would be required.

>        (They should probably reject the request, but all of them treat
>        it as an unknown header name including a space in the name token).

They should accept the message, ignoring LWS (if any).
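
A small Python sketch of the difference, under the assumption that an
ad hoc parser simply splits each line at the first colon. Both function
names are hypothetical; neither is code from Apache or any other
server.

```python
def parse_adhoc(line):
    # Split at the first colon and keep the name verbatim, so any
    # SP/HT before the colon ends up inside the field name.
    name, _, value = line.partition(":")
    return name.lower(), value.strip(" \t")

def parse_ignoring_lws(line):
    # Same split, but SP/HT between the name and the colon is dropped.
    name, _, value = line.partition(":")
    return name.rstrip(" \t").lower(), value.strip(" \t")
```

For "Content-Length : 5" the first function returns the name
"content-length " (with a trailing space), which a later lookup of
"content-length" will not find; the second returns "content-length".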

>     2. What about LWS before the field-name?

Do you mean SP or HT before the field-name? CRLF before the field-name
would indicate the end of headers (the field-name would be a part of
the body then).
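
The distinction can be sketched in Python as follows. RFC 2616 also
lets a field continue on the next line when that line starts with SP or
HT, so the sketch treats leading whitespace on a non-first line as
folding. The function name split_headers is hypothetical, the input is
assumed to be the text that follows the start line, and leading
whitespace on the very first line is deliberately left untouched rather
than interpreted.

```python
def split_headers(raw):
    """Split the text after the start line into (field_lines, body)."""
    lines = raw.split("\r\n")
    fields = []
    for i, line in enumerate(lines):
        if line == "":
            # An empty line (a bare CRLF) ends the header section;
            # everything after it belongs to the body.
            return fields, "\r\n".join(lines[i + 1:])
        if line[0] in " \t" and fields:
            # Leading SP/HT on a non-first line: folded continuation
            # of the previous field.
            fields[-1] += " " + line.strip(" \t")
        else:
            # First line, or a line starting a new field: kept verbatim.
            fields.append(line)
    return fields, ""
```

With a CRLF in front of the first field, as in "\r\nHost: a\r\n\r\n",
the field list comes back empty and "Host: a" lands in the body.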

> Although it isn't the standard's role to state how a program should
> respond to every kind of invalid message,

IMO, documenting behavior on all valid and invalid inputs is what the
ideal protocol specification must do. It is too late for HTTP, of
course.

>    1. Whether LWS is actually permitted between the field-name and colon.
>       (Grammar says it is; wording suggests it isn't.  Implementations vary).

IMO: Yes. Grammar says it is. The wording does not suggest it is not.
Implementations vary.

>    2. Whether LWS is actually permitted before the field-name.
>       (Grammar says it isn't.  Implementations vary).

There are probably many special cases here (folding, CRLF, first
header, other headers, etc.). Implementations vary.

>    4. That invalid field-names (such as containing control characters
>       or LWS) SHOULD (or MUST?) be rejected.

How does one reject an invalid field-name? Do you mean that they
should not be forwarded by proxies? But a proxy may be (should be?)
acting like a tunnel when the message seems to be corrupted. Or do you
mean origin servers should ignore them? But an origin server may be
(should be?) acting like a tunnel to CGI-like applications when the
message seems to be corrupted.

> The most immediate question is number 1, as implementations vary in
> their interpretation of the standard on that.

I bet that implementations vary with regard to many if not most
HTTP/1.1 MUSTs. That is, for many MUSTs you can find implementations
that violate them under some conditions. This is expected given the
complexity and ambiguity of the standard itself, combined with the
absence of compliance enforcement on top of the "garbage in, compliance
out" development spirit.

Alex.
Received on Monday, 15 March 2004 16:40:33 UTC