- From: Alex Rousskov <rousskov@measurement-factory.com>
- Date: Mon, 15 Mar 2004 14:40:30 -0700 (MST)
- To: Jamie Lokier <jamie@shareable.org>
- Cc: ietf-http-wg@w3.org
On Mon, 15 Mar 2004, Jamie Lokier wrote:

> Apache was recently changed to skip LWS (more specifically, SP and
> HT) characters between the field name and colon in a HTTP header.

Disclaimer: I am biased because it looks like it was our tool that
generated the Apache bug report that caused the change. I believe the
Apache team did the right thing: skipping whitespace characters before
the colon is desirable/correct, and the bug had, albeit remote,
security implications.

> 1. Is LWS permitted between the field-name and colon?
>
>    The grammar of RFC 2616 suggests that it is, because ":" is a
>    separator character, and thus the rule for implied LWS between
>    a token and a separator applies

Yes.

>    The wording suggests otherwise, although it is not explicit:
>
>       Each header field consists of a name followed by a colon
>       (":") and the field value. Field names are
>       case-insensitive. The field value MAY be preceded by any
>       amount of LWS, though a single SP is preferred.
>
>    The wording explicitly states LWS is permitted after the colon,
>    suggesting that the intention is that it's not permitted before
>    the colon.

I believe the wording does not suggest anything beyond what it
explicitly says. "MAY X" does not imply "MUST NOT Y". I hate cases
where formal grammar is "explained" in semi-formal language, causing
doubts and contradiction. In this particular case, however, a [less
formal] MAY rule does not really contradict the [more formal] grammar.

The fact that implementations vary does not prove that this wording
implies something; there are other, more important, reasons for
implementations to vary on the subject.

>    Many authors have taken that interpretation, resulting in most of
>    the servers I looked at not accepting LWS before the colon.

Yes, most HTTP servers use ad hoc parsers instead of generating
correct parsers for the RFC 2616-based grammar. You can argue that the
implementations in the field are more important than the RFC. If you
do that, and if you are consistent, many grammar
simplifications/changes would be required.

>    (They should probably reject the request, but all of them treat
>    it as an unknown header name including a space in the name token).

They should accept the message, ignoring LWS (if any).

> 2. What about LWS before the field-name?

Do you mean SP or HT before the field-name? CRLF before the field-name
would indicate the end of headers (the field-name would be a part of
the body then).

> Although it isn't the standard's role to state how a program should
> respond to every kind of invalid message,

IMO, documenting behavior on all valid and invalid inputs is what the
ideal protocol specification must do. It is too late for HTTP, of
course.

> 1. Whether LWS is actually permitted between the field-name and colon.
>    (Grammar says it is; wording suggests it isn't. Implementations vary).

IMO: Yes. Grammar says it is. The wording does not suggest it is not.
Implementations vary.

> 2. Whether LWS is actually permitted before the field-name.
>    (Grammar says it isn't. Implementations vary).

There are probably many special cases here (folding, CRLF, first
header, other headers, etc.). Implementations vary.

> 4. That invalid field-names (such as containing control characters
>    or LWS) SHOULD (or MUST?) be rejected.

How does one reject an invalid field-name? Do you mean that they
should not be forwarded by proxies? But a proxy may be (should be?)
acting like a tunnel when the message seems to be corrupted. Or do you
mean origin servers should ignore them? But an origin server may be
(should be?) acting like a tunnel to CGI-like applications when the
message seems to be corrupted.

> The most immediate question is number 1, as implementations vary in
> their interpretation of the standard on that.

I bet that implementations vary with regard to many if not most
HTTP/1.1 MUSTs. That is, for many MUSTs you can find implementations
that violate them under some conditions. This is expected given the
complexity and ambiguity of the standard itself, combined with the
absence of compliance enforcement on top of the "garbage in,
compliance out" development spirit.

Alex.
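
P.S. To make the difference between the two parsing approaches concrete, here is a rough Python sketch. It is an illustration only, not code from Apache or any other server: the names parse_header_grammar and parse_header_adhoc are made up for this message, the token character set is my reading of the RFC 2616 "token" rule, and each input line is assumed to be already unfolded with its CRLF stripped.

```python
import re

# Characters allowed in an RFC 2616 "token": any CHAR except CTLs and
# separators. (Assumption: this class is a hand transcription of that rule.)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# field-name, optional SP/HT before the colon, optional SP/HT after it, value.
HEADER_RE = re.compile(r"^(" + TOKEN + r")[ \t]*:[ \t]*(.*)$")


def parse_header_grammar(line):
    """Grammar-style parse: SP/HT between field-name and colon is skipped.

    Returns (lowercased name, value) or None if the line does not match.
    """
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return m.group(1).lower(), m.group(2).rstrip(" \t")


def parse_header_adhoc(line):
    """Ad hoc parse: split at the first colon and keep the name as-is.

    Any SP/HT before the colon ends up inside the returned name.
    """
    name, sep, value = line.partition(":")
    if not sep:
        return None
    return name.lower(), value.strip(" \t")


if __name__ == "__main__":
    for raw in ["Host: example.com", "Host : example.com"]:
        print(repr(raw), parse_header_grammar(raw), parse_header_adhoc(raw))
```

For "Host : example.com" the first function reports the name "host" while the second reports "host " (with a trailing space), so the two would not agree on which header they saw. That kind of disagreement between parsers is the sort of thing behind the remote security concern mentioned above.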
Received on Monday, 15 March 2004 16:40:33 UTC