- From: Jamie Lokier <jamie@shareable.org>
- Date: Mon, 15 Mar 2004 17:42:22 +0000
- To: ietf-http-wg@w3.org
In RFC 2616, we have:

    CHAR          = <any US-ASCII character (octets 0 - 127)>
    TEXT          = <any OCTET except CTLs,
                    but including LWS>
    quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext        = <any TEXT except <">>
    quoted-pair   = "\" CHAR

I have three questions:

1. This leads to the curious observation that octets 128 - 255 are _valid_ in comments, text, quoted strings and so forth. But they are _not valid_ after "\" inside a quoted-string. (They are valid after "\" inside comments!)

Is this intentional, that octets 128 - 255 are allowed in text, including inside quoted-string, and allowed after "\" in comments but not in quoted-string?

2. Control chars (those in CTL) are permitted by the syntax after "\" in quoted-string. It seems odd to allow control chars in the headers at all. (It's even odder to allow ASCII control chars but refuse octets 128 - 255 after "\" in qdtext).

Is this intentional?

3. Although other ASCII control chars are permitted after "\", a lone CR is not allowed.

HTTP client/server code I have looked at in detail (Apache, Squid, Mozilla) accepts lone CRs and treats them as LWS in many contexts, albeit inconsistently. token).

Would it not make sense to formalise this, even if it's just in the "Tolerant Applications" section? Then the rule for accepting LF without CR could be simplified: tolerant applications might treat LF as the line terminator, and CR as equivalent to LWS (some real ones do that).

Alternatively it could be made a SHOULD or even MUST that programs reject lone CRs, because of security implications: some proxies treat "Authorization" <CR> ":" as a header different from Authorization, and don't apply the rules for proxies when this header is present, yet pass it on to origin servers which then (non-compliantly) interpret the header as equivalent to Authorization.

It would be good to indicate that programs should not accept messages containing embedded CRs like that.

This is implied by the grammar, yet every program I looked at accepts embedded lone CRs without complaint, and may or may not treat them as LWS in various contexts.

Apache is interesting in that it treats CR as LWS-equivalent nearly everywhere, but not between the header name and ":", where it only allows SP and HT.

-- Jamie
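
To make the asymmetries in questions 1 and 2 concrete, here is a minimal Python sketch that applies the quoted RFC 2616 rules octet by octet, plus a helper that locates lone CRs as discussed in question 3. The function names are hypothetical and chosen for illustration only. The sketch assumes a simplified reading of the grammar: HT is accepted in qdtext as part of LWS, line folding (CRLF followed by SP or HT) is not modelled, and the quoted-string checker gives CR and LF no special treatment, so it is not a complete or authoritative parser.

```python
# Octet classes from the RFC 2616 rules quoted in the message.
CTL = frozenset(range(0, 32)) | {127}   # CTL = octets 0 - 31 and DEL (127)
DQUOTE = 0x22
BACKSLASH = 0x5C
HT = 0x09
CR = 0x0D


def is_char(octet):
    """CHAR = any US-ASCII character (octets 0 - 127)."""
    return 0 <= octet <= 127


def is_qdtext_octet(octet):
    """qdtext = any TEXT except <">.

    Simplifying assumption: HT is accepted because LWS is included in TEXT;
    folded CRLF sequences are not modelled here.
    """
    if octet == DQUOTE:
        return False
    return octet == HT or octet not in CTL


def check_quoted_string(raw):
    """Return True if raw (bytes) matches quoted-string under the rules above."""
    if len(raw) < 2 or raw[0] != DQUOTE or raw[-1] != DQUOTE:
        return False
    i, end = 1, len(raw) - 1
    while i < end:
        octet = raw[i]
        if octet == BACKSLASH:
            # quoted-pair = "\" CHAR: the escaped octet must be 0 - 127,
            # and it must not be the closing quote itself.
            if i + 1 >= end or not is_char(raw[i + 1]):
                return False
            i += 2
        elif is_qdtext_octet(octet):
            i += 1
        else:
            return False
    return True


def lone_cr_offsets(raw):
    """Return the offsets of every CR that is not immediately followed by LF."""
    return [i for i, octet in enumerate(raw)
            if octet == CR and raw[i + 1:i + 2] != b"\n"]


if __name__ == "__main__":
    print(check_quoted_string(b'"caf\xe9"'))     # octet 233 as qdtext
    print(check_quoted_string(b'"caf\\\xe9"'))   # octet 233 after "\"
    print(check_quoted_string(b'"a\\\x01b"'))    # control char after "\"
    print(lone_cr_offsets(b"Authorization\r: secret\r\n"))
```

Under these assumptions, an octet in the range 128 - 255 passes as qdtext but fails after "\", while a control character such as octet 1 fails as qdtext but passes after "\". The last call reports the CR sitting between the header name and ":", which is the case a strict recipient would reject.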
Received on Monday, 15 March 2004 12:42:23 UTC