- From: Jamie Lokier <jamie@shareable.org>
- Date: Mon, 15 Mar 2004 17:42:22 +0000
- To: ietf-http-wg@w3.org
In RFC 2616, we have:
CHAR = <any US-ASCII character (octets 0 - 127)>
TEXT = <any OCTET except CTLs, but including LWS>
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = <any TEXT except <">>
quoted-pair = "\" CHAR
I have three questions:
1. This leads to the curious observation that octets 128 - 255 are
_valid_ in comments, text, quoted strings and so forth. But they are
_not valid_ after "\" inside a quoted-string. (They are valid after
"\" inside comments!)
Is this intentional, that octets 128 - 255 are allowed in text,
including inside quoted-string, and allowed after "\" in comments
but not in quoted-string?
2. Control chars (those in CTL) are permitted by the syntax after "\"
in quoted-string. It seems odd to allow control chars in the
headers at all. (It's even odder to allow ASCII control chars but
refuse octets 128 - 255 after "\" in quoted-string.) Is this intentional?
3. Although other ASCII control chars are permitted after "\", a lone
CR is not allowed. HTTP client/server code I have looked at in detail
(Apache, Squid, Mozilla) accepts lone CRs and treats them as LWS
in many contexts, albeit inconsistently.
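To make questions 1 and 2 concrete, here is a minimal sketch of the
quoted-string rules as quoted above. It is not taken from any real
implementation. The function names are made up for illustration, and
it assumes that "\" always introduces a quoted-pair (the grammar is
ambiguous here, since "\" is itself valid qdtext). LWS line folding is
not modelled; only SP and HT are let through.

```python
def is_ctl(octet: int) -> bool:
    # CTL = octets 0 - 31 and DEL (127), per RFC 2616 section 2.2.
    return octet <= 31 or octet == 127


def is_char(octet: int) -> bool:
    # CHAR = any US-ASCII character (octets 0 - 127).
    return octet <= 127


def is_qdtext(octet: int) -> bool:
    # qdtext = any TEXT except <">; TEXT = any OCTET except CTLs,
    # but including LWS. Simplification: only SP and HT count as LWS.
    if octet in (0x20, 0x09):
        return True
    return not is_ctl(octet) and octet != 0x22


def check_quoted_string(raw: bytes) -> bool:
    # Hypothetical helper: True if raw matches quoted-string under the
    # assumptions stated above.
    if len(raw) < 2 or raw[0] != 0x22 or raw[-1] != 0x22:
        return False
    body = raw[1:-1]
    i = 0
    while i < len(body):
        if body[i] == 0x5C:  # "\" starts a quoted-pair
            if i + 1 >= len(body) or not is_char(body[i + 1]):
                return False
            i += 2
        elif is_qdtext(body[i]):
            i += 1
        else:
            return False
    return True
```

Under these assumptions an octet such as 0xE9 is accepted as plain
qdtext but rejected after "\", while a control char such as 0x01 is
rejected as plain qdtext but accepted after "\".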
Would it not make sense to formalise this, even if it's just in the
"Tolerant Applications" section? Then the rule for accepting LF
without CR could be simplified: tolerant applications might treat
LF as the line terminator, and CR as equivalent to LWS (some real ones
do that).
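A rough sketch of that simplified tolerant rule, applied to a header
block only (never to a message body). The function name is
hypothetical, and trimming trailing whitespace is my own assumption so
that the CR of a normal CRLF disappears:

```python
from typing import List


def split_lines_tolerantly(header_block: bytes) -> List[bytes]:
    # LF alone is the line terminator.
    lines = header_block.split(b"\n")
    # Drop the empty piece that follows the final LF, if any.
    if lines and lines[-1] == b"":
        lines.pop()
    # Every CR is treated as equivalent to LWS (mapped to SP here).
    # Trailing SP/HT is then trimmed, which also removes the CR of CRLF.
    return [line.replace(b"\r", b" ").rstrip(b" \t") for line in lines]
```

With this, "X-A: b" <CR> "c" <LF> comes out as the single line
"X-A: b c", and both CRLF and bare LF end a line.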
Alternatively it could be made a SHOULD or even MUST that programs
reject lone CRs, because of security implications: some proxies treat
"Authorization" <CR> ":"
as a header different from Authorization, and don't apply the rules
for proxies when this header is present, yet pass it on to origin
servers which then (non-compliantly) interpret the header as
equivalent to Authorization. It would be good to indicate that
programs should not accept messages containing embedded CRs like that.
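As an illustration of the strict alternative, a check along these
lines would be enough to refuse such messages. This is only a sketch;
the helper name is invented, and it assumes the complete header block
is already in memory (a CR at the very end of a partial read would be
flagged too).

```python
import re

# A CR that is not immediately followed by LF.
_BARE_CR = re.compile(rb"\r(?!\n)")


def has_bare_cr(header_block: bytes) -> bool:
    # Hypothetical helper: True if the block contains a lone CR.
    return _BARE_CR.search(header_block) is not None
```

A proxy or server using this would reject
"Authorization" <CR> ":" outright rather than forwarding it as an
unknown header.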
This is implied by the grammar, yet every program I looked at
accepts embedded lone CRs without complaint, and may or may not
treat them as LWS in various contexts. Apache is interesting in
that it treats CR as LWS-equivalent nearly everywhere, but not
between the header name and ":", where it only allows SP and HT.
-- Jamie
Received on Monday, 15 March 2004 12:42:23 UTC