- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 30 Sep 2010 16:19:27 +0200
- To: Noah Mendelsohn <nrm@arcanedomain.com>
- CC: Noah Mendelsohn <noah@arcanedomain.com>, "www-tag@w3.org" <www-tag@w3.org>
On 30.09.2010 16:01, Noah Mendelsohn wrote:
> Julian Reschke writes:
>
> > The background is that HTML5 specifies an algorithm for extracting the
> > charset from content type information, which (1) requires accepting
> > invalid forms (single quotes), and (2) requires not to properly handle
> > escapes in quoted strings.
>
> Thank you for the very helpful clarification. I agree that these
> "willful violations" are significant, and should be minimized to the
> extent practical. There is a big grey area between "sniffing" and
> silently recovering from syntactic or other errors in headers. This
> seems more toward the latter: allowing single quotes where double is
> required is a different sort of "being liberal" than looking at
> something labeled text/plain and determining "aha, you meant
> image/jpeg". Thanks!
>
> Noah
Note that allowing single quotes instead of double quotes may sound
harmless, but:
<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.14.17>:
Content-Type = "Content-Type" ":" media-type
<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.7>:
media-type = type "/" subtype *( ";" parameter )
<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.6>:
parameter = attribute "=" value
attribute = token
value = token | quoted-string
and finally <http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.2.2>:
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
           | "," | ";" | ":" | "\" | <">
           | "/" | "[" | "]" | "?" | "="
           | "{" | "}" | SP | HT
So the single quote is indeed allowed in tokens, and
charset='foobar'
should be parsed as
'foobar'
not
foobar
(note that single quotes in parameter values using the token syntax are
indeed in use).
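As a rough illustration of the grammar quoted above, here is a minimal sketch in Python. The names is_token and parse_value are made up for this example, and it assumes the parameter value has already been isolated from the header field; it is not a full Content-Type parser.

```python
import re

# Separators from RFC 2616 section 2.2 (SP and HT included).
SEPARATORS = '()<>@,;:\\"/[]?={} \t'

# quoted-string, simplified: a double quote, then any mix of unescaped
# characters and backslash-escaped pairs, then a closing double quote.
QUOTED_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)


def is_token(s: str) -> bool:
    """True if s is one or more CHARs that are neither CTLs nor separators."""
    return bool(s) and all(
        32 < ord(c) < 127 and c not in SEPARATORS for c in s
    )


def parse_value(raw: str) -> str:
    """Parse an isolated parameter value as token | quoted-string."""
    if is_token(raw):
        # A token is taken literally; single quotes are ordinary characters.
        return raw
    m = QUOTED_STRING.fullmatch(raw)
    if m:
        # Undo quoted-pair escapes inside the quoted-string.
        return re.sub(r'\\(.)', r'\1', m.group(1), flags=re.DOTALL)
    raise ValueError("not a token or quoted-string: %r" % raw)


print(parse_value("'foobar'"))   # single quotes are part of the value
print(parse_value('"foobar"'))   # double quotes are delimiters only
```

Under these assumptions the single-quoted form comes back with its quotes intact, while the double-quoted form yields the bare string, which is the distinction made above.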
Requiring special treatment will either cause UAs to have separate
parsers (not good), or potentially break legitimate uses of single
quotes in other header fields (very bad).
I totally agree that UAs are very bad in header parsing; but adding more
special cases doesn't seem to be an improvement.
Best regards, Julian
Received on Thursday, 30 September 2010 14:46:51 UTC