- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 30 Sep 2010 16:19:27 +0200
- To: Noah Mendelsohn <nrm@arcanedomain.com>
- CC: Noah Mendelsohn <noah@arcanedomain.com>, "www-tag@w3.org" <www-tag@w3.org>
On 30.09.2010 16:01, Noah Mendelsohn wrote:
> Julian Reschke writes:
>
> > The background is that HTML5 specifies an algorithm for extracting the
> > charset from content type information, which (1) requires accepting invalid
> > forms (single quotes), and (2) requires not to properly handle escapes in
> > quoted strings.
>
> Thank you for the very helpful clarification. I agree that these
> "willful violations" are significant, and should be minimized to the
> extent practical. There is a big grey area between "sniffing" and
> silently recovering from syntactic or other errors in headers. This
> seems more toward the latter: allowing single quotes where double is
> required is a different sort of "being liberal" than looking at
> something labeled text/plain and determining "aha, you meant
> image/jpeg". Thanks!
>
> Noah

Note that allowing single quotes instead of double quotes may sound
harmless, but:

<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.14.17>:

    Content-Type = "Content-Type" ":" media-type

<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.7>:

    media-type = type "/" subtype *( ";" parameter )

<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.6>:

    parameter = attribute "=" value
    attribute = token
    value     = token | quoted-string

and finally <http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.2.2>:

    token      = 1*<any CHAR except CTLs or separators>
    separators = "(" | ")" | "<" | ">" | "@"
               | "," | ";" | ":" | "\" | <">
               | "/" | "[" | "]" | "?" | "="
               | "{" | "}" | SP | HT

So the single quote is indeed allowed in tokens, and charset='foobar'
should be parsed as 'foobar' (quotes included), not foobar (note that
single quotes in parameter values using the token syntax are indeed in use).

Requiring special treatment will either cause UAs to have separate
parsers (not good), or potentially break legitimate uses of single
quotes in other header fields (very bad).

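As an illustration of the grammar quoted above (not part of the original message), here is a minimal Python sketch of a parameter parser that follows the RFC 2616 productions. The names `is_token_char`, `parse_value` and `parse_content_type` are hypothetical, and the sketch ignores details such as header folding and whitespace around "="; it only aims to show that a single quote is an ordinary token character, while backslash escapes apply inside double-quoted strings.

```python
# separators from RFC 2616 section 2.2; the single quote is not among them
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token_char(ch):
    """True for any CHAR (US-ASCII) that is neither a CTL nor a separator."""
    return 32 < ord(ch) < 127 and ch not in SEPARATORS


def parse_value(s):
    """Parse `value = token | quoted-string`; return (value, remainder)."""
    if s.startswith('"'):
        out = []
        i = 1
        while i < len(s):
            ch = s[i]
            if ch == '\\' and i + 1 < len(s):
                # quoted-pair: the backslash escapes the next character
                out.append(s[i + 1])
                i += 2
            elif ch == '"':
                return ''.join(out), s[i + 1:]
            else:
                out.append(ch)
                i += 1
        raise ValueError('unterminated quoted-string')
    i = 0
    while i < len(s) and is_token_char(s[i]):
        i += 1
    if i == 0:
        raise ValueError('expected token or quoted-string')
    return s[:i], s[i:]


def parse_content_type(header_value):
    """Split a Content-Type field value into (media_type, params)."""
    media_type, _, rest = header_value.partition(';')
    params = {}
    rest = rest.strip(' \t')
    while rest:
        name, eq, rest = rest.partition('=')
        if not eq:
            raise ValueError('parameter without "="')
        value, rest = parse_value(rest)
        params[name.strip(' \t').lower()] = value
        rest = rest.lstrip(' \t')
        if rest.startswith(';'):
            rest = rest[1:].lstrip(' \t')
        elif rest:
            raise ValueError('unexpected data after parameter value')
    return media_type.strip(' \t').lower(), params
```

With this sketch, `parse_content_type("text/html; charset='foobar'")` yields a charset value that still contains the single quotes, whereas `charset="foobar"` yields the bare string. A parser that stripped single quotes as well would need a special case that the grammar does not provide.
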
I totally agree that UAs are very bad at header parsing, but adding
more special cases doesn't seem to be an improvement.

Best regards, Julian
Received on Thursday, 30 September 2010 14:46:51 UTC