- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Fri, 01 Oct 2010 09:19:06 +0200
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- CC: "www-tag@w3.org" <www-tag@w3.org>
On 01.10.2010 01:07, Bjoern Hoehrmann wrote:
> * Julian Reschke wrote:
>> The background is that HTML5 specifies an algorithm for extracting the
>> charset from content type information, which (1) requires accepting
>> invalid forms (single quotes), and (2) requires not to properly handle
>> escapes in quoted strings.
>
> Usually, what happens if you decide to ignore the standard and make your
> own rules, you introduce subtle problems that you had not thought about.
> As http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0504.html
> I noted some time ago, the algorithm Ian proposes is inconsistent with
> the HTTP specification and HTTP implementations such as browsers in its
> handling of strings like
>
>   text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3

That algorithm has been in the spec since spring, and I raised a bug in late April. It was finally processed last month. The fuzzy matching is now out (even though the author claimed he had rejected my bug report). So apparently this was a "willful violation" that wasn't based on evidence, just on sloppiness.

There are two more cases left (see the current open tracker issues).

> as the algorithm does not handle quoted strings at all and just does a
> stateless scan for "charset". That of course concerned processing the
> HTTP Content-Type header, the <meta> element is different and the HTML
> specification could only violate HTTP there if it pretended that <meta>
> has much to do with HTTP. If you compare the case above at the HTTP and
> at the <meta> level you should find some browsers use different parsers
> for them.

I suspect that as well. What's needed is proper testing (both for <meta> and for the HTTP header). With the current state of the spec, we may end up with broken parsers for <meta> leaking out into HTTP header parsing.

Best regards, Julian
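The difference between a stateless scan for "charset" and a parser that respects quoted strings can be sketched roughly as follows. This is only an illustration in Python: `naive_charset` and `parse_params` are hypothetical names, the naive scan is a simplified stand-in and not the HTML5 algorithm itself, and the parameter parser is a minimal approximation of HTTP quoted-string handling rather than a complete implementation.

```python
import re


def naive_charset(header):
    """Stateless scan (assumed simplification): first 'charset=' anywhere wins."""
    m = re.search(r'charset\s*=\s*["\']?([^"\';\s]+)', header, re.IGNORECASE)
    return m.group(1) if m else None


def parse_params(header):
    """Split on ';' outside double-quoted strings and undo backslash escapes."""
    parts, buf = [], []
    in_quotes = escaped = False
    for ch in header:
        if escaped:
            buf.append(ch)          # character after a backslash is literal
            escaped = False
        elif in_quotes and ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))

    params = {}
    for part in parts[1:]:          # parts[0] is the media type itself
        name, sep, value = part.partition("=")
        if not sep:
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            # Strip the quotes and resolve quoted-pair escapes such as \-
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params.setdefault(name.strip().lower(), value)
    return params


header = 'text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3'
print(naive_charset(header))            # picks up the text inside the quoted string
print(parse_params(header)["charset"])  # picks up the real charset parameter
```

On the example string from the quoted message, the naive scan reports the value embedded in the `whatever` parameter, while the quote-aware parser reports the actual `charset` parameter. A test suite covering both <meta> and the HTTP header could be built from inputs of this kind.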
Received on Friday, 1 October 2010 08:19:48 UTC