- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 01 Oct 2010 01:07:52 +0200
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: "www-tag@w3.org" <www-tag@w3.org>
* Julian Reschke wrote:
>The background is that HTML5 specifies an algorithm for extracting the
>charset from content type information, which (1) requires accepting
>invalid forms (single quotes), and (2) requires not to properly handle
>escapes in quoted strings.

Usually, when you decide to ignore the standard and make your own rules, you introduce subtle problems that you had not thought about. As I noted some time ago in

  http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0504.html

the algorithm Ian proposes is inconsistent with the HTTP specification, and with HTTP implementations such as browsers, in its handling of strings like

  text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3

because the algorithm does not handle quoted strings at all and just does a stateless scan for "charset".

That, of course, concerned processing of the HTTP Content-Type header. The <meta> element is different: the HTML specification could only violate HTTP there if it pretended that <meta> has much to do with HTTP. If you compare the case above at the HTTP level and at the <meta> level, you should find that some browsers use different parsers for the two.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
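Editorial illustration of the Content-Type example in the message above. This is a minimal sketch, not the HTML5 algorithm itself and not any browser's parser: `naive_charset` is a hypothetical, simplified stand-in for a stateless scan for "charset", and `quoted_aware_charset` is a hypothetical parser that splits parameters only on semicolons outside quoted strings and honours backslash escapes. Both function names are assumptions made for this example.

```python
import re


def naive_charset(value):
    """Stateless scan (simplified stand-in): use the first 'charset=' found anywhere."""
    m = re.search(r'charset\s*=\s*["\']?([^"\';\s]+)', value, re.IGNORECASE)
    return m.group(1) if m else None


def quoted_aware_charset(value):
    """Split on ';' outside quoted strings, then look for a parameter named charset."""
    params, buf, in_quotes, i = [], [], False, 0
    while i < len(value):
        ch = value[i]
        if in_quotes and ch == "\\" and i + 1 < len(value):
            buf.append(value[i:i + 2])  # keep the escape pair together
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            params.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    params.append("".join(buf))

    # params[0] is the media type itself; the rest are name=value parameters.
    for p in params[1:]:
        name, sep, val = p.partition("=")
        if sep and name.strip().lower() == "charset":
            val = val.strip()
            if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
                val = re.sub(r"\\(.)", r"\1", val[1:-1])  # undo quoted-pair escapes
            return val
    return None


header = 'text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3'
print(naive_charset(header))         # picks the value inside the quoted string
print(quoted_aware_charset(header))  # picks the real charset parameter
```

On the example string the two functions disagree: the scan returns the value embedded in the quoted `whatever` parameter, while the quote-aware parser returns the value of the actual `charset` parameter.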
Received on Thursday, 30 September 2010 23:08:29 UTC