- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 05 Jun 2009 18:14:46 +0200
- To: Adam Barth <w3c@adambarth.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
* Adam Barth wrote:
>For which parts would you like a more detailed rationale? It's hard
>for me to guess which parts you think are obscure.

I've already mentioned the encoding extraction algorithm, but to add some others: in draft-abarth-mime-sniff-01 section 3, step 3's special handling of very particular sequences, the handling of unregistered and malformed values in step 5, the special handling of XML types in step 6, the relevance of the implementation supporting particular types in step 7. In section 4, why implementations may decide to pick any number of bytes between 0 and 512, why step 3 only applies when you have at least three bytes and then only compares two bytes, why the UTF-32 BOM is not being detected, why step 4 has those bytes and not others; in section 6, the special handling of image/svg+xml; in section 7, why the UTF-16 BOM is ignored.

>The document defines algorithms for extracting information from the
>Content-Type. That algorithm, in particular, extracts the charset
>attribute from the Content-Type header. The algorithm is intended to
>be referenced by other specifications, such as HTML 5, which need to
>determine the charset attribute of the Content-Type header in a manner
>compatible with existing web content.

I see no justification for having a special algorithm for the charset parameter; you extract the parameter just like any other. I also don't know of any implementation that processes the header value like that; if you have

  text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3

then the result of your algorithm is iso-8859-2", while the correct behavior yields iso-8859-3, which is also what IE6, FF 3.x, Opera 9, and various non-browser applications use. The same goes for a simpler case:

  text/plain;whatever="charset";charset=iso-8859-3

where your algorithm returns nothing, and implementations implement the correct behavior, which yields iso-8859-3.
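To make the difference concrete, here is a minimal Python sketch of the two approaches applied to the header values above. It is an illustration under stated assumptions, not the text of the draft and not any browser's code: `naive_charset`, `parse_params`, and `charset_of` are hypothetical names, `naive_charset` is only a stand-in for a substring search on "charset", and `parse_params` assumes the usual `;name=value` parameter syntax with quoted strings and backslash escapes.

```python
def naive_charset(value):
    """Stand-in for a substring-based search (an assumption, not the draft's text)."""
    i = value.lower().find("charset")
    if i < 0:
        return None
    i += len("charset")
    while i < len(value) and value[i] in " \t":
        i += 1
    if i >= len(value) or value[i] != "=":
        return None  # first "charset" is not followed by "=": give up
    i += 1
    while i < len(value) and value[i] in " \t":
        i += 1
    j = i
    while j < len(value) and value[j] not in "; \t":
        j += 1
    return value[i:j] or None


def parse_params(value):
    """Split a Content-Type value into (media_type, [(name, value), ...])."""
    n = len(value)
    i = value.find(";")
    if i < 0:
        return value.strip(), []
    media_type = value[:i].strip()
    params = []
    while i < n:
        i += 1  # skip the ';' that introduces this parameter
        start = i
        while i < n and value[i] not in "=;":
            i += 1
        name = value[start:i].strip().lower()
        if i >= n or value[i] == ";":
            continue  # no '=': not a name=value pair, skip it
        i += 1  # skip '='
        while i < n and value[i] in " \t":
            i += 1
        if i < n and value[i] == '"':
            i += 1  # opening quote
            chars = []
            while i < n and value[i] != '"':
                if value[i] == "\\" and i + 1 < n:
                    i += 1  # backslash escape: take the next character literally
                chars.append(value[i])
                i += 1
            i += 1  # closing quote
            val = "".join(chars)
            while i < n and value[i] != ";":
                i += 1  # ignore anything up to the next ';'
        else:
            start = i
            while i < n and value[i] != ";":
                i += 1
            val = value[start:i].strip()
        params.append((name, val))
    return media_type, params


def charset_of(value):
    """Return the first parameter literally named 'charset', if any."""
    for name, val in parse_params(value)[1]:
        if name == "charset":
            return val
    return None


first = 'text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3'
second = 'text/plain;whatever="charset";charset=iso-8859-3'

print(repr(naive_charset(first)), repr(charset_of(first)))
print(repr(naive_charset(second)), repr(charset_of(second)))
```

Under these assumptions the substring search is fooled by the text inside the quoted `whatever` value in both cases, while treating charset like any other parameter yields iso-8859-3 for both.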
There also appears to be no need to process escape sequences within quoted strings incorrectly; for instance, Opera 9 seems to implement that properly, and so does my own code.

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Friday, 5 June 2009 16:15:22 UTC