
Re: Content Sniffing impact on HTTPbis - #155

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 06 Jun 2009 00:28:30 +0200
To: Ian Hickson <ian@hixie.ch>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <dg5j2558gu5a9duuthrbnog66jsm31bva3@hive.bjoern.hoehrmann.de>
* Ian Hickson wrote:
>The original reason for this was that I did not want to sniff as a 
>particular type a file that only contained a BOM, since it is more likely 
>that this is an error and that the file is really some other encoding.

I do not really share that assessment, but as it is written in the
draft, you end up "sniffing" text/plain either way.

>> As I read the draft, UTF-32LE encoded text/plain documents will be 
>> sniffed as text/plain because they have a UTF-16LE BOM; UTF-32BE encoded 
>> text/plain documents will be sniffed as application/octet-stream. This 
>> is inconsistent and confusing (there is suddenly some doubt whether you 
>> treat the document as UTF-16 or UTF-32, and while browsers might not 
>> support UTF-32, other applications will).
>We're explicitly not supporting UTF-32. For more details see HTML5.

I fail to see the relevance. The draft is unclear and misleading with
respect to the handling of UTF-32 encoded text/plain documents. There
is nothing that "HTML5" could say as a remedy, at least not until the
draft references "HTML5" to that end.
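
To illustrate the inconsistency, here is a minimal sketch of the rule as
I read it. It is a simplified model and not the draft's actual algorithm:
the function name, the BOM list, and the set of "binary" byte values are
my assumptions.

```python
import codecs

# Assumption: the BOMs that lead to text/plain in this simplified model.
TEXT_BOMS = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE, codecs.BOM_UTF8)

# Assumption: byte values treated as "binary", i.e. control characters
# other than tab, LF, FF, CR and ESC.
BINARY_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B]
    + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)


def sniff_text_or_binary(data: bytes) -> str:
    """Hypothetical helper modelling the text-or-binary decision."""
    # A leading BOM wins before any other byte is inspected.
    if data.startswith(TEXT_BOMS):
        return "text/plain"
    # Without a BOM, any "binary" byte makes the payload opaque.
    if any(b in BINARY_BYTES for b in data):
        return "application/octet-stream"
    return "text/plain"


# The UTF-32LE BOM is FF FE 00 00, so it begins with the UTF-16LE BOM.
utf32le = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")
# The UTF-32BE BOM is 00 00 FE FF: no BOM match, but it contains 0x00.
utf32be = codecs.BOM_UTF32_BE + "hi".encode("utf-32-be")

print(sniff_text_or_binary(utf32le))
print(sniff_text_or_binary(utf32be))
```

Under these assumptions the two UTF-32 byte orders end up with different
types, purely because of how their BOMs happen to begin.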

The draft has a similar problem with the iso-8859-1 cases in 3.3: if
such documents start with what appears to be a BOM, then the BOM is
the reason for "sniffing" them as text/plain, casting doubt on whether
you then also treat them as being in some UTF encoding.
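
A small sketch of that ambiguity, assuming an iso-8859-1 document that
happens to begin with the bytes FF FE (the characters "ÿþ" in that
encoding). The sample bytes and variable names are made up for
illustration.

```python
import codecs

# Hypothetical iso-8859-1 payload whose first two bytes are FF FE.
latin1_doc = b"\xff\xfe is how this note begins"

# The same two bytes are indistinguishable from a UTF-16LE BOM.
starts_with_utf16le_bom = latin1_doc.startswith(codecs.BOM_UTF16_LE)

# The two readings of the same bytes give entirely different text.
as_latin1 = latin1_doc.decode("iso-8859-1")
as_utf16 = latin1_doc.decode("utf-16", errors="replace")

print(starts_with_utf16le_bom)
print(repr(as_latin1))
print(repr(as_utf16))
```

The point is only that the BOM check alone does not say which of the two
readings a consumer is then expected to use.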
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Friday, 5 June 2009 22:29:05 UTC
