Re: Content Sniffing impact on HTTPbis - #155 from Ian Hickson on 2009-06-05 (ietf-http-wg@w3.org from April to June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 5 Jun 2009 21:56:07 +0000 (UTC)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: Adam Barth <w3c@adambarth.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <Pine.LNX.4.62.0906052154190.16244@hixie.dreamhostps.com>

On Fri, 5 Jun 2009, Bjoern Hoehrmann wrote:
> >>
> >> why step 3 only applies when you have at least three bytes and then 
> >> only compares two bytes,
> >
> >We could change this to be slightly tighter, but it's a bit pedantic.
> 
> When I read the text I suspect there is an error in the specification, 
> and would then implement what I think my application ought to do; but 
> your goal is that people implement it exactly as written.

The original reason for this was that I did not want to sniff as a 
particular type a file that only contained a BOM, since it is more likely 
that this is an error and that the file is really in some other encoding.
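
For illustration only, the "at least three bytes, compare two" rule can be 
sketched as follows. This is a minimal sketch, not the draft's text: the 
function name sniff_bom and the constant UTF16_BOMS are hypothetical, and 
only the two UTF-16 BOMs are considered.

```python
# Hypothetical sketch of the rule discussed above; not the draft's algorithm.
UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")  # UTF-16BE, UTF-16LE


def sniff_bom(data: bytes):
    """Return "text/plain" if data starts with a UTF-16 BOM followed by content."""
    # Require at least three bytes so that a file consisting solely of a
    # two-byte BOM is not sniffed, then compare only the first two bytes.
    if len(data) >= 3 and data[:2] in UTF16_BOMS:
        return "text/plain"
    return None
```

Under these assumptions a file containing only the bytes FF FE is left 
unsniffed, while FF FE followed by any further byte is treated as text/plain.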


> >> why the UTF-32 BOM is not being detected,
> >
> >We measured and determined that it was not needed for compatibility. In 
> >general, we tried to minimize the amount of sniffing.
> 
> As I read the draft, UTF-32LE encoded text/plain documents will be 
> sniffed as text/plain because they have a UTF-16LE BOM; UTF-32BE encoded 
> text/plain documents will be sniffed as application/octet-stream. This 
> is inconsistent and confusing (there is suddenly some doubt whether you 
> treat the document as UTF-16 or UTF-32, and while browsers might not 
> support UTF-32, other applications will).

We're explicitly not supporting UTF-32. For more details see HTML5.
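
For illustration only, the BOM overlap raised in the quoted text can be 
checked with Python's standard codecs constants. This is a sketch: the 
helper name matching_utf16_bom is hypothetical and it models only the BOM 
comparison, not the rest of the sniffing algorithm.

```python
import codecs


def matching_utf16_bom(data: bytes):
    """Return the name of the UTF-16 BOM that data starts with, or None."""
    candidates = (
        ("UTF-16BE", codecs.BOM_UTF16_BE),  # FE FF
        ("UTF-16LE", codecs.BOM_UTF16_LE),  # FF FE
    )
    for name, bom in candidates:
        if data.startswith(bom):
            return name
    return None


# The UTF-32LE BOM is FF FE 00 00, so it begins with the UTF-16LE BOM.
utf32le_doc = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")
# The UTF-32BE BOM is 00 00 FE FF, which matches neither UTF-16 BOM.
utf32be_doc = codecs.BOM_UTF32_BE + "hi".encode("utf-32-be")
```

Here matching_utf16_bom(utf32le_doc) reports a UTF-16LE BOM, while 
matching_utf16_bom(utf32be_doc) finds no UTF-16 BOM at all.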

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 5 June 2009 21:56:42 UTC