Re: Content Sniffing impact on HTTPbis - #155 from Ian Hickson on 2009-06-05 (ietf-http-wg@w3.org from April to June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 5 Jun 2009 19:05:49 +0000 (UTC)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: Adam Barth <w3c@adambarth.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <Pine.LNX.4.62.0906051904530.16244@hixie.dreamhostps.com>

On Fri, 5 Jun 2009, Bjoern Hoehrmann wrote:
> 
> I see no justification for having a special algorithm for the charset 
> parameter; you extract the parameter just like any other. I also don't 
> know of any implementation that processes the header value like that; if 
> you have
> 
>   text/plain;whatever="charset=iso-8859-2";charset=iso-8859-3
> 
> Then the result of your algorithm is iso-8859-2", while the correct 
> behavior yields iso-8859-3, which is also what IE6, FF 3.x, Opera 9, and 
> various non-browser applications use. The same goes for a simpler:
> 
>   text/plain;whatever="charset";charset=iso-8859-3
> 
> Where your algorithm returns nothing, and implementations implement the 
> correct behavior, which yields iso-8859-3. There also appears to be no 
> need to process escape sequences within quoted strings incorrectly, for 
> instance Opera 9 seems to implement that properly, so does my own code.

My testing at the time this was written disagrees with the results of your 
testing. I believe this was primarily intended for charset extraction for 
<script> nodes, if that matters. However, if your results can be confirmed 
then that would certainly be good news.
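
For anyone who wants to reproduce the two cases quoted above, here is a 
minimal sketch in Python. Both function names are hypothetical. 
naive_charset() is only a simplified stand-in that searches for the 
literal text "charset"; it is not the text of any specification, and it 
is written just to reproduce the results described in the quoted message. 
parse_params() is one possible way to extract parameters generically 
while honouring quoted strings and backslash escapes; it is not taken 
from any browser.

```python
def naive_charset(header):
    """Simplified stand-in: look for the first literal 'charset' anywhere."""
    pos = header.lower().find("charset")
    if pos < 0:
        return None
    pos += len("charset")
    while pos < len(header) and header[pos] in " \t":
        pos += 1
    # Give up if the match is not followed by '='.
    if pos >= len(header) or header[pos] != "=":
        return None
    pos += 1
    while pos < len(header) and header[pos] in " \t":
        pos += 1
    end = pos
    while end < len(header) and header[end] not in " \t;":
        end += 1
    return header[pos:end] or None


def parse_params(header):
    """Split 'type/subtype;name=value;...' into (media_type, params)."""
    media_type, _, rest = header.partition(";")
    params = {}
    i, n = 0, len(rest)
    while i < n:
        start = i
        while i < n and rest[i] not in "=;":
            i += 1
        name = rest[start:i].strip().lower()
        if i >= n or rest[i] == ";":
            i += 1  # parameter without a value: skip it
            continue
        i += 1  # skip '='
        if i < n and rest[i] == '"':
            i += 1  # skip opening quote
            chars = []
            while i < n and rest[i] != '"':
                if rest[i] == "\\" and i + 1 < n:
                    i += 1  # backslash escape: take the next character literally
                chars.append(rest[i])
                i += 1
            i += 1  # skip closing quote
            value = "".join(chars)
            while i < n and rest[i] != ";":
                i += 1  # ignore anything between the quote and the next ';'
        else:
            start = i
            while i < n and rest[i] != ";":
                i += 1
            value = rest[start:i].strip()
        i += 1  # skip ';'
        if name and name not in params:
            params[name] = value  # first occurrence wins (an assumption)
    return media_type.strip().lower(), params


def charset_of(header):
    """Return the charset parameter, or None if there is none."""
    return parse_params(header)[1].get("charset")
```

With the first quoted header, naive_charset() returns the value from 
inside the quoted string, trailing quote included, and charset_of() 
returns iso-8859-3. With the second, naive_charset() returns None and 
charset_of() again returns iso-8859-3.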

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 5 June 2009 19:06:31 UTC