Re: NEW ISSUE: content sniffing from Bil Corry on 2009-04-01 (ietf-http-wg@w3.org from April to June 2009)

From: Bil Corry <bil@corry.biz>
Date: Wed, 01 Apr 2009 10:05:17 -0500
To: Julian Reschke <julian.reschke@gmx.de>
CC: Adam Barth <w3c@adambarth.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <49D382AD.8090600@corry.biz>

Julian Reschke wrote on 4/1/2009 3:46 AM: 
> Adam Barth wrote:
>> On Tue, Mar 31, 2009 at 7:44 PM, Roy T. Fielding <fielding@gbiv.com>
>> wrote:
>>> It is impossible to determine a media-type (how a recipient should
>>> process a given representation) by sniffing the content (the data
>>> format).
>>
>> Regardless, many popular user agents override the server-provided MIME
>> type after examining the content of HTTP responses.
> 
> But only for a certain set of MIME types, which are known to be
> frequently incorrect, right?

I don't know whether this has changed, but Internet Explorer sniffs when no MIME type is provided, or when the provided MIME type is one that IE knows about:

 http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx

Of course, IE8 added a flag that allows sites to turn off content sniffing, so that would be the exception.
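
For anyone who wants to try it: as far as I know, the flag in question is the "X-Content-Type-Options: nosniff" response header. Below is a minimal sketch of a site opting out, written as a plain WSGI application. The function name `app` and the response body are made up for illustration; only the header name and value are meant to reflect what IE8 looks for.

```python
# Minimal WSGI sketch: serve text/plain and ask the browser not to sniff.
# Assumption: the IE8 opt-out flag is the "X-Content-Type-Options: nosniff"
# response header. The app name and body are hypothetical.

def app(environ, start_response):
    # Body deliberately contains markup-like bytes that a sniffer might
    # otherwise be tempted to treat as HTML.
    body = b"just some text, <b>not</b> markup\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

This can be served locally with the standard library, e.g. wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever().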


>>> When a media type
>>> is not present (or is detectably incorrect), only the implementation
>>> doing the processing can determine an appropriate guess because that
>>> guess is almost always determined by the context in which the
>>> reference was made (not by the content).
>>
>> Content sniffing algorithms in browsers largely ignore the context in
>> which the HTTP response is being used.  Specifically, the algorithm for
>> computing the effective MIME type is a function of the HTTP response
>> alone (both in practice and in draft-abarth-mime-sniff).
> 
> My understanding was that the context *is* relevant (stylesheet? image?).

IE will sniff images and render them as HTML if it believes they're actually HTML:

 http://www.heise-online.co.uk/security/Risky-MIME-sniffing-in-Internet-Explorer--/features/112589
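
To make the risk concrete, here is a toy sketch of a sniffer that looks only at the response (declared type plus body) and ignores the context the resource was requested in. It is NOT IE's actual algorithm, nor the one in draft-abarth-mime-sniff. The marker list, the 256-byte window, and the name `effective_type` are all assumptions made up for illustration.

```python
from typing import Optional

# Hypothetical HTML signatures; a real sniffer uses a much longer table.
HTML_MARKERS = (b"<html", b"<head", b"<body", b"<script")

# Assumed inspection window; chosen for illustration only.
SNIFF_WINDOW = 256


def effective_type(declared: Optional[str], body: bytes,
                   nosniff: bool = False) -> str:
    """Return the type a toy sniffing client would act on."""
    if declared is not None:
        # Drop parameters such as "; charset=utf-8" and normalize case.
        declared = declared.split(";", 1)[0].strip().lower()
    # An opt-out flag makes the declared type authoritative.
    if nosniff and declared:
        return declared
    head = body[:SNIFF_WINDOW].lower()
    # The content overrides the declared type if it "looks like" HTML.
    if any(marker in head for marker in HTML_MARKERS):
        return "text/html"
    return declared or "application/octet-stream"
```

With this sketch, effective_type("image/gif", b"GIF89a<script>alert(1)</script>") comes back as "text/html" even though the server said image/gif, which is the kind of outcome the article describes. Passing nosniff=True keeps it as "image/gif".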



>> ...
>> I don't think that content sniffing will magically disappear if we
>> just ignore it long enough.  Instead, we should shed some light on
>> this dark corner of reality.
>> ...
> 
> As long as it is clear that it's still an option not to sniff at all,
> and as long as the goal remains to minimize the amount of sniffing going
> on...

Microsoft tried turning off MIME sniffing for text/plain, and it broke a bunch of sites:

 http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx

Undoubtedly, that was the impetus for adding the flag to turn off content sniffing.  I think it's safe to assume content sniffing in IE won't be going away any time soon.


> That said, why is it important that the HTTP spec references this document?

Adam Barth's comments from earlier in the thread:

-----
When different user agents use different sniffing algorithms, content
authors pay a large cost, both in terms of compatibility and in terms
of security.  For user agents that wish to perform sniffing, I think
we'd be doing the Web a service by specifying which algorithm they
should use.
-----


- Bil

Received on Wednesday, 1 April 2009 15:05:56 UTC