
Re: stability of content type sniffing algorithm? contentTypeOverride-24 / issue-24

From: Mark Nottingham <mnot@mnot.net>
Date: Fri, 5 Jun 2009 09:56:53 +1000
Cc: www-tag@w3.org
Message-Id: <07BD6A26-C4BD-4EB8-833D-85E13C8483B3@mnot.net>
To: Dan Connolly <connolly@w3.org>

I must say I have the same concern; the next Great New Format is going  
to place the exact same pressure on browser vendors as RSS, Atom and  
the rest have.

I don't think it's valid to read this document with the expectation  
that it will be the *last* content-sniffing spec.
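
To make the concern concrete, here is a minimal sketch of a signature-based
sniffer. It is not the algorithm from draft-abarth-mime-sniff: the
SIGNATURES table, the sniff() name and the 64-byte window are illustrative
assumptions only. It shows the mismatch the draft warns about (a body
declared as an image but sniffed as HTML), and that every new format means
another row in a table that all user agents would have to agree on.

```python
# Hypothetical signature table for illustration; NOT the table from
# draft-abarth-mime-sniff. Each entry maps a leading byte pattern to a type.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"<!DOCTYPE HTML", "text/html"),
    (b"<HTML", "text/html"),
    (b"<?XML", "text/xml"),
]


def sniff(body: bytes, declared: str) -> str:
    """Return the sniffed media type, or the declared one if nothing matches."""
    # Skip leading ASCII whitespace and compare case-insensitively
    # (bytes.upper() only changes ASCII letters, so binary magic is unaffected).
    head = body.lstrip(b" \t\r\n")[:64].upper()
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type
    return declared


# The server labels an upload as an image, but this sniffer decides it is HTML.
print(sniff(b"<html><script>alert(1)</script></html>", "image/png"))
```

A format that is not in the table falls through to the declared type, so a
vendor that adds a row for the next new format changes the outcome for
content that other agents still handle as declared.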



On 29/05/2009, at 2:34 AM, Dan Connolly wrote:

> I recently gave the mime-sniff a somewhat closer look,
> including these two paragraphs, which looked familiar:
>
> [[
>   This document describes a mime sniffing algorithm that carefully
>   balances the compatibility needs of browser vendors with the security
>   constraints.  The algorithm has been constructed with reference to
>   mime sniffing algorithms present in popular Web browsers, an
>   extensive database of Web content, and metrics collected from
>   implementations deployed to a sizable number of Web users.
>
>   Warning!  It is imperative that the algorithm in this document be
>   followed exactly.  When a user agent uses different heuristics for
>   content type detection than the server expects, security problems can
>   occur.  For example, if a server believes that the client will treat
>   a contributed file as an image (and thus treat it as benign), but a
>   Web browser believes the content to be HTML (and thus execute any
>   scripts contained therein), the end user can be exposed to malicious
>   content, making the user vulnerable to cookie theft attacks and other
>   cross-site scripting attacks.
> ]]
> -- http://ietfreport.isoc.org/idref/draft-abarth-mime-sniff/
>
> I had an uneasiness about them that I wasn't sure how to articulate,
> but then I just read this:
>
> -------- Forwarded Message --------
> http://lists.w3.org/Archives/Public/public-html/2009May/0524.html
>> From: Sam Ruby <rubys@intertwingly.net>
>> To: Anne van Kesteren <annevk@opera.com>
>> Cc: Maciej Stachowiak <mjs@apple.com>, Roy T. Fielding
>> <fielding@gbiv.com>, Larry Masinter <masinter@adobe.com>, HTML WG
>> <public-html@w3.org>
>> Subject: Re: HTML interpreter vs. HTML user agent
>> Date: Thu, 28 May 2009 09:41:36 -0400
> [...]
>> The actual observed behavior of user agents designed to (primarily)
>> process content of a certain media type (either in general, or in the
>> specific context) is to make every effort to parse the content according
>> to those rules, and only if such rules fail to produce meaningful
>> results will they investigate alternatives.
>>
>> Browsers will first attempt to process content as HTML.
>> FeedReaders will first attempt to process content as a feed.
>> Media players will first attempt to process content as media.
>>
>> Browsers, when chasing an image tag, will make different assumptions
>> than when presented with a raw uri from the chrome.
>>
>> All are equally "right" or "wrong".
>>
>> None of this is meant to imply that the behavior that is being settled
>> upon by browser manufacturers isn't worth specifying or standardizing.
>>
>> - Sam Ruby
>
> Is there any reason to believe that the next sort of content
> to hit the web won't disrupt things much like Java .jar files
> and RSS/Atom feeds and mp3/wma media?
>
> I think it's worthwhile to update our finding on authoritative
> metadata* to acknowledge draft-abarth-mime-sniff and the practice
> it represents... but I'm struggling to figure out exactly
> what to say.
>
> * http://www.w3.org/2001/tag/doc/mime-respect-20060412
>
> It's pretty clear to me that people will take the shortest path
> to their target, and that usually doesn't involve editing
> the .htaccess file when they test their RSS file with their
> RSS readers. It's not until the RSS reader gets integrated
> into the web browser that the HTTP client's presumption
> that it's getting a feed goes away (and even then,
> not completely).
>
>
> -- 
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
>
>


--
Mark Nottingham     http://www.mnot.net/
Received on Thursday, 4 June 2009 23:57:30 GMT
