
Re: NEW ISSUE: content sniffing

From: Adrien de Croy <adrien@qbik.com>
Date: Wed, 01 Apr 2009 10:54:01 +1300
Message-ID: <49D290F9.7080007@qbik.com>
To: Adam Barth <w3c@adambarth.com>
CC: Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg@w3.org

So then surely the last word on what type of content something is
should be the actual content itself?

Things like Content-Type headers can be:

* wrong (bad sniffing or mapping in the server)
* missing
* tampered with

Relying on Content-Type therefore has associated risks.

So if any sniffing is to be done, surely it should only be done by the
client?  In which case, why don't clients just always ignore the
Content-Type header and try to determine the type themselves?  Some seem
to do this already.
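
To make concrete what "determine the type themselves" could mean, here is a
minimal sketch of signature-based sniffing on the client side.  The table and
the function name sniff_type are my own illustration, not any real user
agent's algorithm; only the leading-byte signatures for JPEG, GIF and PNG are
taken from those formats.

```python
# Hypothetical signature table: leading bytes -> media type.
# Only a few well-known image signatures are listed; a real sniffer
# would need many more entries and more careful rules.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_type(body: bytes, default: str = "application/octet-stream") -> str:
    """Return the media type implied by the leading bytes, else `default`."""
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type
    return default
```

Note that this ignores the Content-Type header entirely, which is exactly the
behaviour I am asking about.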


Adam Barth wrote:
> On Tue, Mar 31, 2009 at 2:23 PM, Adrien de Croy <adrien@qbik.com> wrote:
>> Do servers sniff to try and fill in the Content-Type field?
> Yes.  We found this is quite common when we examined open-source Web
> applications that accept user uploads.  For example, Wikipedia does
> this.
>> Most I think have a fairly simplistic static mapping of file extension to Content-Type.
> This is how Apache works.
>> Many types of content already have a signature in them which can be used to
>> determine type, e.g. JPEGs, GIFs, etc.
> Wikipedia uses this technique.  Mismatches between a site's sniffing
> algorithm and the user agent's sniffing algorithm often lead to
> exploitable vulnerabilities.  See Section 2.5 of
> http://www.adambarth.com/papers/2009/barth-caballero-song.pdf for two
> concrete examples of how this happens.
> Adam
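
The mismatch described above can be sketched as follows.  Everything here is
hypothetical: server_check and client_sniff are invented rules standing in
for "a site's sniffing algorithm" and "the user agent's sniffing algorithm",
and are not the actual logic of Wikipedia, Apache or any browser.

```python
import os

# Hypothetical static extension mapping, in the style of a simple server.
EXTENSION_MAP = {".gif": "image/gif", ".jpg": "image/jpeg", ".html": "text/html"}

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def type_from_extension(filename: str) -> str:
    """Map a file name to a media type by extension only."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_MAP.get(ext, "application/octet-stream")


def server_check(body: bytes) -> str:
    """Hypothetical upload filter: trust the leading GIF signature."""
    if body.startswith(GIF_SIGNATURES):
        return "image/gif"
    return "application/octet-stream"


def client_sniff(body: bytes, window: int = 256) -> str:
    """Hypothetical client rule: HTML-looking bytes win over image signatures."""
    head = body[:window].lower()
    if b"<html" in head or b"<script" in head:
        return "text/html"
    if body.startswith(GIF_SIGNATURES):
        return "image/gif"
    return "application/octet-stream"


# A crafted upload: valid GIF signature followed by markup.
upload = b"GIF89a<html><script>alert(1)</script></html>"

print(type_from_extension("avatar.gif"))  # what the extension mapping says
print(server_check(upload))               # what the server-side filter says
print(client_sniff(upload))               # what this client rule says
```

With these invented rules the server accepts the upload as image/gif while
the client treats the same bytes as text/html, which is the kind of
disagreement that can become exploitable.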

Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Tuesday, 31 March 2009 21:51:37 UTC
