Re: NEW ISSUE: content sniffing from Adrien de Croy on 2009-03-31 (ietf-http-wg@w3.org from January to March 2009)

From: Adrien de Croy <adrien@qbik.com>
Date: Wed, 01 Apr 2009 10:54:01 +1300
To: Adam Barth <w3c@adambarth.com>
CC: Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg@w3.org
Message-ID: <49D290F9.7080007@qbik.com>

So then surely the last word on what type of content something is
should be the actual content itself?

Things like Content-Type headers can be

* wrong (bad sniffing or mapping on the server)
* missing
* tampered with

Relying on Content-Type therefore has associated risks.

So if any sniffing is to be done, surely only the client should do it?
In which case, why don't clients just always ignore the Content-Type
header and try to determine the type themselves?  Some seem to do this
already.
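
To make the idea concrete, here is a rough sketch of a client that trusts
the content's own signature first and only falls back to the declared
Content-Type when no signature matches.  The signature table and the
function names (sniff_type, effective_type) are made up for illustration
and are not taken from any real user agent; a real sniffer would need to
cover far more types.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table (leading bytes -> type).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_type(body: bytes) -> Optional[str]:
    """Return the type implied by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if body.startswith(magic):
            return mime
    return None


def effective_type(declared: Optional[str], body: bytes) -> str:
    """Prefer the sniffed type; use the declared header only as a fallback."""
    sniffed = sniff_type(body)
    if sniffed is not None:
        return sniffed
    if declared:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        return declared.split(";", 1)[0].strip().lower()
    return "application/octet-stream"
```

With this sketch, a body starting with "GIF89a" is treated as image/gif
even if the header says text/plain, while a body with no known signature
keeps whatever the header declared.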

Adrien



Adam Barth wrote:
> On Tue, Mar 31, 2009 at 2:23 PM, Adrien de Croy <adrien@qbik.com> wrote:
>   
>> Do servers sniff to try and fill in the Content-Type field?
>>     
>
> Yes.  We found this is quite common when we examined open-source Web
> applications that accept user uploads.  For example, Wikipedia does
> this.
>
>   
>> Most I think have a fairly simplistic static mapping of file extension to Content-Type.
>>     
>
> This is how Apache works.
>
>   
>> Many types of content already have a signature in them which can be used to
>> determine type, e.g. JPEGs, GIFs, etc.
>>     
>
> Wikipedia uses this technique.  Mismatches between a site's sniffing
> algorithm and the user agent's sniffing algorithm often lead to
> exploitable vulnerabilities.  See Section 2.5 of
> http://www.adambarth.com/papers/2009/barth-caballero-song.pdf for two
> concrete examples of how this happens.
>
> Adam
>
>   
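
As a rough illustration of the kind of mismatch described above, the
sketch below has two sniffers disagree about the same bytes.  Both
functions are hypothetical and simplified; they do not reproduce the
algorithm of any particular site or browser, and the 256-byte window is
an arbitrary choice for the example.

```python
GIF_MAGIC = (b"GIF87a", b"GIF89a")


def server_sniff(body: bytes) -> str:
    """Hypothetical upload filter: looks only at the leading signature."""
    if body.startswith(GIF_MAGIC):
        return "image/gif"
    return "application/octet-stream"


def client_sniff(body: bytes) -> str:
    """Hypothetical user agent: scans for HTML tags before image signatures."""
    head = body[:256].lower()
    if b"<html" in head or b"<script" in head:
        return "text/html"
    if body.startswith(GIF_MAGIC):
        return "image/gif"
    return "application/octet-stream"


# A crafted upload: valid GIF signature followed by markup.
upload = b"GIF89a<html><script>alert(1)</script></html>"

server_view = server_sniff(upload)  # the site accepts it as an image
client_view = client_sniff(upload)  # the client would render it as HTML
```

Here the site would store the upload as an image while the client would
treat it as HTML, which is the sort of disagreement that can be exploited.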

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Received on Tuesday, 31 March 2009 21:51:37 UTC