Re: NEW ISSUE: content sniffing from Ian Hickson on 2009-04-03 (ietf-http-wg@w3.org from April to June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 3 Apr 2009 05:36:33 +0000 (UTC)
To: Adrien de Croy <adrien@qbik.com>
Cc: Shane McCarron <shane@aptest.com>, Adam Barth <w3c@adambarth.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <Pine.LNX.4.62.0904030448320.25082@hixie.dreamhostps.com>

On Fri, 3 Apr 2009, Adrien de Croy wrote:
> 
> So why should HTTP validate that the next level identifier 
> (Content-Type) is correct for the entity body?

Because there is a practical difference between the Ethernet/IP and 
IP/TCP boundaries on the one hand and the HTTP/Content boundary on the 
other. In the latter case, there are millions if not billions of cases 
where authors who have never even heard of HTTP are misconfiguring their 
servers, leaving user software in the unfortunate position of having to 
balance the satisfaction of its users (which would require some heuristic 
processing of the claimed type) against compliance with the 
specifications.

Generally, money will win in this kind of balance.

In the Ethernet/IP and the IP/TCP cases, the software is written by people 
who have at least heard of the relevant specifications, and may even have 
read them. There are also FAR fewer implementations of those levels of the 
stack than there are misconfigured installations of HTTP servers.

> There are zillions of content types.  Why should HTTP care about them?  
> It's an impossible task.

We're not saying HTTP should care about them. We're saying that one of 
these two options should be picked:

a) HTTP should not have any requirements for how to process Content-Type 
headers, and should just leave Content-Type to another spec, or:

b) HTTP should include the proposed limited Content-Sniffing algorithm, 
which would allow us to get software tools to converge on a single set of 
heuristics and thus reduce the security risk.

Note that we are _not_ asking for a uniform way of detecting all content 
types. In the vast majority of those "zillions" of cases, the algorithm 
Adam's I-D specifies will require UAs to honour the type.
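
To illustrate what "limited" means here, a minimal sketch follows. It is 
not the algorithm from Adam's I-D: the function name, the list of declared 
types that trigger sniffing, and the signature table are all hypothetical, 
chosen only to show the shape of the idea. The declared type is honoured 
unless it falls in a short, fixed list; only then are the leading bytes of 
the body consulted.

```python
# Hypothetical sketch of a *limited* sniffing rule. The names and tables
# below are assumptions for illustration, not taken from the Internet-Draft.

# Assumed short list of declared values for which sniffing is allowed
# (None stands for a missing Content-Type header).
SNIFFABLE_TYPES = {None, "text/plain"}

# A few well-known magic numbers: leading bytes -> media type.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def effective_type(declared, body):
    """Return the media type a client would act on for this response."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = declared.split(";", 1)[0].strip().lower() if declared else ""
    base = base or None

    # The common case: the declared type is honoured as-is.
    if base not in SNIFFABLE_TYPES:
        return base

    # Only for the short list above do we look at the body at all.
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type

    # Nothing matched: keep the declared type, or fall back to a generic one.
    return base or "application/octet-stream"
```

With this sketch, a response declared as text/html is treated as text/html 
even if its body begins with a PNG signature, whereas a response with no 
Content-Type at all and a GIF signature is treated as image/gif.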

> Now in a browser there is a next-level processor above HTTP.

This is not in any way limited to browsers. It applies to any tool that 
uses HTTP for user-visible content processing.

> It's the thing that actually gets the resource.  It's free to do 
> whatever it likes with that content.

No, HTTP says of the Content-Type header "its value indicates what 
additional content codings have been applied to the entity-body, and thus 
what decoding mechanisms must be applied in order to obtain the media-type 
referenced by the Content-Type header field".

If this is intended to allow user agents to ignore the Content-Type 
header, or apply heuristics, then a clearer statement of this would be an 
acceptable solution (proposal "a" above).

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 3 April 2009 05:37:10 UTC