Re: review of content type rules by IETF/HTTP community from Ian Hickson on 2007-08-22 (public-html@w3.org from August 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 22 Aug 2007 06:16:39 +0000 (UTC)
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: public-html@w3.org
Message-ID: <Pine.LNX.4.64.0708220550160.29534@dhalsim.dreamhost.com>

On Tue, 21 Aug 2007, Roy T. Fielding wrote:
> >
> > I entirely agree with the above and in fact with most of what you wrote, 
> > but just for the record, could you list some of the Web-based 
> > functionality you mention that depends on Content-Type to control 
> > behaviour?
> 
> Mostly intermediaries, spiders, and the analysis engines that perform 
> operations on the results of both.

Could you name any specific intermediaries, spiders, and analysis engines 
that honour Content-Type headers with no sniffing? I would really like to 
study some of these in more detail.

> The more serious breakage is the example I provided in the metadata
> finding, which could apply to anything that attempts to evaluate
> scripts on the basis of the content being sniffed as HTML.

There is currently *no way*, given an actual MIME type, for the algorithm 
in the HTML5 spec to sniff content that is not explicitly labelled as HTML 
and treat it as HTML. The only ways for the algorithms in the spec to 
detect a document as HTML are if it has no Content-Type header at all, or 
if it has a header whose value is unknown/unknown or application/unknown.
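
To make that rule concrete, here is a minimal sketch of just the condition 
stated above. It is not the spec's actual sniffing algorithm. The function 
name may_sniff_as_html is hypothetical, and ignoring parameters (such as 
charset) and letter case when comparing the header value is an assumption 
of this sketch.

```python
from typing import Optional

# Header values that, per the paragraph above, leave the type unspecified.
UNKNOWN_TYPES = {"unknown/unknown", "application/unknown"}


def may_sniff_as_html(content_type: Optional[str]) -> bool:
    """Return True only if sniffing is allowed to classify the resource as HTML.

    content_type is the raw Content-Type header value, or None if the
    response carried no Content-Type header at all.
    """
    if content_type is None:
        # No header at all: sniffing may detect HTML.
        return True
    # Assumption: drop any parameters and compare case-insensitively.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in UNKNOWN_TYPES
```

Under this sketch, a response labelled text/plain or image/png is never a 
candidate for being treated as HTML, whatever its body looks like.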

> > Note that HTML5 goes out of its way to try to improve the situation, 
> > by limiting the sniffing to very specific cases that browsers are 
> > being forced into handling by market pressures. So HTML5, as written 
> > today, stands to actually improve matters in the long term.
> 
> HTML is a mark-up language -- it isn't even involved at that level of 
> the architecture.

Sadly, it is. Authors rely on UAs handling the URIs in <img> elements 
as images regardless of Content-Type and HTTP response codes. Authors rely 
on <script> elements parsing their resources as scripts regardless of the 
type of the remote resource. And so forth. These are behaviours that are 
tightly integrated with the markup language. Furthermore, to obtain 
interoperable *and secure* behaviour when navigating across browsing 
contexts, be they top-level pages (windows or tabs), or frames, iframes, 
or HTML <object> elements, we have to define how browsers are to handle 
navigation of content for those cases.
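
As a rough illustration of the point above, the sketch below shows handling 
being chosen by the embedding context rather than by the declared type. The 
function name handling_for, the context strings, and the returned labels 
are all hypothetical. Only the <img> and <script> behaviour comes from the 
paragraph above. The navigation branch simply passes the declared type 
along and does not attempt to model the real navigation rules.

```python
from typing import Optional

# Embedding elements whose handling is fixed by the markup, per the text above.
ELEMENT_HANDLING = {"img": "image", "script": "script"}


def handling_for(context: str, content_type: Optional[str]) -> str:
    """Pick how a fetched resource is processed in a given context.

    context is a hypothetical label such as "img", "script", or "iframe".
    """
    if context in ELEMENT_HANDLING:
        # The element decides; the declared Content-Type is not consulted.
        return ELEMENT_HANDLING[context]
    # Navigation (window, tab, frame, iframe, object): the declared type is
    # what the navigation rules start from, so pass it along unchanged.
    return f"navigate:{content_type or 'unspecified'}"
```

For example, handling_for("img", "text/html") yields "image" in this sketch, 
while handling_for("iframe", "text/html") yields "navigate:text/html".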

> Orthogonal standards deserve orthogonal specs.  Why don't you put it in 
> a specification that is specifically aimed at Web Browsers and Content 
> Filters?

The entire section in question is entitled "Web browsers". Browsers are 
one of the most important conformance classes that HTML5 targets (the 
other most important one being authors). We would be remiss if we didn't 
define how browsers should work!

> I agree that a single sniffing algorithm would help, eventually, but it 
> still needs to be clear that overriding the server-supplied type is an 
> error by somebody, somewhere, and not part of "normal" Web browsing.

Of course it is. Who said otherwise?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 22 August 2007 06:16:51 UTC