Re: review of content type rules by IETF/HTTP community from Roy T. Fielding on 2007-08-21 (public-html@w3.org from August 2007)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 21 Aug 2007 15:38:53 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: public-html@w3.org
Message-Id: <27D22520-EAA3-403A-B532-96EF382A06DC@gbiv.com>

On Aug 21, 2007, at 2:32 PM, Ian Hickson wrote:
> I entirely agree with the above and in fact with most of what you wrote,
> but just for the record, could you list some of the Web-based
> functionality you mention that depends on Content-Type to control
> behaviour? In my experience most non-browser based scripts and the like
> actually ignore Content-Type headers even more than browsers do, and it
> would be interesting to study the cases that actually honour them
> completely (or at least, that honour these headers more than browsers do).

Mostly intermediaries, spiders, and the analysis engines that perform
operations on the results of both.  Spiders typically limit their
traversals to known hypertext formats (using HEAD to determine the
content type before retrieval is even attempted), though there are
well-known exceptions to that (Google slurps everything, IIRC).
Content analysis is usually focused on specific media types.
At the meta-level, there are also content management systems that
provide indexing, workflow, and sometimes versioning features based
on type, and WebDAV expects the types it sends to be consistent with
those on responses.  None of these will visibly "break" upon sniffing.
They simply lose functionality when nobody tells the author
that the media type has not been set correctly.
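
To make the spider case concrete, here is a minimal sketch of a crawler
that issues HEAD and only retrieves known hypertext formats.  The names
(HYPERTEXT_TYPES, media_type, should_fetch, head_content_type) and the
particular set of types are assumptions for illustration, not taken from
any real spider.

```python
import urllib.request

# Assumed allow-list; a real spider would choose its own set of formats.
HYPERTEXT_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(header_value):
    """Return the lowercased type/subtype of a Content-Type value, or None."""
    if not header_value:
        return None
    # Drop parameters such as "; charset=UTF-8" before comparing.
    return header_value.split(";", 1)[0].strip().lower() or None


def should_fetch(header_value, wanted=HYPERTEXT_TYPES):
    """Decide whether to retrieve the body, trusting the declared type."""
    return media_type(header_value) in wanted


def head_content_type(url, timeout=10):
    """Issue a HEAD request and return the raw Content-Type header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

A spider built this way never sees the body of a mislabeled resource, so
a page served as text/plain is skipped silently; that is the lost
functionality, not a visible failure.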

The more serious breakage is the example I provided in the metadata
finding, which could apply to anything that attempts to evaluate
scripts on the basis of the content being sniffed as HTML.  Content
filters used to be based on media type as well, back before MSIE,
but I presume that has changed.

> Note that HTML5 goes out of its way to try to improve the situation, by
> limiting the sniffing to very specific cases that browsers are being
> forced into handling by market pressures. So HTML5, as written today,
> stands to actually improve matters on the long term.

HTML is a mark-up language -- it isn't even involved at that level
of the architecture.  Why should I (as an implementer) have to wade
through several pages of intense workarounds if all I am interested
in implementing is a compliant HTML editor?  Orthogonal standards
deserve orthogonal specs.  Why don't you put it in a specification
that is specifically aimed at Web Browsers and Content Filters?

I agree that a single sniffing algorithm would help, eventually,
but it still needs to be clear that overriding the server-supplied
type is an error by somebody, somewhere, and not part of "normal"
Web browsing.  Shit smells bad because it is in our best interests
to clean it up.  Just taking away the smell may be a solution for
some people, but that's not the same as cleaning it up.  If we have
a standard for nose-plug users, then I want a standard for janitors
to go along with it.
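
As a rough sketch of what the janitor side might look like: instead of
silently overriding the declared type, a checker can report the mismatch
so that somebody can fix it.  The heuristic below is a toy assumption and
is NOT the HTML5 sniffing algorithm; looks_like_html and type_mismatch
are hypothetical names.

```python
import re

# Toy heuristic only: a few leading tags that suggest an HTML document.
_HTML_HINT = re.compile(
    rb"^\s*(<!doctype\s+html|<html|<head|<body|<script)", re.IGNORECASE
)

_HTML_TYPES = ("text/html", "application/xhtml+xml")


def looks_like_html(body):
    """Guess whether the first bytes of a response body resemble HTML."""
    return bool(_HTML_HINT.match(body[:512]))


def type_mismatch(declared, body):
    """Return a warning when the body resembles HTML but is labelled otherwise.

    Returns None when the declared type and the guess agree, or when the
    body does not resemble HTML at all.
    """
    declared_type = (declared or "").split(";", 1)[0].strip().lower()
    if looks_like_html(body) and declared_type not in _HTML_TYPES:
        return "declared %r but body looks like HTML" % (declared_type or "(none)")
    return None
```

The point of the design is that the function returns a message for a
human rather than a corrected type for the browser.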

....Roy

Received on Tuesday, 21 August 2007 22:39:08 UTC