Re: review of content type rules by IETF/HTTP community from Roy T. Fielding on 2007-08-22 (public-html@w3.org from August 2007)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 22 Aug 2007 14:15:54 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: public-html@w3.org
Message-Id: <CAB201BB-AADB-496B-9EFB-57C6B2E5B8D2@gbiv.com>
On Aug 21, 2007, at 11:16 PM, Ian Hickson wrote:
> On Tue, 21 Aug 2007, Roy T. Fielding wrote:
>>>
>>> I entirely agree with the above and in fact with most of what you wrote,
>>> but just for the record, could you list some of the Web-based
>>> functionality you mention that depends on Content-Type to control
>>> behaviour?
>>
>> Mostly intermediaries, spiders, and the analysis engines that perform
>> operations on the results of both.
>
> Could you name any specific intermediaries, spiders, and analysis engines
> that honour Content-Type headers with no sniffing? I would really like to
> study some of these in more detail.

MOMspider, W3CRobot, several hundred scripts based on libwww-perl
or LWP, PhpDig, and probably others listed at

    http://en.wikipedia.org/wiki/Web_crawler

I don't use much intermediary code, but the feature set of something like

    http://dansguardian.org/?page=introduction

is pretty standard.

>> The more serious breakage is the example I provided in the metadata
>> finding, which could apply to anything that attempts to evaluate
>> scripts on the basis of the content being sniffed as HTML.
>
> There is currently *no way*, given an actual MIME type, for the algorithm
> in the HTML5 spec to sniff content not labelled explicitly as HTML to be
> treated as HTML. The only way for the algorithms in the spec to detect a
> document as HTML is if it has no Content-Type header at all, or if it has
> a header whose value is unknown/unknown or application/unknown.

Not even <embed src="myscript.txt" type="text/html">?

It is better than I originally thought after the first read, though.
I suggest restructuring the paragraphs into some sort of decision table
or structured diagram, since all the "goto section" bits make it
difficult to understand.

    http://en.wikipedia.org/wiki/Decision_tables
    http://www.cs.umd.edu/hcil/members/bshneiderman/nsd/
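
As a rough illustration of the decision-table idea, here is a minimal
sketch in Python. It covers only the one rule quoted above (content can be
detected as HTML by sniffing only when there is no Content-Type header, or
when its value is unknown/unknown or application/unknown). It is not the
spec's algorithm; the function name and the table rows are hypothetical.

```python
# Hypothetical sketch of the single rule quoted in this thread; it is
# NOT the HTML5 spec's full sniffing algorithm.
UNKNOWN_TYPES = {"unknown/unknown", "application/unknown"}


def sniffing_may_yield_html(content_type):
    """Return True if sniffing is allowed to detect the content as HTML.

    content_type is the raw Content-Type header value, or None when the
    response carries no Content-Type header at all.
    """
    if content_type is None:
        return True
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in UNKNOWN_TYPES


# The same rule laid out as a table: one row per header value.
ROWS = [
    None,
    "unknown/unknown",
    "application/unknown",
    "text/plain; charset=utf-8",
    "image/png",
]

if __name__ == "__main__":
    for header in ROWS:
        print(f"{header!r:32} {sniffing_may_yield_html(header)}")
```

Each row maps one header value to a single yes/no answer, which is the
property that makes a table easier to follow than a chain of "goto section"
steps.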


>>> Note that HTML5 goes out of its way to try to improve the situation,
>>> by limiting the sniffing to very specific cases that browsers are
>>> being forced into handling by market pressures. So HTML5, as written
>>> today, stands to actually improve matters in the long term.
>>
>> HTML is a mark-up language -- it isn't even involved at that level of
>> the architecture.
>
> Sadly, it is. Authors rely on UAs handling the URIs in <img> elements
> as images regardless of Content-Type and HTTP response codes. Authors rely
> on <script> elements parsing their resources as scripts regardless of the
> type of the remote resource. And so forth. These are behaviours that are
> tightly integrated with the markup language.

They don't rely on them -- they are simply not aware of the error.
It would only make sense to rely on them if those elements were
incapable of handling properly typed content.

> Furthermore, to obtain interoperable *and secure* behaviour when
> navigating across browsing contexts, be they top-level pages (windows or
> tabs), or frames, iframes, or HTML <object> elements, we have to define
> how browsers are to handle navigation of content for those cases.

Yes, but why can't that definition be in terms of the end-result of
type determination?  Are we talking about a procedure for sniffing
in which context is a necessary parameter, or just a procedure for
handling the results of sniffing per context?
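
To make the second alternative concrete, here is a minimal sketch in Python
of type determination kept separate from per-context handling. Everything
in it is an assumption for illustration: the function names, the context
names, and the accepted-type rules are hypothetical and are not taken from
the HTML5 draft. It also assumes media types arrive already normalised
(lower case, no parameters).

```python
from typing import Optional

# Assumption: the "unknown" values mentioned earlier in this thread.
UNKNOWN_TYPES = ("unknown/unknown", "application/unknown")


def determine_type(declared: Optional[str], sniffed: str) -> str:
    """Context-free step: choose the effective media type.

    The sniffed type is used only when the server supplied no usable
    type; otherwise the declared type stands.
    """
    if declared is None or declared in UNKNOWN_TYPES:
        return sniffed
    return declared


# Per-context step: which effective types each hypothetical context accepts.
ACCEPTED = {
    "img": lambda t: t.startswith("image/"),
    "script": lambda t: t in ("text/javascript", "application/javascript"),
    "navigate": lambda t: True,
}


def handle(context: str, effective_type: str) -> str:
    """Decide what a given context does with an already-determined type."""
    if ACCEPTED[context](effective_type):
        return f"render as {effective_type}"
    return f"reject {effective_type} for {context}"
```

Here the context never feeds into determine_type(); it only selects what
is done with the result. If the spec's procedure really needs the context
as an input to sniffing, a split like this would not be possible.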

>> Orthogonal standards deserve orthogonal specs.  Why don't you put it in
>> a specification that is specifically aimed at Web Browsers and Content
>> Filters?
>
> The entire section in question is entitled "Web browsers". Browsers are
> one of the most important conformance classes that HTML5 targets (the
> other most important one being authors). We would be remiss if we didn't
> define how browsers should work!

Not everything needs to be defined in the same document.

>> I agree that a single sniffing algorithm would help, eventually, but it
>> still needs to be clear that overriding the server-supplied type is an
>> error by somebody, somewhere, and not part of "normal" Web browsing.
>
> Of course it is. Who said otherwise?

Where is the error handling specified for sniffed-type != media-type?
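
For illustration only, the kind of check I have in mind could look like the
Python sketch below. The function name and the message format are
hypothetical; nothing here comes from the draft, and how a user agent
should surface such an error is exactly the open question.

```python
from typing import Optional


def _normalise(media_type: str) -> str:
    # Drop parameters such as "; charset=utf-8" and normalise case.
    return media_type.split(";", 1)[0].strip().lower()


def type_mismatch(declared: str, sniffed: str) -> Optional[str]:
    """Return an error message when sniffed-type != media-type, else None.

    Hypothetical helper: it only detects and describes the mismatch. It
    does not decide which of the two types ends up being used.
    """
    d, s = _normalise(declared), _normalise(sniffed)
    if d == s:
        return None
    return f"server declared {d} but content was sniffed as {s}"
```

A caller could log the message, show it in an error console, or refuse to
process the resource; the sketch deliberately leaves that choice open.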

....Roy
Received on Wednesday, 22 August 2007 21:16:01 UTC