From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 22 Aug 2007 14:15:54 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: public-html@w3.org

On Aug 21, 2007, at 11:16 PM, Ian Hickson wrote:

> On Tue, 21 Aug 2007, Roy T. Fielding wrote:
>>>
>>> I entirely agree with the above and in fact with most of what you
>>> wrote, but just for the record, could you list some of the
>>> Web-based functionality you mention that depends on Content-Type
>>> to control behaviour?
>>
>> Mostly intermediaries, spiders, and the analysis engines that
>> perform operations on the results of both.
>
> Could you name any specific intermediaries, spiders, and analysis
> engines that honour Content-Type headers with no sniffing? I would
> really like to study some of these in more detail.

MOMspider, W3CRobot, several hundred scripts based on libwww-perl or
LWP, PhpDig, and probably others listed at

   http://en.wikipedia.org/wiki/Web_crawler

I don't use much intermediary code, but you can see that the feature
set on something like

   http://dansguardian.org/?page=introduction

is pretty standard.

>> The more serious breakage is the example I provided in the metadata
>> finding, which could apply to anything that attempts to evaluate
>> scripts on the basis of the content being sniffed as HTML.
>
> There is currently *no way*, given an actual MIME type, for the
> algorithm in the HTML5 spec to sniff content not labelled explicitly
> as HTML to be treated as HTML. The only ways for the algorithms in
> the spec to detect a document as HTML are if it has no Content-Type
> header at all, or if it has a header whose value is unknown/unknown
> or application/unknown.

Not even <embed src="myscript.txt" type="text/html">?

It is better than I originally thought after the first read, though.
I suggest restructuring the paragraphs into some sort of decision
table or structured diagram, since all the "goto section" bits make
it difficult to understand.
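[Editor's note: the narrow condition Hickson describes above -- that content may be sniffed as HTML only when no usable media type was supplied -- can be sketched in a few lines. The function name and parameter stripping below are illustrative assumptions, not text from the HTML5 draft.]

```python
def may_sniff_as_html(content_type):
    """True only when the response carries no usable media type.

    Per the condition quoted above: sniffing as HTML is permitted
    only for a missing Content-Type header, unknown/unknown, or
    application/unknown. (Sketch, not the spec's algorithm.)
    """
    if content_type is None:
        return True
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("unknown/unknown", "application/unknown")
```

By this rule a response explicitly labelled text/plain is never re-labelled as HTML, which is the property Hickson is claiming for the spec's algorithm.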
   http://en.wikipedia.org/wiki/Decision_tables
   http://www.cs.umd.edu/hcil/members/bshneiderman/nsd/

>>> Note that HTML5 goes out of its way to try to improve the
>>> situation, by limiting the sniffing to very specific cases that
>>> browsers are being forced into handling by market pressures. So
>>> HTML5, as written today, stands to actually improve matters in
>>> the long term.
>>
>> HTML is a mark-up language -- it isn't even involved at that level
>> of the architecture.
>
> Sadly, it is. Authors rely on UAs handling the URIs in <img>
> elements as images regardless of Content-Type and HTTP response
> codes. Authors rely on <script> elements parsing their resources as
> scripts regardless of the type of the remote resource. And so
> forth. These are behaviours that are tightly integrated with the
> markup language.

They don't rely on them -- they are simply not aware of the error.
It would only make sense to rely on them if those elements were
incapable of handling properly typed content.

> Furthermore, to obtain interoperable *and secure* behaviour when
> navigating across browsing contexts, be they top-level pages
> (windows or tabs), or frames, iframes, or HTML <object> elements,
> we have to define how browsers are to handle navigation of content
> for those cases.

Yes, but why can't that definition be in terms of the end-result of
type determination? Are we talking about a procedure for sniffing in
which context is a necessary parameter, or just a procedure for
handling the results of sniffing per context?

>> Orthogonal standards deserve orthogonal specs. Why don't you put
>> it in a specification that is specifically aimed at Web Browsers
>> and Content Filters?
>
> The entire section in question is entitled "Web browsers". Browsers
> are one of the most important conformance classes that HTML5
> targets (the other most important one being authors). We would be
> remiss if we didn't define how browsers should work!
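[Editor's note: a decision table of the sort Fielding suggests might look like the following sketch. The contexts, media types, and outcomes are invented placeholders to show the shape, not content from the spec.]

```python
# Hypothetical sketch of recasting the spec's "goto section" prose as
# a decision table: each (browsing context, determined type) pair maps
# directly to a handling rule. All entries here are made-up examples.
DECISION_TABLE = {
    ("image",  "image/png"):       "render as image",
    ("image",  "text/html"):       "error: type mismatch",
    ("script", "text/javascript"): "execute as script",
    ("script", "text/html"):       "error: type mismatch",
}

def handle(context, determined_type):
    """Look up the handling rule for one context/type combination."""
    return DECISION_TABLE.get((context, determined_type),
                              "error: unhandled combination")
```

Laid out this way, every combination is visible at a glance and the unhandled cases are explicit, which is the readability gain Fielding is pointing at.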
Not everything needs to be defined in the same document.

>> I agree that a single sniffing algorithm would help, eventually,
>> but it still needs to be clear that overriding the server-supplied
>> type is an error by somebody, somewhere, and not part of "normal"
>> Web browsing.
>
> Of course it is. Who said otherwise?

Where is the error handling specified for sniffed-type != media-type?

....Roy
Received on Wednesday, 22 August 2007 21:16:01 UTC