- From: Larry Masinter <masinter@adobe.com>
- Date: Fri, 29 May 2009 12:46:39 -0700
- To: Sam Ruby <rubys@intertwingly.net>, Anne van Kesteren <annevk@opera.com>
- CC: Maciej Stachowiak <mjs@apple.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
I think there are a couple of issues that are worth separating out in the discussion labeled "HTML interpreters vs. HTML user agents".

Scope of the document: does the document we're working on apply to all HTML applications, only HTML interpreters, only HTML User Agents with Users, etc.?

I think the discussion forks into:

(a) we could more easily reach consensus on the body if the claimed scope were limited by, for example, changing the title and abstract, or

(b) the intent of the authors, the charter of the group, and practical use call for a language specification which is not narrowly scoped; we should fix the problems that would prevent its broad applicability.

Does anyone see any other choice? I'd prefer (b), of course.

As an example of something in the document for which scope is relevant, the issue of "content type sniffing" was raised. Do the requirements for content-type sniffing apply only to "browsers", or to all HTML processors, including feed readers?

In this case, I think there are two separate situations which have different perspectives:

a) Content-type sniffing of URIs within an HTML document itself: for references to external content, and processing rules which describe what those references are intended to mean. So, for example, if I say http://example.com/foo.gif in an <img>, I could define img@src to say, "if the protocol of the URI is http:, don't follow exactly the HTTP spec when interpreting the URI, but instead do the following", and describe HTML's own rules for content-type sniffing, and for treating images that *say* they are GIF files but *look* like they are JPEG files, well, as JPEG files. It's possible to do that.
I don't like it much. I certainly think that it needs to be documented, reviewed, and well understood by network intermediaries that couldn't care less about HTML and APIs and layout but want to scan JPEG images for security problems or naughty seditious images or whatever, so a separate document with external review seems really important. But at least it's something that HTML *can* do.

b) Content type sniffing of HTML itself. This is the part I have trouble with.

If I have a specification for a language, I could tell people how to recognize instances of that language. Let's say ISO defined "The Angle Bracket Language". It consists of "Any string of characters in any encoding which contains angle brackets." And I could give a rule -- "You should recognize any document with angle brackets as if it were served as text/angle-bracket, no matter what the MIME type is."

But what is the scope of applicability of this new rule? Does it apply only to angle bracket processors? Only web browsers? To anything that wants to be an angle-bracket processor but also wants to process HTML?

Does the organization that publishes this fine new standard matter? If the W3C publishes it, does it now apply to all W3C specs? Does it apply to all web browsers, if it is a publication of W3C? To feed readers too? If it is published by ISO (oh, say, like ISO has published HTML4 https://www.cs.tcd.ie/15445/15445.HTML) can ISO define how other processors are to interpret HTTP results that say they are text/html but really -- because they have angle brackets -- SHOULD be interpreted as text/angle-bracket?

I think the IETF delegated the authority to the W3C to define what text/html and application/xhtml+xml "mean", and the W3C membership, by their approval of the charter of this working group, have delegated the authority to the W3C HTML working group to come up with a proposal, for member approval, which defines text/html, and the W3C is working on deciding which group(s) define application/xhtml+xml.
I don't see any authority or practical way in which this working group could realistically define what anyone else considers to be an instance of the language it is defining. Certainly the HTML specification can't redefine "text/plain" to be anything other than "text/plain", for references that are not themselves invoked from inside HTML.

Larry
--
http://larry.masinter.net
Received on Friday, 29 May 2009 19:47:50 UTC