- From: Larry Masinter <masinter@adobe.com>
- Date: Fri, 29 May 2009 12:46:39 -0700
- To: Sam Ruby <rubys@intertwingly.net>, Anne van Kesteren <annevk@opera.com>
- CC: Maciej Stachowiak <mjs@apple.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
I think there are a couple of issues that are worth
separating out, in the discussion labeled
"HTML interpreters vs. HTML user agents".
Scope of the document: does the document we're working on
apply to all HTML applications, only HTML interpreters,
only HTML User Agents with Users, etc.?
I think the discussion forks into:
(a) we could more easily reach consensus on the body
if the claimed scope were limited, by, for example,
changing the title and abstract, or
(b) the intent of the authors, the charter of the group,
and practical use call for a language specification
which is not narrowly scoped, so we should fix the
problems that would prevent its broad applicability.
Does anyone see any other choice? I'd prefer (b),
of course.
As an example of something in the document for which
scope is relevant, the issue of "content type sniffing"
was raised. Do the requirements for content-type
sniffing only apply to "browsers", or to all HTML
processors including feed readers?
In this case, I think there are two separate situations
which have different perspectives:
a) Content-type sniffing of URIs within an HTML document
itself: for references to external content, and
processing rules which describe what those references are
intended to mean. So, for example, if I say
http://example.com/foo.gif in an <img>, I could define
img@src to say, "if the protocol of the URI
is http:, don't follow exactly the HTTP spec when
interpreting the URI, but instead do the following", and
describe HTML's own rules for content-type sniffing, and
for treating images that *say* they are GIF files but
*look* like they are JPEG files, well, as JPEG files.
It's possible to do that. I don't like it much, and I
certainly think that it needs to be documented, reviewed,
and well understood by network intermediaries that
couldn't care less about HTML and APIs and layout but
want to scan JPEG images for security problems or naughty
seditious images or whatever. A separate document with
external review therefore seems really important, but at
least it's something that HTML *can* do.
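To make case (a) concrete, here is a minimal sketch of what such an
img@src rule might look like. It is an illustration only, not the rule
from any draft: the function names, the invoked_from_img flag, and the
tiny signature table are all assumptions made up for this example.

```python
# Hypothetical magic-number table; real sniffing rules would be longer.
IMAGE_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_image_type(declared_type, body):
    """Return the type suggested by the leading bytes, if any.

    Falls back to the declared Content-Type when no signature matches.
    """
    for magic, sniffed_type in IMAGE_SIGNATURES:
        if body.startswith(magic):
            return sniffed_type
    return declared_type


def effective_type(declared_type, body, invoked_from_img):
    """Apply the override only to references made from inside HTML.

    invoked_from_img is an assumed flag meaning "this fetch came from
    an <img src>"; any other consumer keeps the declared type.
    """
    if invoked_from_img:
        return sniff_image_type(declared_type, body)
    return declared_type
```

With this sketch, a response labelled image/gif whose body starts with
the JPEG signature is treated as image/jpeg when fetched for an <img>,
while an intermediary that is not processing HTML still sees image/gif.
That gap is exactly why outside review matters.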
b) Content-type sniffing of HTML itself.
This is the part I have trouble with. If I have a
specification for a language, I could tell people how to
recognize instances of that language.
Let's say ISO defined "The Angle Bracket Language". It
consists of "Any string of characters in any encoding
which contains angle brackets."
And I could give a rule -- "You should recognize any
document with angle brackets as if it were served as
text/angle-bracket, no matter what the MIME type is."
But what is the scope of applicability of this new rule?
Does it apply only to angle bracket processors? Only web
browsers? To anything that wants to be an angle-bracket
processor but also wants to process HTML?
Does the organization that publishes this fine
new standard matter? If the W3C publishes it,
does it now apply to all W3C specs?
Does it apply to all web browsers, if it is a publication
of W3C? To feed readers too?
If it is published by ISO (oh, say, like ISO has published
HTML4, https://www.cs.tcd.ie/15445/15445.HTML), can ISO
define how other processors are to interpret HTTP
results that say they are text/html but really --
because they have angle brackets -- SHOULD be
interpreted as text/angle-bracket?
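For concreteness, the hypothetical Angle Bracket Language rule can be
written down in a few lines. This is only a sketch of the thought
experiment; the function name is made up, and text/angle-bracket is the
imaginary type from the example, not a registered one.

```python
def angle_bracket_type(declared_type, text):
    """Apply the imaginary rule: angle brackets win over the MIME type.

    text is the already-decoded document, since the language is defined
    over "any string of characters in any encoding".
    """
    if "<" in text or ">" in text:
        return "text/angle-bracket"
    return declared_type
```

Applied to a response served as text/html with the body "<p>hi</p>",
this returns text/angle-bracket. The rule is trivial to state and
trivial to implement, yet it lays claim to essentially every HTML
document, which is why the question of who is bound by it matters.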
I think the IETF delegated the authority to the W3C to
define what text/html and application/xhtml+xml "mean", and
the W3C membership, by its approval of the charter of this
working group, has delegated the authority to the W3C HTML
working group to come up with a proposal, for member approval,
which defines text/html, and is working on deciding which
group(s) define application/xhtml+xml.
I don't see any authority or practical way in which this
working group could realistically define what anyone else
considers to be an instance of the language it is
defining. Certainly the HTML specification can't redefine
"text/plain" to be anything other than "text/plain",
for references that are not themselves invoked from
inside HTML.
Larry
--
http://larry.masinter.net
Received on Friday, 29 May 2009 19:47:50 UTC