Re: Precision and error handling (was URL work in HTML 5) from Eric J. Bowman on 2012-10-07 (www-tag@w3.org from October 2012)

From: Eric J. Bowman <eric@bisonsystems.net>
Date: Sun, 7 Oct 2012 17:06:15 -0600
To: Larry Masinter <masinter@adobe.com>
Cc: Noah Mendelsohn <nrm@arcanedomain.com>, "Michael[tm] Smith" <mike@w3.org>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Robin Berjon <robin@w3.org>, W3C TAG <www-tag@w3.org>
Message-Id: <20121007170615.0f1f3bbe.eric@bisonsystems.net>

Larry Masinter wrote:
>
> The previous HTML registration http://tools.ietf.org/html/rfc2854
> also specified likely error behavior.
>

I can't think of any that aren't legacy SGML issues, like unclosed <p>s?

>
> Media Type registrations were intended to register the
> language-as-used.
> 

Wouldn't that be the result of tying the registration to the author
document?  The idea of the HTML parser is that it always produces a DOM
that can be serialized into a document conforming to the author spec.
If it doesn't matter how HTML is written, but only how it's read, then
parser output becomes the "language-as-used" and the mechanics of how
we get there from tag soup are irrelevant.

In fact, this would allow the browser spec to evolve independently of
the defined goal of its output -- the author document could still be a
subset of a new browser spec, as the semantics and syntax of HTML are
inherently more stable over time than how HTML is rendered or how
errors are recovered from.

A new generation of browsers coming along and wanting to standardize
support for new APIs nobody's thought of yet seems inevitable, and
likely to happen more frequently than HTML's syntax and semantics are
revisited, as issues like "what is a paragraph" become long-settled.
I'd prefer that the media type registration track the language itself,
rather than browser-feature generations built around that language.

-Eric

Received on Sunday, 7 October 2012 23:06:39 UTC