Re: Precision and error handling (was URL work in HTML 5) from Michael[tm] Smith on 2012-10-07 (www-tag@w3.org from October 2012)

From: Michael[tm] Smith <mike@w3.org>
Date: Sun, 7 Oct 2012 10:40:21 +0900
To: "Eric J. Bowman" <eric@bisonsystems.net>
Cc: Noah Mendelsohn <nrm@arcanedomain.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Robin Berjon <robin@w3.org>, Larry Masinter <masinter@adobe.com>, W3C TAG <www-tag@w3.org>
Message-ID: <20121007014019.GA93492@sideshowbarker>

"Eric J. Bowman" <eric@bisonsystems.net>, 2012-10-06 11:53 -0600:

> Noah Mendelsohn wrote:
> >
> > I might even go along with suggesting that [1] is the main
> > specification for the HTML(5) >language< and perhaps that it should 
> > be the basis of the media type registration.
> 
> Excellent point.  My server will consume user-created HTML 5, which it
> won't process with an HTML 5 parser.

Why not? Is the HTML you're processing never meant to be consumed by
browsers as well?

> It won't be creating a DOM or caring about any APIs.

You can have a parser that follows the tokenization rules in the HTML5 spec
and that then uses SAX or whatever to expose the start tags, characters,
etc., to the rest of your application without ever constructing a DOM.
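
As a rough illustration of that event-driven shape, here is a minimal
sketch in Python. It assumes the standard library's html.parser module as a
stand-in; that module is not a conforming HTML5 tokenizer, so a real
implementation would swap in one that follows the spec's tokenization
rules. The names EventCollector and collect_events are hypothetical,
chosen only for this example.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records SAX-style events as they arrive; no tree is ever built."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs to keep the event list short
        if data.strip():
            self.events.append(("chars", data))


def collect_events(markup):
    """Feed markup through the parser and return the flat event list."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered character data
    return parser.events


if __name__ == "__main__":
    for event in collect_events('<p class="x">Hi <b>there</b></p>'):
        print(event)
```

The application only ever sees a flat stream of start-tag, end-tag, and
character events, which is all a server-side consumer of this kind needs.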

> All it needs to know is the syntax of HTML 5.

The spec provides that information:

  http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#writing

> While one class of consumer (browsers) may wish the media type to be
> tied to the appropriate document for that need, this isn't the only
> possibility for consuming HTML 5, thus the correct architectural choice
> is to tie the media type to the author spec.

I don't think that's true. I think the correct architectural choice is to
tie the media type to the spec that attempts to be the most comprehensive
specification for the language. That's no different from the case of the
HTML4 spec. There was not a separate author spec for HTML4 -- there was
just one spec.

> Otherwise an expectation
> is created, that consumers accepting HTML will parse it a certain way.

I don't think that's necessarily true. The HTML spec explicitly defines
particular conformance classes, and makes it clear which parts of the spec
apply to which particular conformance classes and which do not.

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike

Received on Sunday, 7 October 2012 01:40:31 UTC