Re: What problem is this task force trying to solve and why?

From: John Cowan <cowan@mercury.ccil.org>
Date: Mon, 20 Dec 2010 15:29:25 -0500
To: James Clark <jjc@jclark.com>
Cc: public-html-xml@w3.org
Message-ID: <20101220202924.GA17700@mercury.ccil.org>

James Clark scripsit:

> - a subset of XML (and maybe XML namespaces); for the sake of
>   discussion, call this "convergently well-formed XML"
> - some tweaks to the HTML syntax of HTML5
> - a subset of the tweaked HTML syntax of HTML5; call this "convergently
>   valid HTML5"
> - a subset of the XML Infoset; call this the "convergent XML infoset"

I agree generally, except that I think tweaks to HTML are a bad idea,
for the same reason that I think tweaks to XML are a bad idea.  I speak
as someone who has pushed through two tweaks to XML already.  Let us
have no more.

> The idea is to make polyglot documents a solid, reliable, workable
> approach.

Polyglot documents are a little different: they are documents that are
both valid HTML and valid (i.e. DTD-valid) XHTML.  The approach outlined
above, with which I agree, only insists on well-formed XML (and possibly
namespace-well-formed XML).

>  HTML5 in the HTML syntax could be processed by XML tools like a normal XML
> vocabulary, provided only that the XML tools know about the extra
> constraints of convergent well-formedness.

There are two use cases here:

1) The HTML is wild, in which case you want an HTML parser, either an
HTML5 parser or something like Tidy or TagSoup.

2) The HTML is carefully generated to be convergent.

It's the second use case that matters to us, I think.  The main reason the
first use case doesn't suffice is that many applications (notably editors)
don't have pluggable parsers.
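
To make the distinction concrete, here is a rough sketch using Python's
standard xml.etree.ElementTree as a stand-in for "an XML tool with a
fixed parser".  The helper name xml_tools_can_read is hypothetical, and
the check covers XML well-formedness only, not the further constraints
of convergence.

```python
from xml.etree import ElementTree as ET


def xml_tools_can_read(markup):
    """Return True if a plain XML parser accepts the markup as-is."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Carefully generated markup: void element written as an empty tag.
careful = "<p>one<br/>two</p>"
# Wild markup: unclosed <br> and <p>, fine for an HTML parser only.
wild = "<p>one<br>two"

print(xml_tools_can_read(careful), xml_tools_can_read(wild))
```

An application without a pluggable parser is stuck with the second
result for wild input, which is why only carefully generated documents
help it.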

What follows are my responses to some of your specific points:

>    1. End-tags. Valid HTML5 does not allow end-tags for "void" (always
>       empty) element types. HTML5 parsers will ignore such end-tags
>       except in one case (<br>).
>    2. Empty-element syntax. Valid HTML5 allows empty-element syntax
>       (<foo/>) only for "void" element types. If you use empty-element
>       syntax for a non "void" element type, it will be treated like a
>       normal start-tag.

I think the proper approach here is to meet the HTML kludge with a direct
XML counter-kludge.  I see two possibilities:

1) Fixed option.  Convergent XML:

        MUST serialize all HTML void elements with empty tags;

        MUST serialize all other elements with start and end tags.

2) Flexible option.  Convergent XML:

        MUST serialize all HTML void elements with empty tags;

        MUST serialize all other HTML elements with start and end tags;

        MAY serialize non-HTML elements (including MathML and SVG)
        either way.
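
The two options above could be sketched roughly as follows, in Python
over xml.etree.ElementTree.  Several things here are assumptions rather
than part of the proposal: the void element list (taken from the current
HTML specification; it has varied between drafts), treating elements in
no namespace as HTML, emitting empty tags for empty non-HTML elements
under the flexible option, and the names serialize, split_tag and VOID.
Namespaced attributes are not handled.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: void element names from the current HTML specification;
# the set has varied between drafts.
VOID = frozenset(
    "area base br col embed hr img input link meta param source track wbr".split()
)


def split_tag(tag):
    """Split an ElementTree tag such as '{ns}name' into (ns, name)."""
    if tag.startswith("{"):
        ns, name = tag[1:].split("}", 1)
        return ns, name
    return "", tag


def serialize(elem, flexible=False, parent_ns=""):
    """Serialize elem under the fixed option, or the flexible one if asked."""
    ns, name = split_tag(elem.tag)
    # Assumption: elements in no namespace are treated as HTML too.
    is_html = ns in ("", XHTML_NS)
    attrs = ""
    if ns != parent_ns:
        # Redeclare the default namespace whenever it changes.
        attrs += " xmlns=" + quoteattr(ns)
    for key, value in elem.attrib.items():
        if key.startswith("{"):
            raise ValueError("namespaced attributes are outside this sketch")
        attrs += f" {key}={quoteattr(value)}"
    body = escape(elem.text or "")
    for child in elem:
        body += serialize(child, flexible, ns) + escape(child.tail or "")
    if is_html and name in VOID:
        if body:
            raise ValueError(f"void element <{name}> has content")
        return f"<{name}{attrs}/>"  # void elements: always an empty tag
    if flexible and not is_html and not body:
        return f"<{name}{attrs}/>"  # flexible option: MAY use an empty tag
    return f"<{name}{attrs}>{body}</{name}>"  # otherwise start and end tags
```

Given a tree in which a <p> holds a <br>, an empty <span> and an <svg>
with an empty <g>, both options write <br/> and <span></span>; they
differ only in whether <g> comes out as <g></g> or <g/>.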

>    3. Comments.  HTML5 imposes restrictions on comments beyond those
>       in HTML4 or XML (must not start with "-" or "->")

Just accept this restriction as part of convergent XML.
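
A check along these lines would do, as a minimal sketch.  The function
name is hypothetical.  It combines the XML rules (no "--" inside the
comment, no trailing "-") with the extra restriction on how a comment
may start; to be conservative it rejects a leading ">" as well as a
leading "-", which also covers "->".

```python
def is_convergent_comment(text: str) -> bool:
    """Check the text between '<!--' and '-->' of a comment."""
    # XML: '--' may not occur inside a comment, and the text may not
    # end with '-' (that would produce '--->').
    if "--" in text or text.endswith("-"):
        return False
    # Extra restriction on the start of the comment.  Rejecting any
    # leading '-' is a deliberately conservative assumption.
    if text.startswith("-") or text.startswith(">"):
        return False
    return True
```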

>    4. DOCTYPE declaration. HTML5 documents have to start with a
>       DOCTYPE declaration.

Convergent XML documents MAY begin with "<!DOCTYPE html>" and MUST NOT
contain any other sort of document type declaration.  That makes them
DTD-invalid, but as I say, I don't think DTD-validity matters much any more.
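
As a rough sketch of that rule, the check below accepts a document with
no document type declaration or with exactly "<!DOCTYPE html>", and
rejects anything else.  The name doctype_ok is hypothetical.  It works
on the raw text, so a "<!DOCTYPE" string inside a comment or CDATA
section would be a false positive, and it does not check placement,
which an XML parser already enforces.

```python
import re

# Find anything that looks like a document type declaration, in any
# letter case, so that variants such as '<!doctype html>' are caught
# and rejected rather than silently ignored.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^>]*>", re.IGNORECASE)


def doctype_ok(document: str) -> bool:
    """True if the document has no doctype or only '<!DOCTYPE html>'."""
    found = DOCTYPE_RE.findall(document)
    return found == [] or found == ["<!DOCTYPE html>"]
```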

> Also I think we should look at the HTML5 distributed extensibility issue

I don't have the energy to read through 600+ emails.  If someone else
does, fine.

Your worships will perhaps be thinking          John Cowan
that it is an easy thing to blow up a dog?      http://www.ccil.org/~cowan
[Or] to write a book?
    --Don Quixote, Introduction                 cowan@ccil.org
Received on Monday, 20 December 2010 20:29:54 UTC