- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 20 Dec 2010 15:29:25 -0500
- To: James Clark <jjc@jclark.com>
- Cc: public-html-xml@w3.org
James Clark scripsit:

> - a subset of XML (and maybe XML namespaces); for the sake of
>   discussion, call this "convergently well-formed XML"
> - some tweaks to the HTML syntax of HTML5
> - a subset of the tweaked HTML syntax of HTML5; call this "convergently
>   valid HTML5"
> - a subset of the XML Infoset; call this the "convergent XML infoset"

I agree generally, except that I think tweaks to HTML are a bad idea, for
the same reason that I think tweaks to XML are a bad idea. I speak as
someone who has pushed through two tweaks to XML already. Let us have no
more.

> The idea is to make polyglot documents a solid, reliable, workable
> approach.

Polyglot documents are a little different: they are documents that are
both valid HTML and valid (i.e. DTD-valid) XHTML. The approach outlined
above, with which I agree, only insists on well-formed XML (and possibly
namespace-well-formed XML).

> HTML5 in the HTML syntax could be processed by XML tools like a normal XML
> vocabulary, provided only that the XML tools know about the extra
> constraints of convergent well-formedness.

There are two use cases here:

1) The HTML is wild, in which case you want an HTML parser, either an
   HTML5 parser or something like Tidy or TagSoup.

2) The HTML is carefully generated to be convergent.

It's the second use case that matters to us, I think. The main reason the
first use case doesn't suffice is that many applications (notably editors)
don't have pluggable parsers.

What follow are my responses to some of your specific points:

> 1. End-tags. Valid HTML5 does not allow end-tags for "void" (always
>    empty) element types. HTML5 parsers will ignore such end-tags
>    except in one case (<br>).
> 2. Empty-element syntax. Valid HTML5 allows empty-element syntax
>    (<foo/>) only for "void" element types. If you use empty-element
>    syntax for a non "void" element type, it will be treated like a
>    normal start-tag.
I think the proper approach here is to meet the HTML kludge with a direct
XML counter-kludge. I see two possibilities:

1) Fixed option. Convergent XML:
   MUST serialize all HTML void elements with empty tags;
   MUST serialize all other elements with start and end tags.

2) Flexible option. Convergent XML:
   MUST serialize all HTML void elements with empty tags;
   MUST serialize all other HTML elements with start and end tags;
   MAY serialize non-HTML elements (including MathML and SVG) either way.

> 3. Comments. HTML5 imposes restrictions on comments beyond those
>    in HTML4 or XML (must not start with "-" or "->")

Just accept this restriction as part of convergent XML.

> 4. DOCTYPE declaration. HTML5 documents have to start with a
>    DOCTYPE declaration.

Convergent XML documents MAY begin with "<!DOCTYPE html>" and MUST NOT
contain any other sort of document type declaration. That makes them
invalid, but as I say, I don't think DTD-validity matters much any more.

> Also I think we should look at the HTML5 distributed extensibility issue

I don't have the energy to read through 600+ emails. If someone else does,
fine.

-- 
Your worships will perhaps be thinking          John Cowan
that it is an easy thing to blow up a dog?      http://www.ccil.org/~cowan
[Or] to write a book?
    --Don Quixote, Introduction                 cowan@ccil.org
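
For illustration, the fixed option for points 1 and 2 might be sketched
roughly as follows. This is only a sketch under stated assumptions: the
function name serialize_convergent is hypothetical, the void-element list
is hand-copied and should be checked against the HTML5 specification, and
namespaces, comments and the DOCTYPE are not handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Assumption: a hand-copied list of HTML void element names; verify it
# against the HTML5 specification before relying on it.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


def serialize_convergent(elem):
    """Serialize an element tree following the "fixed option".

    Void elements are always written with empty-element syntax (<br/>);
    every other element is always written with a start tag and an end
    tag, even when it has no content (<span></span>).
    """
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in elem.attrib.items()
    )
    if elem.tag in VOID_ELEMENTS:
        # A void element with content cannot be made convergent.
        if elem.text or len(elem):
            raise ValueError(f"void element <{elem.tag}> has content")
        out = f"<{elem.tag}{attrs}/>"
    else:
        children = "".join(serialize_convergent(child) for child in elem)
        text = escape(elem.text or "")
        out = f"<{elem.tag}{attrs}>{text}{children}</{elem.tag}>"
    # Text that follows the element's end tag belongs to the parent's
    # content and is emitted here.
    return out + escape(elem.tail or "")


if __name__ == "__main__":
    # <br></br> and <span/> are both well-formed XML but not convergent.
    doc = ET.fromstring("<p>a<br></br>b<span/></p>")
    print(serialize_convergent(doc))
```

The flexible option would differ only in the non-void branch, where
elements outside the HTML vocabulary could be written either way.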
Received on Monday, 20 December 2010 20:29:54 UTC