Re: Use cases

Norman Walsh scripsit:

> 2. I have an HTML5 toolchain and I want to consume XML because I'd
>    like to process XML using HTML5 tools.
> 
> The HTML5 parser will be able to parse the XML so there's no parsing
> problem. It'll build the DOM that the HTML5 spec says such a document
> represents. There may be some namespace issues, but this should mostly
> "just work".

From what I understand, this is not the case.  Since HTML5 parsers don't
treat namespace declarations properly, they will look only at the element
names, and if your element names coincide with those used in HTML, they
will get HTML treatment, which can produce a drastically different DOM
from what an XML parser would do.  This is not just a matter of the
element nodes being in the wrong namespace, but being *structurally*
wrong as well, due to the way that HTML5 parsers rearrange elements.
Only if your XML (and this is true of both whole documents and islands)
uses element names disjoint from HTML ones do you get proper behavior.
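
As a rough illustration of that disjointness condition, here is a minimal
sketch in Python that scans an XML document for element names that also
appear in the list Norman quotes further down.  The function name
colliding_names and the HTML_NAMES set are made up for this example, and
the list itself is not claimed to be complete, so a clean result is only a
hint, not a guarantee.

```python
import xml.parsers.expat

# Element names from the list quoted later in this message. That list is
# explicitly not claimed to be complete, so this set is illustrative only.
# "font" is included conservatively; per the quoted text it is only
# affected when certain attributes are present.
HTML_NAMES = frozenset("""
    b big blockquote body br center code dd div dl dt em embed
    h1 h2 h3 h4 h5 h6 head hr i img li listing menu meta nobr ol p pre
    ruby s small span strong strike sub sup table tt u ul var font
""".split())


def colliding_names(xml_text):
    """Return the sorted element names in xml_text found in HTML_NAMES.

    Names are compared as written in the source, prefix included, on the
    assumption that an HTML parser sees "x:p" as one opaque tag name.
    They are lowercased first because HTML tokenizers lowercase tag names.
    """
    found = set()

    def start(name, attrs):
        if name.lower() in HTML_NAMES:
            found.add(name)

    # No namespace separator is given, so expat reports names as written.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return sorted(found)
```

An empty result means no collision with this particular list; any name
that is returned marks an element an HTML-only parser may treat as HTML.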

Of course, this applies to a parser that expects only HTML.  If it can
handle both HTML and XHTML, as most browsers do, and the document is
actually parsed as XHTML, then you will get a structurally proper DOM.

> A simpler subset of XML might be created to make life easier for the
> cases that would be covered by such a subset.

It would have to, at the very least, exclude namespaces, element name
collisions, and empty tags.

> 3. I have an XML document and I want to embed islands of human prose
>    marked up with HTML5 in it because I want to be able to extract
>    those sections for use in, for example, documentation.
> 
> If you expect the document to remain well-formed XML, you'll have to
> author with XHTML5 and then there won't be any parsing problems.
> 
> The same semantic questions that arise in point 1 still apply.
> 
> 4. I have an HTML5 document and I want to embed islands of XML in it
>    because I want to be able to write JavaScript and CSS to manipulate
>    those elements, for example, in the browser.
> 
> On the surface, this would seem to be a perfectly straight-forward
> proposition. The XML content will be, by definition, well formed. The
> HTML5 parser might treat namespace declarations as simple attributes,
> but one expects a tree with at least the isomorphic shape in terms of
> elements and other nodes.
> 
> It turns out that this isn't the case. The HTML5 parsing rules
> explicitly flatten parts of the XML content if any of a wide variety
> of element names occur inside the fragment. (Including, but I do not
> assert limited to, "b", "big", "blockquote", "body", "br", "center",
> "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3",
> "h4", "h5", "h6", "head", "hr", "i", "img", "li", "listing", "menu",
> "meta", "nobr", "ol", "p", "pre", "ruby", "s", "small", "span",
> "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var", and
> "font" if certain attributes are present.)
> 
> There are a number of other rules in this area relating to how MathML
> and SVG are parsed and various conditions under which parsing modes
> shift in ways that I don't fully comprehend.
> 
> 5. I have a deeper nesting, XML containing HTML5 containing XML or
>    HTML5 containing XML containing HTML5 because I'm reusing content
>    that independently arose through use cases 3 or 4.
> 
> I think the answer to this use case falls naturally out of whatever
> resolution arises for cases 3 and 4, but it might be worth considering
> explicitly along the way.
> 
> What other use cases are there?
> 
> If the five I've outlined pretty much cover the space in question (and
> I make no such assertion, though it seems so to me) then I think the
> two most obvious problems that might be amenable to a technical
> solution are (a) how to simplify XML so that there's a shorter
> cognitive distance from HTML5 to XML and (b) how to make it possible
> to embed arbitrary XML fragments in HTML5 such that the resulting DOM
> has a tree structure at least broadly isomorphic to what an XML parser
> would produce.
> 
> Have I gone totally off the rails somewhere?
> 
>                                         Be seeing you,
>                                           norm
> 
> -- 
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> www.marklogic.com



-- 
John Cowan    http://ccil.org/~cowan  cowan@ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture.  And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein.  --Gospel of Tux

Received on Friday, 31 December 2010 01:28:07 UTC