Re: New HTML/XML Task Force Report published

Henri Sivonen writes:
> I think the paragraph
> 'The HTML5 parser will be able to parse the XML so in principle there is
> no parsing problem except that the parser will build a DOM according to
> the HTML5 parsing rules. There may be issues with namespaces, but
> this may "just work" for many scenarios even if the result would not
> meet the expectations for those who appreciate the full power of XML.'
> gives the wrong idea of what happens. I think the foremost issue isn't
> different namespaces but <foo/> getting parsed as a start tag most of
> the time and name collisions with certain HTML elements causing
> interesting effects.
> I suggest removing the paragraph.


> s/These rules will not always produce the same DOM that an XML parser
> would have produced./These rules will most often produce a DOM that is
> substantially different from the DOM that an XML parser would have
> produced./

I think I've addressed that in the course of implementing Robin's
comments, but please let me know if you disagree.

> This paragraph is strange:
> "There are still details of implementation to be considered in the case
> where HTML5 is represented with well-formed XML. Is the markup to be
> “clipped out” and handed to an HTML5 parser, or is the entire XML DOM
> going to be handed to the HTML5 engine?"
> If there's an XHTML5 subtree in a larger XML document, involving an HTML
> parser would be the least preferred interface to an (X)HTML5 subsystem.
> I think the most preferable would be passing a fragment of the app's
> internal data model (e.g. an in-memory tree) to the subsystem. If the
> subsystem interface wants a serialization and can ingest XHTML5, it
> would make more sense to use XHTML5 than HTML5 at the boundary to avoid
> the cases where some tree shapes don't round trip through the HTML
> serialization.
> I think this paragraph steps outside the stated use case, since it
> introduces a container outside the XML document:
> "A third solution is to process the compound messages using MIME
> multipart/related semantics, perhaps through facilities such as [MTOM]
> or [XOP]. This is very much like the escaped markup case where
> downstream processing must be sophisticated enough to reconstruct the
> author's intent."
> If this solution is to be mentioned, it would make sense to mention .zip
> instead of the more esoteric archive formats.

These two paragraphs are my attempt to incorporate text that we reviewed
in the use case:

I'm a little reluctant to remove them without reviewing the use case
document first. These feel like substantive disagreements with a use
case that I thought we'd reached consensus on.

> This is incorrect:
> "What the HTML5 parser produces when it processes this script element is
> a script element node in the DOM which contains the escaped character
> representation of the XML."
> The text node content of the script node is not escaped. It is ready to
> be used as input to an XML parser.
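
A minimal Python sketch of the point quoted above, under stated
assumptions: it uses the standard-library html.parser as a stand-in
(it is not a conforming HTML5 parser, but it also treats script
content as raw text), and the ScriptText class, the script_text
helper, and the sample markup are hypothetical names made up for
illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ScriptText(HTMLParser):
    """Collect the raw character data found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content arrives as raw text: no entity decoding,
        # no tag recognition. It may arrive in several chunks.
        if self.in_script:
            self.chunks.append(data)


# Hypothetical XML island embedded in an HTML page.
XML_ISLAND = '<doc note="a &amp; b"><item/></doc>'
PAGE = (
    '<!DOCTYPE html><html><body>'
    '<script type="application/xml">' + XML_ISLAND + '</script>'
    '</body></html>'
)


def script_text(page):
    """Return the concatenated text content of the page's script elements."""
    parser = ScriptText()
    parser.feed(page)
    parser.close()
    return "".join(parser.chunks)


text = script_text(PAGE)
# The text is handed to an XML parser as-is, with no unescaping step.
root = ET.fromstring(text)
```

In this sketch the text collected from the script element is the
embedded XML character for character, so it goes straight into
ET.fromstring. Only the XML parser resolves the &amp; reference in
the attribute value.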


                                        Be seeing you,

Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676

Received on Tuesday, 28 June 2011 15:34:59 UTC