Re: New HTML/XML Task Force Report published

On Tue, 2011-03-22 at 14:50 -0400, Norman Walsh wrote:
> I've just published the first draft of any actual substance:
> I encourage you to review it and send your comments to this list.

Thanks and sorry about the slow review.

Some notes:

I think the paragraph

'The HTML5 parser will be able to parse the XML so in principle there is
no parsing problem except that the parser will build a DOM according to
the HTML5 parsing rules. There may be be issues with namespaces, but
this may "just work" for many scenarios even if the the result would not
meet the expectations for those who appreciate the full power of XML.'

gives the wrong idea of what happens. I think the foremost issue isn't
different namespaces but <foo/> getting parsed as a start tag most of
the time and name collisions with certain HTML elements causing
interesting effects.
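
A toy sketch of the first point, not the HTML5 tree builder: it assumes only that the trailing "/" is ignored on non-void elements and that an end tag closes everything up to the nearest matching open element. The names toy_html_nesting, xml_nesting, VOID and TAG are made up for this illustration, and name collisions with HTML elements are not modelled.

```python
import re
import xml.etree.ElementTree as ET

# Small subset of the HTML void elements, enough for this illustration.
VOID = {"br", "hr", "img", "input", "meta", "link"}

# Matches attribute-free tags only: <name>, </name>, <name/>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*(/?)>")


def toy_html_nesting(markup):
    """Return (parent, child) pairs under the simplified HTML-style model."""
    stack, edges = ["#root"], []
    for closing, name, _slash in TAG.findall(markup):
        name = name.lower()
        if closing:
            # Close open elements up to and including the matching one.
            if name in stack:
                while stack.pop() != name:
                    pass
            continue
        edges.append((stack[-1], name))
        # Assumption: "/>" is ignored, so a non-void element stays open.
        if name not in VOID:
            stack.append(name)
    return edges


def xml_nesting(markup):
    """Return (parent, child) pairs as a real XML parser sees them."""
    root = ET.fromstring(markup)
    edges = [("#root", root.tag)]
    for parent in root.iter():
        for child in parent:
            edges.append((parent.tag, child.tag))
    return edges


SAMPLE = "<doc><foo/><bar/></doc>"
print("XML :", xml_nesting(SAMPLE))
print("toy :", toy_html_nesting(SAMPLE))
```

In the XML reading, foo and bar are siblings; in the toy HTML-style reading, bar ends up nested inside foo.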

I suggest removing the paragraph.

s/These rules will not always produce the same DOM that an XML parser
would have produced./These rules will most often produce a DOM that is
substantially different from the DOM that an XML parser would have
produced./

This paragraph is strange:
"There are still details of implementation to be considered in the case
where HTML5 is represented with well-formed XML. Is the markup to be
“clipped out” and handed to an HTML5 parser, or is the entire XML DOM
going to be handed to the HTML5 engine?"

If there's an XHTML5 subtree in a larger XML document, involving an HTML
parser would be the least preferred interface to an (X)HTML5 subsystem.
I think the most preferable would be passing a fragment of the app's
internal data model (e.g. an in-memory tree) to the subsystem. If the
subsystem interface wants a serialization and can ingest XHTML5, it
would make more sense to use XHTML5 than HTML5 at the boundary to avoid
the cases where some tree shapes don't round trip through the HTML
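
A minimal sketch of the two boundaries described above, using Python's xml.etree as a stand-in for the app's internal data model. The envelope namespace, the sample document and the helper names (find_xhtml_subtree, shape) are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical compound document: an XHTML subtree inside other XML.
DOC = (
    '<envelope xmlns="urn:example:envelope">'
    "<meta>demo</meta>"
    f'<div xmlns="{XHTML}"><p>one</p>'
    "<table><tr><td>two</td></tr></table></div>"
    "</envelope>"
)


def find_xhtml_subtree(root):
    """Locate the XHTML div that is a direct child of the envelope."""
    return root.find(f"{{{XHTML}}}div")


def shape(elem):
    """Tag, text and child structure, used to compare two trees."""
    return (elem.tag, elem.text, [shape(child) for child in elem])


root = ET.fromstring(DOC)

# Option 1: hand the in-memory subtree to the subsystem as-is.
subtree = find_xhtml_subtree(root)

# Option 2: if the interface wants a serialization, use XML (XHTML5).
wire = ET.tostring(subtree, encoding="unicode")
reparsed = ET.fromstring(wire)

# The XML round trip keeps the tree shape. An HTML parser reading the
# same table would insert an implied tbody around the tr.
print(shape(reparsed) == shape(subtree))
```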

I think this paragraph steps outside the stated use case, since it
introduces a container outside the XML document:
"A third solution is to process the compound messages using MIME
multipart/related semantics, perhaps through facilities such as [MTOM]
or [XOP]. This is very much like the escaped markup case where
downstream processing must be sophisticated enough to reconstruct the
authors intent."
If this solution is to be mentioned, it would make sense to mention .zip
instead of the more esoteric archive formats.

This is incorrect:
"What the HTML5 parser produces when it processes this script element is
a script element node in the DOM which contains the escaped character
representation of the XML."
The text node content of the script node is not escaped. It is ready to
be used as input to an XML parser.
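
A small sketch of this, with one loud assumption: Python's html.parser is not the HTML5 parsing algorithm, but it likewise treats script content as raw text, so it can stand in here. The ScriptText class, the script_text helper and the sample page are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ScriptText(HTMLParser):
    """Collect the character data found inside script elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; keep them all.
        if self.in_script:
            self.chunks.append(data)


def script_text(html):
    parser = ScriptText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


PAGE = (
    "<p>before</p>"
    '<script type="application/xml">'
    '<doc><item n="1">a &amp; b</item></doc>'
    "</script>"
)

text = script_text(PAGE)
print(text)  # the XML source as written, with no extra escaping

# The text can be handed straight to an XML parser.
doc = ET.fromstring(text)
print(doc.find("item").text)
```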

Historical side note: This technique has been documented in the /TR/
space since 1998!

> If it looks like we need to talk about any of them, I'll 
> probably schedule a telcon for 12 April.

I can't make it to the potential telecon tomorrow. My regrets.

Henri Sivonen

Received on Monday, 11 April 2011 12:52:34 UTC