Re: Use cases from Henri Sivonen on 2011-01-04 (public-html-xml@w3.org from January 2011)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 4 Jan 2011 12:55:45 +0200
To: public-html-xml@w3.org
Message-Id: <9FB67A83-5F9A-4539-9D9C-060C074969FC@iki.fi>
On Dec 30, 2010, at 23:19, Norman Walsh wrote:

> I think that we'll have trouble making any progress if we don't begin
> by understanding what the problem we're tasked to address is and
> finding a way to articulate it clearly and precisely.

Indeed.

> Near as I can tell, there are five possible use cases for HTML+XML:
> 
> 1. I have an XML toolchain and I want to consume HTML5 because I'd
>   like to process HTML5 using XML tools.

This indeed is something that multiple people seem to care about.

> In more constrained environments, it may be possible to arrange for
> all the HTML5 content to be authored in XHTML5 and then there's no
> parsing problem.

I agree.

> The wild and woolly HTML5 of the open internet is what it is. The only
> way to get that stuff is going to be to use an HTML5 parser. HTML5
> parsers that produce a stream of well-formed events suitable for
> constructing XML already exist, so this looks like a mostly solved
> parsing problem.

Indeed.

> The semantics of the HTML5 elements are described by the HTML5
> specification. It may be necessary/useful/convenient to shuffle
> namespaces a bit in the parsed content, for example to put SVG and
> MathML back in their respective namespaces so that your existing XML
> tools will do the right thing.

Assigning SVG elements into the http://www.w3.org/2000/svg namespace and assigning MathML elements into the http://www.w3.org/1998/Math/MathML namespace is part of what every conforming HTML5 parser does. Indeed, by definition, an element isn't an SVG element if it isn't in the http://www.w3.org/2000/svg namespace and an element isn't a MathML element if it isn't in the http://www.w3.org/1998/Math/MathML namespace.

In some cases, tags whose names look like the names of SVG or MathML elements may cause elements in the http://www.w3.org/1999/xhtml namespace to be inserted into the tree by the HTML parsing algorithm. By definition, those elements aren't SVG or MathML elements but HTML elements.
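
A minimal sketch of the "by definition" point, as seen by an XML tool that consumes the tree. It uses Python's xml.etree.ElementTree on an XML serialization rather than an HTML parser, and the helper name is_svg_element and the sample markup are assumptions made for illustration. The second svg element has an SVG-looking local name but sits in the XHTML namespace, so it doesn't count as an SVG element.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_svg_element(element):
    """Return True only if the element is in the SVG namespace.

    ElementTree spells a namespaced name as "{namespace-uri}local-name",
    so the local name alone never decides the answer.
    """
    return element.tag.startswith("{" + SVG_NS + "}")


# Hypothetical sample: the first <svg> declares the SVG namespace, the
# second inherits the XHTML default namespace from the enclosing <div>.
doc = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
    '<svg><circle r="1"/></svg>'
    '</div>'
)
real_svg, lookalike = list(doc)

print(is_svg_element(real_svg))   # namespace is SVG
print(is_svg_element(lookalike))  # namespace is XHTML
print(lookalike.tag)
```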

> 2. I have an HTML5 toolchain and I want to consume XML because I'd
>   like to process XML using HTML5 tools.

Who actually wants this and why? What's an HTML5 toolchain (other than an XML toolchain with an HTML parser and an HTML serializer)?

Note that strictly speaking, this and the other "use cases" aren't use cases in the sense that "I want to use tool foo" isn't a use case. "I want to achieve result bar" may be a use case which might be addressed by using tool foo. (Unless, of course, using a pre-chosen tool is considered to be an end in itself rather than just a means.)

> The HTML5 parser will be able to parse the XML so there's no parsing
> problem.

This isn't true. I realize that the telecon minutes show me saying that this was so. My apologies for not expressing myself clearly enough on the telecon. I most certainly didn't mean to suggest using an HTML parser to process XML input. (I don't recall what my actual words were, but obviously what I said wasn't unambiguous enough.)

>  It'll build the DOM that the HTML5 spec says such a document
> represents.

This is true...

> There may be some namespace issues, but this should mostly
> "just work".


...but I wouldn't describe the result as something that "just works" for the usual expectations people have for XML.

> 3. I have an XML document and I want to embed islands of human prose
>   marked up with HTML5 in it because I want to be able to extract
>   those sections for use in, for example, documentation.
> 
> If you expect the document to remain well-formed XML, you'll have to
> author with XHTML5 and then there won't be any parsing problems.

Agreed.
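
For illustration, a sketch of what extracting such islands could look like when they are authored as XHTML5, so that an ordinary XML parser is enough. The vocabulary (api, function, doc), the h prefix and the helper name extract_islands are all made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical host document with XHTML islands inside <doc> elements.
SOURCE = """<api xmlns:h="http://www.w3.org/1999/xhtml">
  <function name="frob">
    <doc><h:p>Frobs the <h:em>widget</h:em>.</h:p></doc>
  </function>
  <function name="twiddle">
    <doc><h:p>Twiddles it.<h:br/></h:p></doc>
  </function>
</api>"""


def extract_islands(xml_text):
    """Map each function name to the text of its XHTML paragraphs."""
    root = ET.fromstring(xml_text)
    islands = {}
    for function in root.iter("function"):
        # Select only <p> elements that are in the XHTML namespace.
        paragraphs = function.findall("doc/{%s}p" % XHTML_NS)
        islands[function.get("name")] = [
            "".join(p.itertext()) for p in paragraphs
        ]
    return islands


print(extract_islands(SOURCE))
```

Because the islands are well-formed and namespaced, no HTML parser is involved at any point.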

> 4. I have an HTML5 document and I want to embed islands of XML in it
>   because I want to be able to write JavaScript and CSS to manipulate
>   those elements, for example, in the browser.

You can use the "data blocks" feature to transport a Unicode string inside a script element in text/html and have another parsing layer on top of the HTML parser that takes the text content of the script element and parses it with an XML parser.
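
A rough sketch of that layering, under several assumptions: Python's html.parser stands in for the HTML parser (it is not a conforming HTML5 parser, but it does treat script content as raw text), the data block is marked with type="application/xml", and the names DataBlockExtractor and parse_data_blocks are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class DataBlockExtractor(HTMLParser):
    """Collects the text content of <script> elements with a given type."""

    def __init__(self, block_type):
        super().__init__()
        self.block_type = block_type
        self.blocks = []
        self._buffer = None  # list of text chunks while inside a data block

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == self.block_type:
            self._buffer = []

    def handle_data(self, data):
        # Script content arrives as plain text, possibly in several chunks.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


# Hypothetical text/html page carrying one XML data block.
PAGE = """<!DOCTYPE html>
<title>Data block demo</title>
<script type="application/xml" id="inventory">
<inventory><item sku="a1">Widget</item><item sku="b2">Gadget</item></inventory>
</script>
<p>Ordinary HTML content.</p>"""


def parse_data_blocks(html_text, block_type="application/xml"):
    """First layer: HTML parsing. Second layer: XML parsing of each block."""
    extractor = DataBlockExtractor(block_type)
    extractor.feed(html_text)
    extractor.close()
    return [ET.fromstring(block.strip()) for block in extractor.blocks]


roots = parse_data_blocks(PAGE)
print([item.get("sku") for item in roots[0]])
```

The HTML layer never interprets the XML; it only hands over a string, and well-formedness errors surface in the second layer.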

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 4 January 2011 10:58:13 UTC