Re: Use cases from Benjamin Hawkes-Lewis on 2010-12-31 (public-html-xml@w3.org from December 2010)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Fri, 31 Dec 2010 12:09:07 +0000
To: Norman Walsh <ndw@nwalsh.com>
Cc: public-html-xml@w3.org
Message-ID: <AANLkTinFzQ2UFxy5-=ocDgpjoQUZuAAC1yzRqf90iWpD@mail.gmail.com>
On Thu, Dec 30, 2010 at 9:19 PM, Norman Walsh <ndw@nwalsh.com> wrote:
> 1. I have an XML toolchain and I want to consume HTML5 because I'd
>   like to process HTML5 using XML tools.

[snip]

> HTML5
> parsers that produce a stream of well-formed events suitable for
> constructing XML already exist, so this looks like a mostly solved
> parsing problem.

[snip]

> It may be necessary/useful/convenient to shuffle
> namespaces a bit in the parsed content, for example to put SVG and
> MathML back in their respective namespaces so that your existing XML
> tools will do the right thing.

For conforming HTML5 markup, where does the text/html parsing
algorithm not already put SVG and MathML in their respective
namespaces?

> 2. I have an HTML5 toolchain and I want to consume XML because I'd
>   like to process XML using HTML5 tools.

Why would one want to do this?

The HTML5 parsing algorithm has been tuned for parsing text/html
content found in the wild, not arbitrary malformed XML like one finds
in feeds and publisher-side data exchange.

If you want to consume wild malformed XML, aren't you better off with
an XML5-like parser?

If you want to consume well-formed XML, aren't you better off with a
normal XML parser?

> A simpler subset of XML might be created to make life easier for the
> cases that would be covered by such a subset.

Unless the HTML5 algorithm is changed it would still end up in the
wrong namespace, no?

> 3. I have an XML document and I want to embed islands of human prose
>   marked up with HTML5 in it because I want to be able to extract
>   those sections for use in, for example, documentation.
>
> If you expect the document to remain well-formed XML, you'll have to
> author with XHTML5 and then there won't be any parsing problems.

Another option is to roundtrip HTML in CDATA like Atom feeds.
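
A minimal sketch of that CDATA round trip, using only the Python
standard library. The element name "content" and the type="html"
attribute mirror Atom's convention; the function names are
hypothetical, chosen for illustration only.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET


def wrap_html_in_cdata(html_fragment: str) -> str:
    """Return an XML element carrying the HTML fragment in a CDATA section."""
    # A CDATA section cannot contain its own terminator.
    if "]]>" in html_fragment:
        raise ValueError("fragment contains ']]>' and cannot go in one CDATA section")
    doc = minidom.Document()
    content = doc.createElement("content")  # element name is an assumption
    content.setAttribute("type", "html")
    content.appendChild(doc.createCDATASection(html_fragment))
    doc.appendChild(content)
    return doc.documentElement.toxml()


def extract_html(xml_text: str) -> str:
    """Recover the original HTML string from the wrapping element."""
    # An XML parser hands back CDATA content as plain character data.
    return ET.fromstring(xml_text).text or ""


# The fragment need not be well-formed XML: note the unclosed <br>.
fragment = "<p>Hello<br>world &amp; friends</p>"
wrapped = wrap_html_in_cdata(fragment)
recovered = extract_html(wrapped)
```

The XML document stays well-formed because the XML parser never
interprets the HTML; the recovered string still has to be fed to a
text/html parser before it is useful as markup.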

> 4. I have an HTML5 document and I want to embed islands of XML in it
>   because I want to be able to write JavaScript and CSS to manipulate
>   those elements, for example, in the browser.

Can you elaborate on this use case? What are we really talking about and why?

What are some example end-user problems this would solve? Might there
be other (better?) ways to solve them?

By "islands of XML" do we mean round-tripping information in XML for
client-side processing? Or do we mean a text/html document that
contains a mixture of HTML/MathML/SVG semantics and elements with
other arbitrary semantics?

There's a big difference between the two.

Round-tripping /information/ that could be expressed in XML can
already be done using RDFa or microdata annotations on top of
generically understood HTML/SVG/MathML semantics. Round-tripping a
blob of XML can be accomplished either unescaped with the "script"
element, with the single restriction that the content cannot contain
the string "</script>" (case insensitive), or HTML-escaped inside a
data-* attribute, a param "value" attribute, or an input type="hidden"
"value" attribute. (It's also done in the wild with comments, with the
restriction that the content cannot contain the string "--".)
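
A rough sketch of two of those options, the "script" element and a
data-* attribute, in Python with only the standard library. The helper
names, the id "payload", the attribute name "data-payload", and the
type "application/xml" are all assumptions made for illustration. The
check rejects any "</script" prefix, which is slightly more
conservative than the restriction stated above.

```python
import html
import re
from html.parser import HTMLParser

# Case-insensitive match for the start of a script end tag.
_SCRIPT_CLOSE = re.compile(r"</script", re.IGNORECASE)


def embed_in_script(xml_blob: str) -> str:
    """Embed the blob unescaped in a script element used as a data block."""
    if _SCRIPT_CLOSE.search(xml_blob):
        raise ValueError("blob contains '</script' and cannot be embedded unescaped")
    return '<script type="application/xml" id="payload">' + xml_blob + "</script>"


def embed_in_data_attribute(xml_blob: str) -> str:
    """Embed the blob HTML-escaped inside a data-* attribute value."""
    return '<div data-payload="' + html.escape(xml_blob, quote=True) + '"></div>'


class PayloadReader(HTMLParser):
    """Collect the unescaped value of the first data-payload attribute seen."""

    def __init__(self):
        super().__init__()
        self.payload = None

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-payload" and self.payload is None:
                self.payload = value


blob = '<note lang="en">Tom &amp; Jerry</note>'
script_markup = embed_in_script(blob)
attr_markup = embed_in_data_attribute(blob)

reader = PayloadReader()
reader.feed(attr_markup)
recovered = reader.payload
```

In the attribute form, the HTML parser undoes the escaping, so the
consumer gets back exactly the XML string that was put in and can hand
it to an ordinary XML parser.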

On the other hand, including arbitrary markup inside a text/html
document would damage the RESTful architecture of the web because the
media type text/html could no longer be understood in terms of generic
semantics like "h1", "mtext", and "rect" (defined by HTML5, MathML,
and SVG respectively). Authors would presumably break separation of
concerns by trying to hack in functionality and presentation using CSS
and JS and (if users were lucky, which they usually aren't) patch up
accessibility with WAI-ARIA. That would make content and functionality
less robust (greater risk of intranet security prohibitions, network
failures, coding errors, varying levels of implementation support),
prevent users from skinning content to suit their needs and preferences,
and force users to put themselves at risk by executing untrusted
code just to gain access to basic content and functionality. Sell me
on why a standards organization committed to delivering end-users
an interoperable, accessible, skinnable, safe web experience would
want to support such usage.

--
Benjamin Hawkes-Lewis
Received on Friday, 31 December 2010 12:09:42 UTC