
Re: question about XML and HTML5

From: Jonathan Rees <jar@creativecommons.org>
Date: Wed, 17 Jun 2009 07:47:12 -0400
Message-ID: <760bcb2a0906170447g415e54abk82439d8c6874d615@mail.gmail.com>
To: Anne van Kesteren <annevk@opera.com>
Cc: Dan Connolly <connolly@w3.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-archive@w3.org
I don't see how your answer or the linked documents bear on my
question, so let me amplify.

The ideal situation: you can take any HTML5 document, convert it to
some XML-based language designed for the purpose (not necessarily
XHTML), convert it back, and get a semantically equivalent HTML5
document.
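To make the round-trip property concrete, here is a minimal Python sketch. It is only an approximation under stated assumptions: the stdlib html.parser stands in for a real HTML5 parser (it does no tree repair, so it only handles well-nested markup), the "XML-based language" is simply the element tree serialized by xml.etree.ElementTree, and "semantically equivalent" is approximated as identical element names, attributes, and text. The wrapper element `root`, the partial void-element list, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Void elements that never get an end tag (a partial list; an assumption).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeParser(HTMLParser):
    """Build an ElementTree from well-nested HTML.

    Not an HTML5-conformant parser: no implied end tags, no error recovery.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.builder.start("root", {})  # hypothetical wrapper element

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <input disabled>) arrive as None.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)

    def build_tree(self) -> ET.Element:
        self.close()  # flush any buffered text
        self.builder.end("root")
        return self.builder.close()


def html_to_tree(html: str) -> ET.Element:
    parser = TreeParser()
    parser.feed(html)
    return parser.build_tree()


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Structural equality: names, attributes, text, tails, children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


def round_trips(html: str) -> bool:
    """HTML -> tree -> XML text -> tree; True if nothing was lost."""
    original = html_to_tree(html)
    xml_text = ET.tostring(original, encoding="unicode")
    try:
        reparsed = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # the serialized form was not well-formed XML
    return same_tree(original, reparsed)
```

Under these assumptions, `round_trips('<p class="x">Hi <b>there</b></p>')` returns True, while `round_trips('<div a!b="1"></div>')` returns False: HTML accepts `a!b` as an attribute name, but the serialized output is not well-formed XML, so the XML side rejects it.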

The problem I'm worried about is the lack of interoperability between
HTML5 and XML processors. (It has nothing to do with browsers.) Other
specs such as OWL 2 and XQuery have addressed this problem by
providing XML syntax as an alternative. But this only achieves the
intended effect if semantics-preserving round trips work.

For comparison, 'tidy' provides conversion from HTML4 to XHTML (I
think), and the resulting XHTML falls within a subset of HTML4 (I
think), so the round-trip property holds. I assume this approach
doesn't work for HTML5, which is why I do not necessarily have XHTML
in mind as the target format.


On Wed, Jun 17, 2009 at 6:57 AM, Anne van Kesteren <annevk@opera.com> wrote:
> On Wed, 17 Jun 2009 12:51:05 +0200, Jonathan Rees <jar@creativecommons.org> wrote:
>> This question sounds so stupid that I didn't want to ask it in public.
>> Many web-related languages that have idiosyncratic syntax also provide
>> an XML surface syntax. Examples are Turtle (RDF/XML), XQuery, OWL 2
>> (OWL/XML). To ensure that HTML5 can participate in XML pipelines in a
>> standard way, wouldn't it be a good idea to have a standard XML
>> surface syntax for HTML5, with semantics preserved over round trips?
>> Perhaps this could even be done using a set of extensions to XHTML.
> http://www.whatwg.org/specs/web-apps/current-work/multipage/the-xhtml-syntax.html
> Already works fine in modern browsers for new elements such as <canvas>, <video>, etc.
> There is also the following section for how an HTML byte stream maps to an infoset
> http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#coercing-an-html-dom-into-an-infoset
> which I believe is implemented by the Validator.nu software.
> --
> Anne van Kesteren
> http://annevankesteren.nl/
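The infoset coercion section linked above is where round-trip loss can show up. As I read it, when a tool's XML API cannot accept a character in an element or attribute name, the tool may replace that character with an uppercase "U" followed by the character's code point as six uppercase hexadecimal digits. The Python sketch below assumes that reading, plus a hypothetical XML API that accepts only ASCII letters, digits, "_", "-" and "." in names; `coerce_name` and `collisions` are illustrative helpers, not part of any spec or library.

```python
import string

# Assumption: the target XML API accepts only these ASCII characters in names
# (a strict subset of what XML 1.0 allows), with the usual first-char rule.
NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits + "-.")


def coerce_name(name: str) -> str:
    """Replace each unsupported character with 'U' + six uppercase hex digits."""
    out = []
    for i, ch in enumerate(name):
        allowed = NAME_START if i == 0 else NAME_CHARS
        out.append(ch if ch in allowed else "U%06X" % ord(ch))
    return "".join(out)


def collisions(names):
    """Group distinct input names that coerce to the same output name."""
    groups = {}
    for n in names:
        groups.setdefault(coerce_name(n), set()).add(n)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Here `coerce_name("a!b")` and `coerce_name("aU000021b")` both yield `"aU000021b"`, so `collisions(["a!b", "aU000021b", "data-x"])` reports that pair. If two different HTML DOMs can coerce to the same infoset, no general inverse exists, and a semantics-preserving round trip cannot be guaranteed through that mapping alone.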
Received on Wednesday, 17 June 2009 11:47:56 UTC
