
Re: question about XML and HTML5

From: Anne van Kesteren <annevk@opera.com>
Date: Wed, 17 Jun 2009 13:51:45 +0200
To: "Jonathan Rees" <jar@creativecommons.org>
Cc: "Dan Connolly" <connolly@w3.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-archive@w3.org
Message-ID: <op.uvn28jgd64w2qv@annevk-t60>

On Wed, 17 Jun 2009 13:47:12 +0200, Jonathan Rees <jar@creativecommons.org> wrote:
> I don't see how your answer or the linked documents bear on my
> question, so let me amplify.
>
> The ideal situation:  you can take any HTML5 document, convert it to
> some XML-based language designed for the purpose (not necessarily
> XHTML), convert it back, and get a semantically equivalent HTML5
> document.

The parser of the HTML syntax is Turing-complete, so that will not work. (You can inject characters into the tokenizer, for example with document.write().)


> The problem I'm worried about is the lack of interoperability between
> HTML5 and XML processors. (It has nothing to do with browsers.) Other
> specs such as OWL 2 and XQuery have addressed this problem by
> providing XML syntax as an alternative. But this only achieves the
> intended effect if semantics-preserving round trips work.
>
> For comparison, 'tidy' provides conversion from HTML4 to XHTML (I
> think), and the resulting XHTML is in a subset (I think) of HTML4, so
> the round trip property holds. I assume this approach doesn't work for
> HTML5, which is why I do not necessarily have XHTML in mind as the
> representation.

If 'tidy' is good enough and you consider it to work, I do not see why the same approach would not work for HTML5.
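
As a rough illustration of the round-trip property under discussion, here is a minimal Python sketch using only the standard library. It is not 'tidy' and not an HTML5 parser: html.parser is a lenient tokenizer, the tree builder assumes well-nested input, and the "root" wrapper element and the names TreeBuilder, html_to_tree and round_trips are made up for this example. It only covers static markup; it says nothing about documents whose scripts write into the parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: a small subset of the HTML void elements is enough for the sketch.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Builds an ElementTree from simple, well-nested HTML (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Boolean attributes arrive with value None; store them as "".
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Only close the element that is currently open; ignore stray end tags.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element is stored as that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def round_trips(source):
    """HTML -> XML -> HTML -> tree, then compare XML serializations."""
    first = html_to_tree(source)
    xml_text = ET.tostring(first, encoding="unicode", method="xml")
    html_again = ET.tostring(ET.fromstring(xml_text), encoding="unicode", method="html")
    second = html_to_tree(html_again)
    # second[0] is the re-parsed "root" wrapper from the first pass.
    return ET.tostring(second[0], encoding="unicode", method="xml") == xml_text
```

Calling round_trips('<p class="a">Hi<br>there</p>') compares the XML form of the tree before and after the trip. Passing on inputs like this one shows only that the simple static case is unproblematic.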


-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Wednesday, 17 June 2009 11:52:27 GMT
