Re: question about XML and HTML5

From: Jonathan Rees <jar@creativecommons.org>
Date: Wed, 17 Jun 2009 08:54:54 -0400
Message-ID: <760bcb2a0906170554g14bb4795ja490530fd18f993a@mail.gmail.com>
To: Anne van Kesteren <annevk@opera.com>
Cc: Dan Connolly <connolly@w3.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-archive@w3.org
On Wed, Jun 17, 2009 at 7:51 AM, Anne van Kesteren <annevk@opera.com> wrote:
> On Wed, 17 Jun 2009 13:47:12 +0200, Jonathan Rees <jar@creativecommons.org> wrote:
>> I don't see how your answer or the linked documents bear on my
>> question, so let me amplify.
>>
>> The ideal situation:  you can take any HTML5 document, convert it to
>> some XML-based language designed for the purpose (not necessarily
>> XHTML), convert it back, and get a semantically equivalent HTML5
>> document.
>
> The parser of the HTML syntax is Turing-complete so that will not work. (You can inject characters into the tokenizer.)

COBOL is also Turing-complete, so I guess I could use that.

>
>> The problem I'm worried about is the lack of interoperability between
>> HTML5 and XML processors. (It has nothing to do with browsers.) Other
>> specs such as OWL 2 and XQuery have addressed this problem by
>> providing XML syntax as an alternative. But this only achieves the
>> intended effect if semantics-preserving round trips work.
>>
>> For comparison, 'tidy' provides conversion from HTML4 to XHTML (I
>> think), and the resulting XHTML is in a subset (I think) of HTML4, so
>> the round trip property holds. I assume this approach doesn't work for
>> HTML5, which is why I do not necessarily have XHTML in mind as the
>> representation.
>
> If 'tidy' is good enough and you consider it working I do not see why that would not work for HTML5.

Because HTML5 is so different from HTML4, I have no reason to think it
would work. I'm not even sure tidy works for HTML4. And it is not as
well specified as OWL/XML or XQuery/XML, as far as I know.

The spirit of my question was not combative, but rather a request to
some people I trust to supply me with reliable information. I think
they understand the background of my question and will probably
understand where I am going with this.

The www-archive list is described as follows: "Miscellaneous.
Mail-to-web gateway."  I was using it in the latter capacity, as I
have seen others do. Sorry if my message was construed otherwise. If
you are interested in pursuing this I think the discussion should be
moved elsewhere.

Jonathan
Received on Wednesday, 17 June 2009 12:55:32 GMT
