- From: Jonathan Rees <jar@creativecommons.org>
- Date: Wed, 17 Jun 2009 08:54:54 -0400
- To: Anne van Kesteren <annevk@opera.com>
- Cc: Dan Connolly <connolly@w3.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-archive@w3.org
On Wed, Jun 17, 2009 at 7:51 AM, Anne van Kesteren <annevk@opera.com> wrote:
> On Wed, 17 Jun 2009 13:47:12 +0200, Jonathan Rees <jar@creativecommons.org> wrote:
>> I don't see how your answer or the linked documents bear on my
>> question, so let me amplify.
>>
>> The ideal situation: you can take any HTML5 document, convert it to
>> some XML-based language designed for the purpose (not necessarily
>> XHTML), convert it back, and get a semantically equivalent HTML5
>> document.
>
> The parser of the HTML syntax is Turing-complete so that will not work.
> (You can inject characters into the tokenizer.)

COBOL is also Turing-complete, so I guess I could use that.

>> The problem I'm worried about is the lack of interoperability between
>> HTML5 and XML processors. (It has nothing to do with browsers.) Other
>> specs such as OWL 2 and XQuery have addressed this problem by
>> providing XML syntax as an alternative. But this only achieves the
>> intended effect if semantics-preserving round trips work.
>>
>> For comparison, 'tidy' provides conversion from HTML4 to XHTML (I
>> think), and the resulting XHTML is in a subset (I think) of HTML4, so
>> the round trip property holds. I assume this approach doesn't work for
>> HTML5, which is why I do not necessarily have XHTML in mind as the
>> representation.
>
> If 'tidy' is good enough and you consider it working I do not see why
> that would not work for HTML5.

Because HTML5 is so different from HTML4, I have no reason to think it
would work. I'm not even sure tidy works for HTML4. And it is not as
well specified as OWL/XML or XQuery/XML, as far as I know.

The spirit of my question was not combative, but rather a request to
some people I trust to supply me with reliable information. I think
they understand the background of my question and will probably
understand where I am going with this.

The www-archive list is described as follows: "Miscellaneous.
Mail-to-web gateway." I was using it in the latter capacity, as I have
seen others do. Sorry if my message was construed otherwise. If you are
interested in pursuing this I think the discussion should be moved
elsewhere.

Jonathan
Received on Wednesday, 17 June 2009 12:55:32 UTC