- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 17 Jun 2009 15:05:29 +0200
- To: "Jonathan Rees" <jar@creativecommons.org>
- Cc: "Dan Connolly" <connolly@w3.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-archive@w3.org
On Wed, 17 Jun 2009 14:54:54 +0200, Jonathan Rees <jar@creativecommons.org> wrote:
> On Wed, Jun 17, 2009 at 7:51 AM, Anne van Kesteren<annevk@opera.com>
> wrote:
>> On Wed, 17 Jun 2009 13:47:12 +0200, Jonathan Rees
>> <jar@creativecommons.org> wrote:
>>> I don't see how your answer or the linked documents bear on my
>>> question, so let me amplify.
>>>
>>> The ideal situation: you can take any HTML5 document, convert it to
>>> some XML-based language designed for the purpose (not necessarily
>>> XHTML), convert it back, and get a semantically equivalent HTML5
>>> document.
>>
>> The parser of the HTML syntax is Turing-complete so that will not work.
>> (You can inject characters into the tokenizer.)
>
> COBOL is also Turing-complete, so I guess I could use that.

That does not give you XML though :-) On IRC it was suggested you could wrap the HTML5 document inside a big CDATA wrapper which would theoretically do what you want, but would probably not be very useful.

If you ignore document.write() doing what is suggested in the links I provided earlier (especially how to map an HTML byte stream to an XML DOM) will get you quite close. I suppose you could also ignore script execution altogether and together with creating an infoset out of an HTML byte stream you might be able to get pretty far too, but I haven't thought about that in detail.

>> If 'tidy' is good enough and you consider it working I do not see why
>> that would not work for HTML5.
>
> Because HTML5 is so different from HTML4, I have no reason to think it
> would work. I'm not even sure tidy works for HTML4. And it is not as
> well specified as OWL/XML or XQuery/XML far as I know.

I thought 'tidy' dealt with "tag soup" input and tried to make something out of it. In that respect I would not classify HTML5 as "so different" from HTML4 :-) I do agree that 'tidy' is not well specified, but HTML5 is and has a way to get to XML and back. (And this is implemented as well.)
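A minimal sketch of the CDATA-wrapper idea mentioned above, added for illustration and not taken from the thread. The function names and the `html5-source` root element are hypothetical. It treats the HTML as opaque text, so it only shows that the round trip is possible, not that the result is useful as XML. It also assumes the input contains no characters that XML 1.0 forbids (for example NUL or form feed) and uses `\n` line endings, since an XML parser normalizes `\r\n`.

```python
from xml.dom import minidom


def wrap_html_in_cdata(html: str, root: str = "html5-source") -> str:
    """Wrap HTML source text in a single XML element (hypothetical name)."""
    # "]]>" may not appear inside a CDATA section, so split each
    # occurrence across two adjacent CDATA sections.
    safe = html.replace("]]>", "]]]]><![CDATA[>")
    return f"<{root}><![CDATA[{safe}]]></{root}>"


def unwrap_html_from_cdata(xml: str) -> str:
    """Recover the original HTML text from the wrapper element."""
    doc = minidom.parseString(xml)
    # Adjacent CDATA sections may arrive as separate nodes; join their data.
    return "".join(node.data for node in doc.documentElement.childNodes)


sample = "<!DOCTYPE html>\n<p>tag soup <b>here & <script>if (a[b[0]]>1) {}</script>"
round_tripped = unwrap_html_from_cdata(wrap_html_in_cdata(sample))
print(round_tripped == sample)
```

The wrapper is well-formed XML, but every XML tool sees one text blob, which is why the suggestion was described as probably not very useful.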
> The spirit of my question was not combative, but rather a request to
> some people I trust to supply me with reliable information. I think
> they understand the background of my question and will probably
> understand where I am going with this.
>
> The www-archive list is described as follows: "Miscellaneous.
> Mail-to-web gateway." I was using it in the latter capacity, as I
> have seen others do. Sorry if my message was construed otherwise. If
> you are interested in pursuing this I think the discussion should be
> moved elsewhere.

I was just interested in trying to help you out. Due to lack of context I probably misunderstood what you wanted (or maybe not :-).

--
Anne van Kesteren
http://annevankesteren.nl/
Received on Wednesday, 17 June 2009 13:06:12 UTC