- From: Anne van Kesteren <annevk@opera.com>
- Date: Sat, 24 Jan 2009 12:07:23 +0100
- To: "David Orchard" <orchard@pacificspirit.com>, "Henri Sivonen" <hsivonen@iki.fi>
- Cc: www-tag@w3.org
On Sat, 24 Jan 2009 05:17:32 +0100, David Orchard <orchard@pacificspirit.com> wrote:
> That would be very interesting if we could actually create an XML5
> parser,

I've done it (quite some time ago):

  http://code.google.com/p/xml5/

> and I'm in highly in favour of such a thing IFF it was used to allow XML
> in HTML5.

Parsing XML 1.0 documents to the correct infoset as well as parsing HTML to the infoset required by Web pages is impossible in the same parser. I suppose I should present proof for this though. Since I cannot think of a good way to put it, let's go through some examples.

Stream: <table><input>
Tree:
  html
    head
    body
      input
      table

Stream: <table><input type="hidden">
Tree:
  html
    head
    body
      table
        input type="hidden"

(<input type="hidden"> is a special case)

Stream: <div><x></div><p>
Tree:
  html
    head
    body
      div
        x
      p

Stream: <div><button></div><p>
Tree:
  html
    head
    body
      div
        button
          p

(<button> is scoping)

Stream: </br>
Tree:
  html
    head
    body
      br

Stream: <image/>
Tree:
  html
    head
    body
      img

Stream: x</p>x
Tree:
  html
    head
    body
      "x"
      p
      "x"

Hope that helps. HTML is a crazy format. You can try this out for yourself here:

  http://livedom.validator.nu/
  http://james.html5.org/parsetree.html

(Two independent implementations of the HTML5 parsing algorithm, by the way. The first uses Java and the second Python.)

> Absent such a thing, somebody would be forced to use an HTML5
> browser and then an API to extract the XML 1.0 infoset. It's slightly
> more palatable with the HTML5 language spec being separate from all the
> rest of the browser functions, but not as ideal as XML5.

Organization of the specification has nothing to do with this. Since HTML syntax and language are intertwined, you will never get the XML 1.0 infoset that the document actually represents. (It is also not clear to me why you would need an HTML5 browser; an HTML5 parser alone should suffice.)

--
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
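
For anyone who wants to check the XML half of the examples above locally, here is a minimal sketch using only Python's standard-library XML parser. It covers the XML 1.0 side only; the HTML trees shown above need an HTML5 parser such as the two linked implementations. The names `STREAMS`, `xml_view` and `results` are made up for illustration and are not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# The example streams from the message above.
STREAMS = [
    "<table><input>",
    '<table><input type="hidden">',
    "<div><x></div><p>",
    "<div><button></div><p>",
    "</br>",
    "<image/>",
    "x</p>x",
]


def xml_view(stream):
    """Return the root element name if `stream` is well-formed XML, else None."""
    try:
        return ET.fromstring(stream).tag
    except ET.ParseError:
        # An XML 1.0 parser has to stop at a well-formedness error.
        return None


results = {stream: xml_view(stream) for stream in STREAMS}

for stream, root in results.items():
    print(f"{stream!r:34} -> {root if root else 'not well-formed XML'}")
```

Only `<image/>` comes through as well-formed XML, and there the element is named `image`, whereas the HTML tree above requires `img`. The other streams are rejected outright by the XML parser, while HTML defines a specific tree for each of them.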
Received on Saturday, 24 January 2009 11:08:25 UTC