- From: Anne van Kesteren <annevk@opera.com>
- Date: Sat, 24 Jan 2009 12:07:23 +0100
- To: "David Orchard" <orchard@pacificspirit.com>, "Henri Sivonen" <hsivonen@iki.fi>
- Cc: www-tag@w3.org
On Sat, 24 Jan 2009 05:17:32 +0100, David Orchard
<orchard@pacificspirit.com> wrote:
> That would be very interesting if we could actually create an XML5
> parser,
I've done it (quite some time ago):
http://code.google.com/p/xml5/
> and I'm in highly in favour of such a thing IFF it was used to allow XML
> in HTML5.
Parsing XML 1.0 documents to the correct infoset as well as parsing HTML
to the infoset required by Web pages is impossible in the same parser.
I suppose I should present proof for this though. Since I cannot think of
a good way to put it, let's go through some examples.
Stream:
<table><input>
Tree:
html
  head
  body
    input
    table
Stream:
<table><input type="hidden">
Tree:
html
  head
  body
    table
      input type="hidden"
(<input type="hidden"> is a special case)
Stream:
<div><x></div><p>
Tree:
html
  head
  body
    div
      x
    p
Stream:
<div><button></div><p>
Tree:
html
  head
  body
    div
      button
        p
(<button> is scoping)
Stream:
</br>
Tree:
html
  head
  body
    br
Stream:
<image/>
Tree:
html
  head
  body
    img
Stream:
x</p>x
Tree:
html
  head
  body
    "x"
    p
    "x"
Hope that helps. HTML is a crazy format.
You can try this out for yourself here:
http://livedom.validator.nu/
http://james.html5.org/parsetree.html
(Two independent implementations of the HTML5 parsing algorithm by the
way. The first uses Java and the second Python.)
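For contrast, here is a minimal sketch (not part of either implementation above) of what an XML 1.0 parser does with the same streams. It assumes Python's standard xml.etree.ElementTree as the XML parser; the helper name xml_outcome is made up for illustration. Every stream except <image/> should be a fatal well-formedness error, and <image/> should come back as an element named "image" rather than "img", so none of the trees above can come out of a conforming XML parser.

```python
import xml.etree.ElementTree as ET

# The example streams from this message.
STREAMS = [
    '<table><input>',
    '<table><input type="hidden">',
    '<div><x></div><p>',
    '<div><button></div><p>',
    '</br>',
    '<image/>',
    'x</p>x',
]


def xml_outcome(stream):
    """Hypothetical helper: report what an XML 1.0 parser makes of a stream."""
    try:
        root = ET.fromstring(stream)
    except ET.ParseError:
        # XML 1.0 treats well-formedness errors as fatal: no tree is produced.
        return "fatal error"
    return "root element %r" % root.tag


for stream in STREAMS:
    print(repr(stream), "->", xml_outcome(stream))
```

An HTML parser has to accept all seven streams and build the trees shown earlier, which is exactly the behaviour an XML 1.0 parser is not allowed to have.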
> Absent such a thing, somebody would be forced to use an HTML5
> browser and then an API to extract the XML 1.0 infoset. It's slightly
> more palatable with the HTML5 language spec being separate from all the
> rest of the browser functions, but not as ideal as XML5.
Organization of the specification has nothing to do with this. Since HTML
syntax and language are intertwined you will never get the XML 1.0 infoset
that the document actually represents. (It is also not clear to me why you
would need an HTML5 browser, just an HTML5 parser should suffice.)
--
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
Received on Saturday, 24 January 2009 11:08:25 UTC