David's less simple example (was: Marcos simple sample)

I think the simple example won't really distinguish systems that "fix
up" markup as they will all pretty much just close the stack of open
elements and give the same result.
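
A minimal sketch of that "close the stack" behaviour, for illustration only. The function name close_open_elements and the very narrow tag pattern are my own assumptions, not taken from any of the systems under discussion:

```python
import re

# Assumption: only plain <name> and </name> tags, no attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def close_open_elements(markup: str) -> str:
    """Append end tags for every element still open at end of input."""
    stack = []
    for match in TAG.finditer(markup):
        is_end_tag = match.group(1) == "/"
        name = match.group(2)
        if not is_end_tag:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        # A mismatched end tag is left alone in this sketch.
    # Close whatever is still open, innermost element first.
    return markup + "".join(f"</{name}>" for name in reversed(stack))
```

Any fix-up strategy along these lines gives the same answer on input that is merely truncated, which is why such input does not separate the candidates.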

To tell them apart, it's worth looking at something a bit less like
well-formed XML, say

<math><one<two<three</one><two></tree></math>

Using <math> as the outer element has the advantage that you can test
with an HTML5 parser (the <math> puts HTML5 into its "foreign content",
XML-like mode, where /> means what it is supposed to mean). One desirable
property of XML-ER would be that it isn't totally unlike the behaviour
of HTML5 on such content.
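
To make that behaviour concrete, here is a rough sketch of how I understand the HTML5 tokenizer to read a start tag such as the <one<two<three</one> fragment in the example. It is a simplification written for this message: read_start_tag is a made-up name, attribute values are not handled, and parse errors are silently skipped rather than reported.

```python
WHITESPACE = " \t\n\f"

def read_start_tag(text: str, pos: int = 0):
    """Read one start tag beginning at text[pos] == '<'.

    Returns (tag_name, attribute_names, self_closing, next_pos).
    """
    if text[pos] != "<" or not text[pos + 1].isalpha():
        raise ValueError("not at the start of a start tag")
    pos += 1
    name = ""
    # Tag name: only whitespace, "/" or ">" ends it, so "<" becomes part of the name.
    while pos < len(text) and text[pos] not in WHITESPACE + "/>":
        name += text[pos].lower()
        pos += 1
    attrs = []
    while pos < len(text):
        ch = text[pos]
        if ch == ">":
            return name, attrs, False, pos + 1
        if ch == "/":
            if text[pos + 1:pos + 2] == ">":
                return name, attrs, True, pos + 2
            pos += 1  # stray "/": a parse error, then carry on with attributes
            continue
        if ch in WHITESPACE:
            pos += 1
            continue
        # Attribute name: runs up to whitespace, "/", ">" or "=".
        attr = ""
        while pos < len(text) and text[pos] not in WHITESPACE + "/>=":
            attr += text[pos].lower()
            pos += 1
        if pos < len(text) and text[pos] == "=":
            raise NotImplementedError("attribute values are outside this sketch")
        attrs.append(attr)
    raise ValueError("end of input inside a tag")
```

Called on the example with pos pointing at <one, this returns the tag name one<two<three< with a single valueless attribute named one, and leaves the position at the following <two>.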

Using V.nu's parser you can see the result of parsing the above:

http://livedom.validator.nu/?%3C!DOCTYPE%20html%3E%0A%3Cmath%3E%3Cone%3Ctwo%3Cthree%3C%2Fone%3E%3Ctwo%3E%3C%2Ftree%3E%3C%2Fmath%3E

Removing the html, head and body elements that the HTML context implies
leaves a parse tree of

<math><oneU00003CtwoU00003CthreeU00003C
one=""><two></two></oneU00003CtwoU00003CthreeU00003C></math>


which is what it is. I don't think it matters too much what the parse
tree is. That is, I don't think it's worth arguing about any meaning
implied by the original markup. The important thing is that HTML5
specifies a deterministic algorithm that returns a tree. Unless there
is some overwhelming objection, I think XML-ER should return the same
tree. (To be honest, I haven't yet checked what Anne's draft spec would
make of this.)
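
The U00003C runs in the tree above are the "<" characters of the element name, escaped so that the name can be written out as XML. As far as I know this follows the scheme suggested in the HTML spec for coercing a DOM into an infoset: "U" followed by six uppercase hex digits of the code point. A small sketch, where coerce_name is a hypothetical helper and the set of allowed characters is deliberately narrowed to ASCII:

```python
import re

# Assumption: a narrow, ASCII-only stand-in for "allowed in an XML name".
# Rules for the first character of a name are not handled here.
NOT_NAME_CHAR = re.compile(r"[^A-Za-z0-9_.\-]")

def coerce_name(name: str) -> str:
    """Replace each disallowed character with 'U' plus six uppercase hex digits."""
    return NOT_NAME_CHAR.sub(lambda m: "U%06X" % ord(m.group()), name)
```

Applied to the tag name one<two<three< this gives the element name shown in the tree above.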

David

Received on Tuesday, 28 February 2012 15:49:39 UTC