- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Tue, 28 Feb 2012 17:57:12 +0000
- To: "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
- Cc: David Carlisle <davidc@nag.co.uk>
On 28 Feb 2012, at 15:49, David Carlisle wrote:

> To distinguish things a bit it's worth looking at something a bit less like well formed XML, say
>
> <math><one<two<three</one><two></tree></math>
>
> Using <math> as an outer element has the advantage that you can test
> with an html5 parser (the <math> puts html5 in its "foreign content"
> xml-like mode where /> means what it is supposed to mean. One desirable
> property of XML-ER would be that it wasn't totally unlike the behaviour
> of HTML5 on such content.
>
> Using V.nu's parser you can see the result of parsing the above:
>
> http://livedom.validator.nu/?%3C!DOCTYPE%20html%3E%0A%3Cmath%3E%3Cone%3Ctwo%3Cthree%3C%2Fone%3E%3Ctwo%3E%3C%2Ftree%3E%3C%2Fmath%3E
>
> removing the html head and body implied in the html context results in a
> parse tree of
>
> <math><oneU00003CtwoU00003CthreeU00003C
> one=""><two></two></oneU00003CtwoU00003CthreeU00003C></math>
>
> which is what it is. I don't think it matters too much what the parse
> tree is. That is, I don't think it's worth trying to argue about any
> meaning implied by the original markup. The important thing is that
> html5 specifies a deterministic algorithm that returns a tree. Unless
> there is some overwhelming objection, I think XML-ER should return the
> same tree. (To be honest I haven't checked what Anne's draft spec would
> make of this yet).

Although I agree that the important thing is a deterministic algorithm that produces a tree, I think it *is* worth arguing about the meaning implied by the original markup -- or at least about how a person might have got to this XML from some well-formed XML -- specifically to address the editor use case for XML-ER, as George highlights later in this thread.
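To make the contrast concrete, here is a minimal Python sketch (not taken from any XML-ER draft) of two ways a parser might treat a `<` that turns up inside a tag name. The HTML5 mode roughly follows HTML5's tag name state, which ends a name only at whitespace, `/` or `>`, and then escapes characters not allowed in XML names as `U` plus six hex digits, as in the validator.nu output above. The editor mode is a hypothetical rule: a `<` ends the unfinished tag, flags it as an error, and starts a new tag. `Node`, `tokenize`, `build_tree`, `coerce_xml_name` and `outline` are illustrative names; attributes, text, comments and nearly all of HTML5's tree-construction rules are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    error: bool = False  # True if recovery was needed for this element


def coerce_xml_name(name: str) -> str:
    """Escape characters that can't appear in an XML name as 'U' plus six
    uppercase hex digits, in the style of HTML5's infoset coercion.
    Approximation: only ASCII is checked; non-ASCII is assumed allowed."""
    out = []
    for i, ch in enumerate(name):
        ok = (not ch.isascii() or ch.isalpha() or ch in "_:"
              or (i > 0 and (ch.isdigit() or ch in "-.")))
        out.append(ch if ok else f"U{ord(ch):06X}")
    return "".join(out)


def tokenize(markup: str, editor_mode: bool) -> Iterator[Tuple[str, str, bool, bool]]:
    """Yield (kind, name, self_closing, error) for each tag; text is skipped."""
    i, n = 0, len(markup)
    # HTML5's tag name state ends a name only at whitespace, '/' or '>'.
    # The hypothetical editor mode also ends it at '<'.
    stops = " \t\n\f/>" + ("<" if editor_mode else "")
    while i < n:
        if markup[i] != "<":
            i += 1
            continue
        i += 1
        kind = "start"
        if i < n and markup[i] == "/":
            kind, i = "end", i + 1
        start = i
        while i < n and markup[i] not in stops:
            i += 1
        name = markup[start:i]
        if editor_mode and (i == n or markup[i] == "<"):
            # Unterminated tag: end it here, flag it, and let '<' open the next tag.
            yield kind, name, False, True
            continue
        while i < n and markup[i] != ">":  # attributes are not modelled
            i += 1
        if i == n:
            return  # this sketch drops a tag cut off by end of input
        self_closing = markup[i - 1] == "/"
        i += 1
        yield kind, name, self_closing, False


def build_tree(markup: str, editor_mode: bool) -> Node:
    root = Node("#document")
    stack = [root]  # open elements, innermost last
    for kind, name, self_closing, error in tokenize(markup, editor_mode):
        key = name if editor_mode else coerce_xml_name(name)
        if kind == "start":
            # In HTML5 mode, a name that needed escaping is flagged as an error.
            node = Node(key, error=error or key != name)
            stack[-1].children.append(node)
            if not self_closing:
                stack.append(node)
            continue
        # End tag: close the nearest open element with this name, flagging any
        # elements it implicitly closes; an unmatched end tag is ignored.
        for depth in range(len(stack) - 1, 0, -1):
            if stack[depth].name == key:
                for unclosed in stack[depth + 1:]:
                    unclosed.error = True
                del stack[depth:]
                break
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree in '+ name' style, marking flagged elements."""
    lines = []
    for child in node.children:
        flag = " (error)" if child.error else ""
        lines.append("  " * depth + "+ " + child.name + flag)
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    typed = "<math><two<three /></math>"
    print("\n".join(outline(build_tree(typed, editor_mode=True))))
    print("\n".join(outline(build_tree(typed, editor_mode=False))))
```

On `<math><two<three /></math>`, the editor mode nests `three` inside a flagged, unterminated `two`, while the HTML5 mode yields a single flagged `twoU00003Cthree` child of `math`. On David's input the HTML5 mode reproduces the `oneU00003CtwoU00003CthreeU00003C` element containing `two`, though not the stray `one=""` attribute, since attributes are ignored here.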
To take a slightly less degenerate case, if someone started with the well-formed:

  <math><three /></math>

and then started typing a new tag before the <three> empty element:

  <math><two<three /></math>

I think it is much, much more reasonable for this to be interpreted in an editor as the tree:

  + math
    + two
      + three

(with the <two> element flagged as having an error) than it is to be interpreted as the tree:

  + math
    + twoU00003Cthree

(with the <twoU00003Cthree> element flagged as having an error).

While I agree that it's useful to be consistent with HTML5 parsing, I don't think we should be overly slavish about it. Browsers already have an HTML5-specified parsing algorithm that can be applied to XML, but because it's HTML5-aware, it doesn't meet our first requirement, which is to be compatible with XML. Given that we're going to be asking browsers to implement a different algorithm anyway, I don't see that the benefits of being consistent with HTML5 are so massive that they outweigh the benefits of having a single algorithm that is usable in editing and ingesting environments as well as in browsers.

Cheers,

Jeni
--
Jeni Tennison
http://www.jenitennison.com
Received on Tuesday, 28 February 2012 17:57:37 UTC