- From: Robin Berjon <robin@berjon.com>
- Date: Tue, 12 Jun 2012 21:32:50 +0200
- To: Norman Walsh <ndw@nwalsh.com>
- Cc: W3C XML-ER Community Group <public-xml-er@w3.org>
On Jun 12, 2012, at 21:17, Norman Walsh wrote:
> David Carlisle <davidc@nag.co.uk> writes:
>> On 29/02/2012 00:11, Noah Mendelsohn wrote:
>>> I think the most important question is: how bad would be consequences
>>> be if we guessed wrong.
>>
>> I still think that viewing things in that way leads to pain. If you look
>> at the output of html5/xml5/Anne's-draft/ on my example (or any example
>> really) there's no sense in which markup has been fixed. It is just
>> parsed with a grammar that isn't xml and produces a tree in a
>> deterministic fashion. The input was correct for that result tree.
>> (Some inputs may be called parse error to make humans feel better but
>> from a parsing point of view, that's a side issue).
>
> I'm inclined to agree with David on this point.

I do, too. One thing I've been wondering about is whether there's a name describing a parsing algorithm that produces useful output for every single input (as opposed to one that blows up for a subset of possible inputs). I think that it might be useful in clarifying this discussion (plus, I'm sure it's a cool word). Alas, my computer science proficiency is pretty much limited to nodding sagely whenever someone says something like "that halting problem has O(n) complexity in the Turing machine" so I don't know where to look.

> I think our goal should be deterministic parsing rules for building a
> tree from a sequence of characters. For well-formed XML (at least
> without an external subset), we should build the "right" tree. For
> documents that look mostly like XML but have the occasional missing
> quote or end tag, we should get the "right" tree for as many of those
> as we can without making the spec impractical or having arguments
> about the finer points of which result is correct. For everything
> else, we should get a specific answer.

+1 to all of that.

--
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Tuesday, 12 June 2012 19:33:18 UTC