Re: David's less simple example from Robin Berjon on 2012-06-12 (public-xml-er@w3.org from June 2012)

From: Robin Berjon <robin@berjon.com>
Date: Tue, 12 Jun 2012 21:32:50 +0200
To: Norman Walsh <ndw@nwalsh.com>
Cc: W3C XML-ER Community Group <public-xml-er@w3.org>
Message-Id: <58BCA642-D304-4814-AF64-092E0376B44C@berjon.com>

On Jun 12, 2012, at 21:17 , Norman Walsh wrote:
> David Carlisle <davidc@nag.co.uk> writes:
>> On 29/02/2012 00:11, Noah Mendelsohn wrote:
>>> I think the most important question is: how bad would the consequences
>>> be if we guessed wrong.
>> 
>> I still think that viewing things in that way leads to pain. If you look
>> at the output of html5/xml5/Anne's-draft/ on my example (or any example
>> really) there's no sense in which markup has been fixed. It is just
>> parsed with a grammar that isn't xml and produces a tree in a
>> deterministic fashion. The input was correct for that result tree.
>> (Some inputs may be called parse error to make humans feel better but
>> from a parsing point of view, that's a side issue).
> 
> I'm inclined to agree with David on this point. 

I do, too. One thing I've been wondering about is whether there's a name describing a parsing algorithm that produces useful output for every single input (as opposed to one that blows up for a subset of possible inputs). I think that it might be useful in clarifying this discussion (plus, I'm sure it's a cool word). Alas, my computer science proficiency is pretty much limited to nodding sagely whenever someone says something like "that halting problem has O(n) complexity in the Turing machine" so I don't know where to look.

> I think our goal should be deterministic parsing rules for building a
> tree from a sequence of characters. For well-formed XML (at least
> without an external subset), we should build the "right" tree. For
> documents that look mostly like XML but have the occasional missing
> quote or end tag, we should get the "right" tree for as many of those
> as we can without making the spec impractical or having arguments
> about the finer points of which result is correct. For everything
> else, we should get a specific answer.

+1 to all of that.
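
To make the three cases above concrete, here is a minimal sketch in Python of a parser that returns a tree for every input string. It is only an illustration under simplifying assumptions, not anything proposed on this list: the token pattern, the dict-based node shape and the recovery rules (auto-close at the nearest matching open element, drop stray end tags, treat anything unrecognised as text) are all hypothetical, and attributes, comments and entities are ignored entirely.

```python
import re

# Hypothetical token pattern: a bare start or end tag, or else text.
# A lone "<" that does not begin a tag is treated as one character of text,
# so every character of the input belongs to some token.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+|<)")


def parse(source):
    """Build a tree from any string; no input is rejected."""
    root = {"name": "#root", "children": []}
    stack = [root]  # currently open elements, innermost last
    for match in TOKEN.finditer(source):
        closing, name, text = match.groups()
        children = stack[-1]["children"]
        if text is not None:
            # Merge adjacent text so the tree does not depend on tokenisation.
            if children and isinstance(children[-1], str):
                children[-1] += text
            else:
                children.append(text)
        elif not closing:
            node = {"name": name, "children": []}
            children.append(node)
            stack.append(node)
        else:
            # Close the nearest open element with this name, implicitly
            # closing anything opened inside it. The root is never closed.
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth]["name"] == name:
                    del stack[depth:]
                    break
            # No match: a stray end tag, silently ignored.
    return root  # elements still open at end of input are implicitly closed
```

With these rules "<a>x</a>" gives the tree one would expect, "<a><b>hi</a>" gives the same tree as "<a><b>hi</b></a>", and something like "</x>oops<" still gives a specific answer: a root holding the text "oops<". Whether those are the right recovery rules is exactly the argument to avoid; the point is only that the function is defined for every input.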

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Tuesday, 12 June 2012 19:33:18 UTC