Re: error recovery from Noah Mendelsohn on 2012-02-18 (public-xml-er@w3.org from February 2012)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Sat, 18 Feb 2012 12:36:42 -0500
To: Norman Walsh <ndw@nwalsh.com>
CC: W3C XML-ER Community Group <public-xml-er@w3.org>
Message-ID: <4F3FE1AA.7010507@arcanedomain.com>

On 2/18/2012 7:32 AM, Norman Walsh wrote:
> I'm coming around to the view expressed by Noah and David (and others)
> that we'd be better off casting this as a new set of parsing rules for
> interpreting some sequences of characters that resemble XML but are
> not well-formed in a way that deterministically produces a tree.

> I think when the process finishes, and we have a tree (if we have a
> tree), it will be possible (for a human) to look back and say, we got
> this tree by correcting these errors in these ways.

Yes, I think that's generally where the focus should be. As I said in my 
earlier note, I think it's worth giving a bit of thought to whether it will 
be easy or hard to put reasonably tight bounds on identifying the subtrees 
that correspond to non-wellformed input. I also think we should demonstrate 
that the mapping can be implemented efficiently in a streaming processor 
for those who need streaming (though, in certain cases, there may be a 
tradeoff between streamability and the care taken in mapping non-wellformed 
input, as doing the latter well might involve backtracking).

I don't think we should standardize the APIs that expose either the tree or 
error identifications, and I don't think we should standardize the 
characteristics of the processors themselves.

Noah

Received on Saturday, 18 February 2012 17:37:06 UTC