RE: error recovery from David Lee on 2012-02-18 (public-xml-er@w3.org from February 2012)

From: David Lee <David.Lee@marklogic.com>
Date: Sat, 18 Feb 2012 04:39:21 -0800
To: Norman Walsh <ndw@nwalsh.com>, W3C XML-ER Community Group <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADAEE2D4@EXCHG-BE.marklogic.com>

I agree.
I would state it like this (vaguely).  This is a pre-processing step, not unlike "tidy".  It may have definitive rules such that the same sequence of chars deterministically produces the same output (maybe?), ideally a well-formed XML document.  But it may not be reversible: many different input sequences may produce the same result.  And the produced document need not carry any markers or other information to say 'how did I get here'.

It may also be possible that we produce rules where an implementation can choose one of many valid outputs.  So this is really an N:M problem.
Given a set {N} of input documents ->  F(N) -> one of {M} valid documents.
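
To make the many-to-one point concrete, here is a minimal sketch in Python. It is not a proposed XML-ER rule set: the function name recover() and its two repair rules (auto-close open elements, drop unmatched end tags) are assumptions chosen only for illustration, and it ignores attributes, entities, and stray '<' characters entirely.

```python
import re

# Hypothetical toy tokenizer: start tags, end tags, or runs of text.
# Attributes, comments, entities, etc. are deliberately out of scope.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

def recover(text):
    """Toy recovery: same input always yields the same output string."""
    out = []
    stack = []  # names of currently open elements
    for m in TOKEN.finditer(text):
        closing, name, chars = m.group(1), m.group(2), m.group(3)
        if chars is not None:
            out.append(chars)
        elif not closing:
            stack.append(name)
            out.append("<%s>" % name)
        elif name in stack:
            # Close any elements left open inside the one being closed.
            while stack:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # Otherwise: an end tag with no matching start tag is dropped.
    # Close whatever is still open at end of input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)

inputs = [
    "<a><b>x</a>",          # missing </b>
    "<a><b>x",              # nothing closed
    "<a><b>x</c></b></a>",  # stray </c>
    "<a><b>x</b></a>",      # already well formed
]
results = {recover(s) for s in inputs}
print(results)
```

All four inputs collapse to a single output, and nothing in that output records which repairs (if any) were applied, so the mapping cannot be inverted.  A different implementation could just as defensibly repair "<a><b>x</a>" some other way, which is where the {M} side of the N:M framing comes from.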


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com

> -----Original Message-----
> From: Norman Walsh [mailto:ndw@nwalsh.com]
> Sent: Saturday, February 18, 2012 7:32 AM
> To: W3C XML-ER Community Group
> Subject: Re: error recovery
> 
> I'm coming around to the view expressed by Noah and David (and others)
> that we'd be better off casting this as a new set of parsing rules for
> interpreting some sequences of characters that resemble XML but are
> not well-formed in a way that deterministically produces a tree.
> 
> I think when the process finishes, and we have a tree (if we have a
> tree), it will be possible (for a human) to look back and say, we got
> this tree by correcting these errors in these ways. But I'm not sure
> we should limit ourselves to describing the process in a way that
> guarantees that the XML-ER parser knows this.
> 
>                                         Be seeing you,
>                                           norm
> 
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 413 624 6676
> www.marklogic.com

Received on Wednesday, 22 February 2012 12:55:19 UTC