Re: Error recovery

On Sat, Nov 17, 2012 at 5:21 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> James Clark scripsit:
>
> > For the purposes of recovery, I plan to use an extended definition of
> > nameStartChar:
> >
> > nameStartChar ::= [A-Za-z_:$] | [#x80-#x10FFFF]
> >
> > So the tree you get would be as if MicroXML allowed colons as a
> > nameStartChar.
>
> I think that's Just Wrong.  An error-correcting parser should produce
> a valid MicroXML data model, and the data model does not allow
> colons in names.  Something like Liam Quin's Ucode (invalid characters
> are changed into UnnnX, where nnn is the smallest possible number of
> hex digits to represent the Unicode scalar value) makes much more sense
> to me.  Alternatively, just map all disallowed characters to something
> legal but rarely used, like __.  Of course, U, X, and _ could be replaced
> by even more obscure but legal Unicode name characters.
>

I agree.  I would expect some sort of mapping transform as well.  It would
be good to develop a convention for such a thing, perhaps derived from
Ucode.  Could that be a useful micro-deliverable for this group?  More
generally, I guess an overall error-correction convention could be one too
(though the insanity, I say it again, of what HTML5 came up with for such a
convention gives me some pause).
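
To make the Ucode-style mapping concrete, here is a rough Python sketch of
what such a transform might look like.  It is only an illustration, not a
proposal for the actual convention: the function name ucode_escape is made
up, and the legal-character sets are an ASCII-only approximation of the
MicroXML name rules (the real productions also admit many non-ASCII
characters, which this sketch would escape unnecessarily).

```python
import string

# Assumption: ASCII-only approximation of the MicroXML name productions.
# The real nameStartChar/nameChar rules also allow many non-ASCII characters.
NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits + "-.")


def ucode_escape(name: str) -> str:
    """Replace each disallowed character with UnnnX, where nnn is the
    shortest uppercase hex form of its Unicode scalar value.

    Hypothetical helper for illustration; not part of any MicroXML spec.
    """
    out = []
    for i, ch in enumerate(name):
        # The first character must be a legal name start character.
        legal = NAME_START if i == 0 else NAME_CHARS
        if ch in legal:
            out.append(ch)
        else:
            out.append(f"U{ord(ch):X}X")
    return "".join(out)


print(ucode_escape("xs:element"))
```

Under these assumptions a colon (U+003A) becomes U3AX, so "xs:element"
comes out as "xsU3AXelement".  Note that the sketch does not try to make
the mapping reversible: a name that already contains something like U3AX
is left alone, so a real convention would have to decide how to handle
that collision.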


-- 
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji

Received on Sunday, 18 November 2012 01:00:56 UTC