Re: Error recovery

James Clark scripsit:

> For the purposes of recovery, I plan to use an extended definition of
> nameStartChar:
> 
> nameStartChar ::= [A-Za-z_:$] | [#x80-#x10FFFF]
> 
> So the tree you get would be as if MicroXML allowed colons as a
> nameStartChar.

I think that's Just Wrong.  An error-correcting parser should produce
a valid MicroXML data model, and the data model does not allow
colons in names.  Something like Liam Quin's Ucode (invalid characters
are changed into UnnnX, where nnn is the smallest possible number of
hex digits to represent the Unicode scalar value) makes much more sense
to me.  Alternatively, just map all disallowed characters to something
legal but rarely used, like __.  Of course, U, X, and _ could be replaced
by even more obscure but legal Unicode name characters.
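
To make the UnnnX idea concrete, here is a rough Python sketch.  It is
only an illustration under stated assumptions, not Liam's actual Ucode
algorithm: the function names are made up, and the name-character test
is deliberately simplified (ASCII letters and underscore to start a
name, plus digits, hyphen and period afterwards, with everything at or
above U+0080 accepted wholesale).  A real parser would use the MicroXML
name productions instead.

```python
import string

# Simplified ASCII name classes (an assumption for this sketch only;
# the real MicroXML productions are more selective above U+007F).
ASCII_START = set(string.ascii_letters + "_")
ASCII_REST = ASCII_START | set(string.digits + "-.")


def is_allowed(ch, first):
    """Return True if ch may appear at this position in a name."""
    if ord(ch) >= 0x80:
        return True  # simplification: accept all non-ASCII characters
    return ch in (ASCII_START if first else ASCII_REST)


def ucode_escape(name):
    """Replace each disallowed character with UnnnX, where nnn is the
    scalar value in hex using as few digits as possible."""
    out = []
    for ch in name:
        if is_allowed(ch, first=not out):
            out.append(ch)
        else:
            # "%X" yields uppercase hex with no leading zeros.
            out.append("U%XX" % ord(ch))
    return "".join(out)


print(ucode_escape("svg:rect"))
```

Note that this sketch makes no attempt to be reversible: a name that
already contains something that looks like an escape is passed through
untouched.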

-- 
John Cowan  <cowan@ccil.org>  http://www.ccil.org/~cowan
        Raffiniert ist der Herrgott, aber boshaft ist er nicht.
                --Albert Einstein

Received on Sunday, 18 November 2012 00:21:31 UTC