
Re: Error recovery

From: John Cowan <cowan@mercury.ccil.org>
Date: Sat, 17 Nov 2012 19:21:07 -0500
To: James Clark <jjc@jclark.com>
Cc: Michael Sokolov <sokolov@falutin.net>, liam@w3.org, Uche Ogbuji <uche@ogbuji.net>, "public-microxml (public-microxml@w3.org)" <public-microxml@w3.org>
Message-ID: <20121118002107.GA19653@mercury.ccil.org>
James Clark scripsit:

> For the purposes of recovery, I plan to use an extended definition of
> nameStartChar:
> 
> nameStartChar ::= [A-Za-z_:$] | [#x80-#x10FFFF]
> 
> So the tree you get would be as if MicroXML allowed colons as a
> nameStartChar.

I think that's Just Wrong.  An error-correcting parser should produce
a valid MicroXML data model, and the data model does not allow
colons in names.  Something like Liam Quin's Ucode (invalid characters
are changed into UnnnX, where nnn is the smallest possible number of
hex digits to represent the Unicode scalar value) makes much more sense
to me.  Alternatively, just map all disallowed characters to something
legal but rarely used, like __.  Of course, U, X, and _ could be replaced
by even more obscure but legal Unicode name characters.
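
A minimal sketch of the two mappings suggested above, in Python.  The
function names ucode_name and underscore_name are hypothetical, and the
name-character test is a deliberate simplification: ASCII letters and
underscore may start a name, digits, hyphen and period may follow, and
everything at or above #x80 is passed through untouched.  The real
MicroXML name productions are more selective about non-ASCII characters.

```python
import string

# Assumption: simplified ASCII-only approximation of the name classes.
NAME_START = set(string.ascii_letters + "_")
NAME_CHAR = NAME_START | set(string.digits + "-.")


def _allowed(ch, first):
    """Return True if ch may appear at this position in a name."""
    if ord(ch) >= 0x80:
        # Simplification: treat every non-ASCII character as legal.
        return True
    return ch in (NAME_START if first else NAME_CHAR)


def ucode_name(name):
    """Replace each disallowed character with UnnnX, where nnn is the
    scalar value in hex using the fewest digits possible."""
    out = []
    for i, ch in enumerate(name):
        if _allowed(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append("U{:X}X".format(ord(ch)))
    return "".join(out)


def underscore_name(name, replacement="__"):
    """Replace each disallowed character with one fixed legal string."""
    return "".join(
        ch if _allowed(ch, first=(i == 0)) else replacement
        for i, ch in enumerate(name)
    )
```

With these definitions, ucode_name("svg:rect") gives "svgU3AXrect" and
underscore_name("svg:rect") gives "svg__rect".  Neither mapping is
reversible as sketched: a name that already contains the text U3AX is
left alone, so it collides with the escaped form of a colon.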

-- 
John Cowan  <cowan@ccil.org>  http://www.ccil.org/~cowan
        Raffiniert ist der Herrgott, aber boshaft ist er nicht.
                --Albert Einstein
Received on Sunday, 18 November 2012 00:21:31 GMT
