On Sat, Nov 17, 2012 at 5:21 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> James Clark scripsit:
>
> > For the purposes of recovery, I plan to use an extended definition of
> > nameStartChar:
> >
> > nameStartChar ::= [A-Za-z_:$] | [#x80-#x10FFFF]
> >
> > So the tree you get would be as if MicroXML allowed colons as a
> > nameStartChar.
>
> I think that's Just Wrong. An error-correcting parser should produce
> a valid MicroXML data model, and the data model does not allow
> colons in names. Something like Liam Quin's Ucode (invalid characters
> are changed into UnnnX, where nnn is the smallest possible number of
> hex digits to represent the Unicode scalar value) makes much more sense
> to me. Alternatively, just map all disallowed characters to something
> legal but rarely used, like __. Of course, U, X, and _ could be replaced
> by even more obscure but legal Unicode name characters.
>
I agree. I would expect some sort of mapping transform as well. It would
be good to develop a convention for such a thing, perhaps derived from
Ucode. Could that be a useful micro-deliverable for this group? More
generally, I guess an overall error-correction convention could be one too
(though the insanity, I say it again, of what HTML5 came up with for such
a convention gives me some pause).
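
To make the idea concrete, here is a minimal sketch of the Ucode-style
mapping John describes, where each disallowed character becomes UnnnX with
the fewest hex digits needed. It is only an illustration under stated
assumptions: the function name ucode_escape is hypothetical, the set of
legal characters is a simplified ASCII-only approximation rather than the
full MicroXML name productions, everything at or above U+0080 is passed
through unchanged, and uppercase hex digits are an arbitrary choice.

```python
import string

# Assumption: simplified, ASCII-only approximation of legal name characters.
# The real MicroXML productions also restrict the non-ASCII ranges.
NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits + "-.")


def ucode_escape(name: str) -> str:
    """Replace each disallowed character with UnnnX (hypothetical helper).

    nnn is the code point in hex with no leading zeros, i.e. the smallest
    possible number of hex digits.
    """
    out = []
    for i, ch in enumerate(name):
        legal = NAME_START if i == 0 else NAME_CHARS
        # Assumption: non-ASCII characters are passed through untouched.
        if ch in legal or ord(ch) >= 0x80:
            out.append(ch)
        else:
            out.append("U" + format(ord(ch), "X") + "X")
    return "".join(out)


if __name__ == "__main__":
    print(ucode_escape("svg:rect"))
```

Note that a mapping like this is not reversible as written, since a name
that already contains a sequence such as U3AX is left alone. A real
convention would have to decide whether that matters.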
--
Uche Ogbuji http://uche.ogbuji.net
Founding Partner, Zepheira http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji