- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sat, 17 Nov 2012 19:21:07 -0500
- To: James Clark <jjc@jclark.com>
- Cc: Michael Sokolov <sokolov@falutin.net>, liam@w3.org, Uche Ogbuji <uche@ogbuji.net>, "public-microxml (public-microxml@w3.org)" <public-microxml@w3.org>
James Clark wrote:

> For the purposes of recovery, I plan to use an extended definition of
> nameStartChar:
>
> nameStartChar ::= [A-Za-z_:$] | [#x80-#x10FFFF]
>
> So the tree you get would be as if MicroXML allowed colons as a
> nameStartChar.

I think that's Just Wrong. An error-correcting parser should produce a valid MicroXML data model, and the data model does not allow colons in names.

Something like Liam Quin's Ucode makes much more sense to me: each invalid character is changed into UnnnX, where nnn is the Unicode scalar value written with the smallest possible number of hex digits. Alternatively, just map all disallowed characters to something legal but rarely used, like __. Of course, U, X, and _ could be replaced by even more obscure but legal Unicode name characters.

--
John Cowan <cowan@ccil.org> http://www.ccil.org/~cowan
Subtle is the Lord, but malicious He is not. --Albert Einstein
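A minimal Python sketch of the Ucode-style repair described above, for illustration only. It assumes uppercase hex digits and uses a deliberately simplified name-character test rather than the MicroXML grammar: ASCII letters and underscore may start a name, ASCII digits, hyphen, and period may follow, and every non-ASCII character is accepted as-is. The function names (`is_name_start_char`, `is_name_char`, `ucode_escape`, `repair_name`) are hypothetical.

```python
def is_name_start_char(ch: str) -> bool:
    # Simplified assumption, not the MicroXML production:
    # non-ASCII is accepted; in ASCII only letters and "_" start a name.
    if not ch.isascii():
        return True
    return ch.isalpha() or ch == "_"


def is_name_char(ch: str) -> bool:
    # Name characters after the first: start chars plus ASCII digits, "-", ".".
    return is_name_start_char(ch) or (ch.isascii() and (ch.isdigit() or ch in "-."))


def ucode_escape(ch: str, prefix: str = "U", suffix: str = "X") -> str:
    # "X" formatting yields uppercase hex with no leading zeros,
    # i.e. the smallest number of hex digits for the scalar value.
    return f"{prefix}{ord(ch):X}{suffix}"


def repair_name(name: str) -> str:
    # Replace each character that is illegal at its position with UnnnX.
    out = []
    for i, ch in enumerate(name):
        ok = is_name_start_char(ch) if i == 0 else is_name_char(ch)
        out.append(ch if ok else ucode_escape(ch))
    return "".join(out)


if __name__ == "__main__":
    for raw in ["xlink:href", "1abc", "a b", "foo-bar.baz"]:
        print(raw, "->", repair_name(raw))
```

Under these assumptions, "xlink:href" becomes "xlinkU3AXhref" and "1abc" becomes "U31Xabc"; because the replacement always begins with U, a repaired name never starts with an illegal character. Swapping the prefix and suffix parameters gives the "more obscure" variant mentioned in the message.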
Received on Sunday, 18 November 2012 00:21:31 UTC