Re: Error recovery

From: Uche Ogbuji <uche@ogbuji.net>
Date: Sat, 17 Nov 2012 18:00:34 -0700
Message-ID: <CAPJCua3QDbSmOcFWkQ5CuLhEDkiy-asfZtkXjiEmtJaH-ofnvw@mail.gmail.com>
To: John Cowan <cowan@mercury.ccil.org>
Cc: James Clark <jjc@jclark.com>, Michael Sokolov <sokolov@falutin.net>, liam@w3.org, "public-microxml (public-microxml@w3.org)" <public-microxml@w3.org>
On Sat, Nov 17, 2012 at 5:21 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> James Clark scripsit:
> > For the purposes of recovery, I plan to use an extended definition of
> > nameStartChar:
> >
> > nameStartChar ::= [A-Za-z_:$] | [#x80-#x10FFFF]
> >
> > So the tree you get would be as if MicroXML allowed colons as a
> > nameStartChar.
> I think that's Just Wrong.  An error-correcting parser should produce
> a valid MicroXML data model, and the data model does not allow
> colons in names.  Something like Liam Quin's Ucode (invalid characters
> are changed into UnnnX, where nnn is the smallest possible number of
> hex digits to represent the Unicode scalar value) makes much more sense
> to me.  Alternatively, just map all disallowed characters to something
> legal but rarely used, like __.  Of course, U, X, and _ could be replaced
> by even more obscure but legal Unicode name characters.

I agree.  I would expect some sort of mapping transform as well.  It would
be good to develop a convention for such a thing, perhaps as a derivation
of Ucode?  Could that be a useful micro-deliverable for this group?  More
generally, I guess an overall error-correction convention could be one too
(though the insanity, I say it again, of what HTML5 came up with for such a
convention gives me some pause).
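
To make the Ucode-style mapping John describes a bit more concrete, here is
a minimal Python sketch. It is only an illustration under stated
assumptions: the allowed-character test is a deliberate simplification
(ASCII letters, underscore, and anything at or above U+0080 as start
characters, plus digits, hyphen and period elsewhere), not the MicroXML
name grammar, and the names is_allowed and escape_name are hypothetical.

```python
import string

# Simplified stand-in for the real name rules (an assumption for this sketch).
ASCII_START = set(string.ascii_letters + "_")
ASCII_REST = ASCII_START | set(string.digits + "-.")


def is_allowed(ch, first):
    """Return True if ch may appear in a name at this position."""
    if ord(ch) >= 0x80:
        # Treat every non-ASCII code point as allowed, for brevity.
        return True
    return ch in (ASCII_START if first else ASCII_REST)


def escape_name(name):
    """Replace each disallowed character with UnnnX.

    nnn is the character's code point in hex, using the fewest digits
    that can represent it.
    """
    out = []
    for i, ch in enumerate(name):
        if is_allowed(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append("U{:X}X".format(ord(ch)))
    return "".join(out)


print(escape_name("xsl:template"))
```

With these assumptions a colon (U+003A) becomes U3AX. Note that a mapping
like this is not reversible without further rules, since a name that
already contains the literal text "U3AX" is indistinguishable from an
escaped one.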

Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
Received on Sunday, 18 November 2012 01:00:56 UTC
