Re: 12. Are C1 controls and Unicode non-characters disallowed? from James Clark on 2012-09-09 (public-microxml@w3.org from September 2012)

From: James Clark <jjc@jclark.com>
Date: Sun, 9 Sep 2012 10:50:53 +0700
To: John Cowan <cowan@mercury.ccil.org>
Cc: public-microxml@w3.org
Message-ID: <CANz3_Ea4+y8Z7BUwCGrnAE-z97vX_4Zvv9u9pzb1gkE2S5xHTQ@mail.gmail.com>

Writing the production for char like this would, I think, make the logic
behind the definition clearer:

char ::= s | ([#x0-#x10FFFF] - forbiddenChar)
forbiddenChar ::= controlCodePoint | surrogateCodePoint | nonCharacterCodePoint
controlCodePoint ::= [#x0-#x1F] | [#x7F-#x9F]
# The 66 noncharacters defined by Unicode
nonCharacterCodePoint ::= [#xFDD0-#xFDEF] | [#xFFFE-#xFFFF] | [#x1FFFE-#x1FFFF]
                     | [#x2FFFE-#x2FFFF] | [#x3FFFE-#x3FFFF] | [#x4FFFE-#x4FFFF]
                     | [#x5FFFE-#x5FFFF] | [#x6FFFE-#x6FFFF] | [#x7FFFE-#x7FFFF]
                     | [#x8FFFE-#x8FFFF] | [#x9FFFE-#x9FFFF] | [#xAFFFE-#xAFFFF]
                     | [#xBFFFE-#xBFFFF] | [#xCFFFE-#xCFFFF] | [#xDFFFE-#xDFFFF]
                     | [#xEFFFE-#xEFFFF] | [#xFFFFE-#xFFFFF] | [#x10FFFE-#x10FFFF]
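
As a rough illustration of how these productions compose, here is a minimal Python sketch that treats each one as a predicate on code points. The function names are made up for this example. Two things are assumptions rather than anything stated in this message: that s is tab, line feed and space, and that surrogateCodePoint is the usual Unicode surrogate range D800-DFFF.

```python
# Sketch only: each production becomes a predicate on an integer code point.
# ASSUMPTION: `s` is tab, line feed and space; this message does not define it.
S = frozenset({0x9, 0xA, 0x20})


def is_control(cp: int) -> bool:
    # controlCodePoint ::= [#x0-#x1F] | [#x7F-#x9F]  (C0, DEL and C1 controls)
    return 0x0 <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_surrogate(cp: int) -> bool:
    # ASSUMPTION: surrogateCodePoint is the Unicode surrogate block,
    # which the message references but does not spell out.
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # FDD0..FDEF, plus the last two code points of each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (
        0x0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
    )


def is_forbidden(cp: int) -> bool:
    # forbiddenChar ::= controlCodePoint | surrogateCodePoint | nonCharacterCodePoint
    return is_control(cp) or is_surrogate(cp) or is_noncharacter(cp)


def is_char(cp: int, s: frozenset = S) -> bool:
    # char ::= s | ([#x0-#x10FFFF] - forbiddenChar)
    # Whitespace is tested first because it lies inside the control range.
    if cp in s:
        return True
    return 0x0 <= cp <= 0x10FFFF and not is_forbidden(cp)
```

The mask test in is_noncharacter is just a compact way of writing the seventeen xxFFFE-xxFFFF ranges; counting the code points it accepts over 0 to 10FFFF gives the 66 noncharacters mentioned in the comment above.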

The definition of nameStartChar also needs to exclude the
nonCharacterCodePoints, e.g. by changing its last alternative to

([#xF900-#xEFFFF] - nonCharacterCodePoint)
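
To see what that subtraction removes in practice, here is a small Python sketch of just this last alternative. The function names are invented for the example, and the rest of nameStartChar is not reproduced, since it is not quoted in this message.

```python
# Sketch only: the final alternative of nameStartChar with the
# noncharacters subtracted, i.e. ([#xF900-#xEFFFF] - nonCharacterCodePoint).

def is_noncharacter(cp: int) -> bool:
    # FDD0..FDEF, plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (
        0x0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
    )


def in_last_name_start_alternative(cp: int) -> bool:
    # Hypothetical helper name; covers only the range discussed here.
    return 0xF900 <= cp <= 0xEFFFF and not is_noncharacter(cp)


# Noncharacters that the unmodified range [#xF900-#xEFFFF] would have admitted.
excluded = [cp for cp in range(0xF900, 0xEFFFF + 1) if is_noncharacter(cp)]
```

Here excluded holds FDD0-FDEF, FFFE-FFFF, and the final two code points of planes 1 through E, which are the noncharacters that fall inside F900-EFFFF.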

James

On Sun, Sep 9, 2012 at 3:45 AM, John Cowan <cowan@mercury.ccil.org> wrote:

> James Clark scripsit:
>
> > I would either leave the list out completely or put it in the syntax.
>
> Oh, in the syntax, absolutely.
>
> --
> Barry thirteen gules and argent on a canton azure           John Cowan
> fifty mullets of five points of the second,             cowan@ccil.org
> six, five, six, five, six, five, six, five, and six.
>         --blazoning the U.S. flag           http://www.ccil.org/~cowan
>

Received on Sunday, 9 September 2012 03:51:41 UTC