- From: James Clark <jjc@jclark.com>
- Date: Sun, 9 Sep 2012 10:50:53 +0700
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: public-microxml@w3.org
- Message-ID: <CANz3_Ea4+y8Z7BUwCGrnAE-z97vX_4Zvv9u9pzb1gkE2S5xHTQ@mail.gmail.com>
Writing the production for char like this would, I think, make the logic
behind the definition clearer:
char ::= s | ([#x0-#x10FFFF] - forbiddenChar)
forbiddenChar ::= controlCodePoint | surrogateCodePoint | nonCharacterCodePoint
controlCodePoint ::= [#x0-#x1F] | [#x7F-#x9F]
# The 66 noncharacters defined by Unicode
nonCharacterCodePoint ::= [#xFDD0-#xFDEF] | [#xFFFE-#xFFFF] | [#x1FFFE-#x1FFFF]
    | [#x2FFFE-#x2FFFF] | [#x3FFFE-#x3FFFF] | [#x4FFFE-#x4FFFF]
    | [#x5FFFE-#x5FFFF] | [#x6FFFE-#x6FFFF] | [#x7FFFE-#x7FFFF]
    | [#x8FFFE-#x8FFFF] | [#x9FFFE-#x9FFFF] | [#xAFFFE-#xAFFFF]
    | [#xBFFFE-#xBFFFF] | [#xCFFFE-#xCFFFF] | [#xDFFFE-#xDFFFF]
    | [#xEFFFE-#xEFFFF] | [#xFFFFE-#xFFFFF] | [#x10FFFE-#x10FFFF]
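To make the set arithmetic in these productions concrete, here is a rough Python sketch of how a checker might test a code point against them. The function names are illustrative only. Two things are assumptions rather than part of the proposal: the surrogate range (surrogateCodePoint is referenced above but not spelled out, so the standard Unicode range D800-DFFF is used) and the whitespace set standing in for s, which is not defined in this message.

```python
# Assumption: "s" is not defined in this message; an XML-style whitespace
# set is used here purely as a placeholder.
ASSUMED_S = frozenset({0x9, 0xA, 0xD, 0x20})


def is_control(cp: int) -> bool:
    # controlCodePoint ::= [#x0-#x1F] | [#x7F-#x9F]
    return 0x0 <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_surrogate(cp: int) -> bool:
    # Assumption: the standard Unicode surrogate range.
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # [#xFDD0-#xFDEF] plus the last two code points of each of the 17 planes.
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_forbidden(cp: int) -> bool:
    # forbiddenChar ::= controlCodePoint | surrogateCodePoint | nonCharacterCodePoint
    return is_control(cp) or is_surrogate(cp) or is_noncharacter(cp)


def is_char(cp: int) -> bool:
    # char ::= s | ([#x0-#x10FFFF] - forbiddenChar)
    return cp in ASSUMED_S or (0 <= cp <= 0x10FFFF and not is_forbidden(cp))
```

Note that whitespace is tested first: tab and newline fall inside the control range, so the explicit s alternative is what lets them through.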
The definition of nameStartChar also needs to exclude
nonCharacterCodePoints, e.g. by changing the last bit to
([#xF900-#xEFFFF] - nonCharacterCodePoint)
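As a rough illustration of what that subtraction does, the following Python sketch tests membership in just this last alternative of nameStartChar. The function names are hypothetical, and the earlier alternatives of nameStartChar are deliberately not modelled.

```python
def is_noncharacter(cp: int) -> bool:
    # [#xFDD0-#xFDEF] plus the last two code points of every plane.
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_name_start_tail(cp: int) -> bool:
    # ([#xF900-#xEFFFF] - nonCharacterCodePoint)
    return 0xF900 <= cp <= 0xEFFFF and not is_noncharacter(cp)


# Code points that the plain range [#xF900-#xEFFFF] would have admitted
# but the subtraction removes.
excluded = [cp for cp in range(0xF900, 0xEFFFF + 1) if is_noncharacter(cp)]
```

Without the subtraction, code points such as #xFDD0 or #x1FFFF would be valid name start characters even though char forbids them.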
James
On Sun, Sep 9, 2012 at 3:45 AM, John Cowan <cowan@mercury.ccil.org> wrote:
> James Clark scripsit:
>
> > I would either leave the list out completely or put it in the syntax.
>
> Oh, in the syntax, absolutely.
>
> --
> Barry thirteen gules and argent on a canton azure John Cowan
> fifty mullets of five points of the second, cowan@ccil.org
> six, five, six, five, six, five, six, five, and six.
> --blazoning the U.S. flag http://www.ccil.org/~cowan
>
Received on Sunday, 9 September 2012 03:51:41 UTC