
Questionable tests using "-" in character ranges

From: Michael Kay <mike@saxonica.com>
Date: Thu, 16 Nov 2006 10:24:20 -0000
To: <public-xml-schema-testsuite@w3.org>
Message-ID: <006001c70969$612e9f90$6401a8c0@turtle>

The following MS regex tests are questionable:

   <test group="reF20" name="reF20"/>
   <test group="reF21" name="reF21"/>
   <test group="reF22" name="reF22"/>
   <test group="reF23" name="reF23"/>

They use constructs such as:

[^a-d-b-c]

It's not clear whether this is legal or not. This is the subject of a
long-open bug report on the spec. The spec says:

[17] charRange ::= seRange | XmlCharIncDash

(bullet 1) The [, ], - and \ characters are not valid character ranges;

(bullet 3) The - character is a valid character range only at the beginning
or end of a "positive character group".

(Note:) The grammar for "character range" as given above is ambiguous, but
the second and third bullets above together remove the ambiguity.

Clearly these statements are mutually contradictory and implementors can
interpret them in different ways. The grammar says that "-" is allowed;
bullet 1 says it isn't; bullet 3 simply muddies the waters, and the Note
implies that bullets 2 and 3 override the grammar but bullet 1 doesn't. What
a mess.
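
To make the disagreement concrete, here is a rough Python sketch of how three
readings of the quoted text treat a standalone "-". It is an illustration only,
not the spec's algorithm or any processor's implementation. The function names
and the policy labels ("grammar", "bullet1", "bullet3") are invented for this
example, and the parser handles only a simplified group: no escapes, no
category escapes, no subtraction. It also assumes that the "positive character
group" of a negated group is whatever follows the "^".

```python
def split_char_group(body):
    """Split a simplified group body into ('range', lo, hi) and ('char', c) tokens.

    Simplifying assumption: no escapes and no subtraction; neither end of a
    range may itself be '-'.
    """
    tokens = []
    i = 0
    while i < len(body):
        is_range = (
            i + 2 < len(body)
            and body[i + 1] == "-"
            and body[i] != "-"
            and body[i + 2] != "-"
        )
        if is_range:
            tokens.append(("range", body[i], body[i + 2]))
            i += 3
        else:
            tokens.append(("char", body[i]))
            i += 1
    return tokens


def is_legal(group, policy):
    """Judge a bracketed group under one hypothetical reading of the spec text.

    policy == "grammar": production [17] alone; "-" is allowed anywhere.
    policy == "bullet1": "-" is never a valid character range.
    policy == "bullet3": "-" is valid only as the first or last item.
    """
    # Strip the brackets, and the leading '^' of a negated group.
    body = group[2:-1] if group.startswith("[^") else group[1:-1]
    tokens = split_char_group(body)
    for pos, tok in enumerate(tokens):
        if tok != ("char", "-"):
            continue
        if policy == "bullet1":
            return False
        if policy == "bullet3" and pos not in (0, len(tokens) - 1):
            return False
    return True


if __name__ == "__main__":
    for policy in ("grammar", "bullet1", "bullet3"):
        print(policy, is_legal("[^a-d-b-c]", policy))
```

Under these assumptions the group from the tests is accepted by the "grammar"
reading and rejected by the other two, while a trailing dash as in [a-d-] is
accepted by "grammar" and "bullet3" but still rejected by "bullet1".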

Michael Kay
Received on Thursday, 16 November 2006 10:30:38 GMT
