Questionable tests using "-" in character ranges

The following MS regex tests are questionable:

   <test group="reF20" name="reF20"/>
   <test group="reF21" name="reF21"/>
   <test group="reF22" name="reF22"/>
   <test group="reF23" name="reF23"/>

They use constructs such as 

[^a-d-b-c]

It's not clear whether this is legal or not. This is the subject of a
long-open bug report on the spec. The spec says:

[17] charRange ::= seRange | XmlCharIncDash

(bullet 1) The [, ], - and \ characters are not valid character ranges;

(bullet 3) The - character is a valid character range only at the beginning
or end of a "positive character group".

(Note:) The grammar for "character range" as given above is ambiguous, but
the second and third bullets above together remove the ambiguity.

Clearly these statements are mutually contradictory and implementors can
interpret them in different ways. The grammar says that "-" is allowed;
bullet 1 says it isn't; bullet 3 simply muddies the waters, and the Note
implies that bullets 2 and 3 override the grammar but bullet 1 doesn't. What
a mess.
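
To make the disagreement concrete, here is a rough sketch in Python of how
the three readings treat the body of a character group. It is an illustration
only, not an implementation of the spec grammar: the function names and the
reading labels are invented for this example, escapes and character class
subtraction are ignored, and pairing "x-y" greedily from the left is itself
an assumption about how to resolve the ambiguity.

```python
def bare_hyphen_positions(body: str) -> list[int]:
    """Indexes of '-' in body that are not the operator of an x-y range.

    Assumption: ranges are paired greedily from the left, and neither
    endpoint of a range may itself be '-'.
    """
    positions = []
    i, n = 0, len(body)
    while i < n:
        is_range = (
            i + 2 < n
            and body[i + 1] == "-"
            and body[i] != "-"
            and body[i + 2] != "-"
        )
        if is_range:
            i += 3  # skip the whole x-y range
            continue
        if body[i] == "-":
            positions.append(i)
        i += 1
    return positions


def is_legal(body: str, reading: str) -> bool:
    """Apply one of three hypothetical readings to a group body.

    body is the text between '[' and ']', with any leading '^' removed.
    """
    bare = bare_hyphen_positions(body)
    if reading == "grammar":   # production [17]: '-' is an XmlCharIncDash
        return True
    if reading == "bullet1":   # '-' is never a valid character range
        return not bare
    if reading == "bullet3":   # '-' only at the beginning or end
        return all(p in (0, len(body) - 1) for p in bare)
    raise ValueError(f"unknown reading: {reading}")


if __name__ == "__main__":
    for reading in ("grammar", "bullet1", "bullet3"):
        print(reading, is_legal("a-d-b-c", reading))
```

Under these assumptions the body of [^a-d-b-c] has one stray "-" in the
middle, so only the grammar reading accepts it; a processor following either
bullet would reject it.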

Michael Kay

Received on Thursday, 16 November 2006 10:30:38 UTC