[Bug 1850] [F&O] how do ranges work in case-insensitive mode?

From: <bugzilla@wiggum.w3.org>
Date: Tue, 27 Sep 2005 15:48:54 +0000
To: public-qt-comments@w3.org
Cc:
Message-Id: <E1EKHhO-00080i-2Z@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850


ashok.malhotra@oracle.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From ashok.malhotra@oracle.com  2005-09-27 15:48 -------
On 9/27 the WGs decided to accept Michael Kay's proposal in comment #16, reproduced below.

The detailed rules for the effect of the "i" flag are as follows. In these
rules, one character C2 is considered to be a *case-variant* of another
character C1 if the following XPath expression returns true, when the two
characters are considered as strings of length one, and the Unicode codepoint
collation is used:

fn:lower-case(C1) eq fn:lower-case(C2) 
  or 
fn:upper-case(C1) eq fn:upper-case(C2)

Note that the case-variants of a character under this definition are always
single characters.
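
As a rough illustration of this definition, here is a minimal Python sketch.
It assumes that Python's str.lower() and str.upper() are acceptable stand-ins
for fn:lower-case and fn:upper-case (both follow Unicode case mappings, though
the Unicode version in use may differ). The function name is_case_variant is
hypothetical and not part of F&O.

```python
def is_case_variant(c1: str, c2: str) -> bool:
    """Return True if c2 is a case-variant of c1 under the definition above.

    Assumption: str.lower()/str.upper() approximate fn:lower-case and
    fn:upper-case. Python compares strings code point by code point,
    which plays the role of the Unicode codepoint collation here.
    """
    if len(c1) != 1 or len(c2) != 1:
        raise ValueError("both arguments must be strings of length one")
    return c1.lower() == c2.lower() or c1.upper() == c2.upper()


print(is_case_variant("z", "Z"))        # letters differing only in case
print(is_case_variant("k", "\u212A"))   # KELVIN SIGN lower-cases to "k"
print(is_case_variant("a", "b"))        # unrelated letters
```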

1. When a normal character (Char) is used as an atom, it represents the set
containing that character and all its case-variants. For example, the regular
expression "z" will match both "z" and "Z".

2. A character range (charRange) represents the set containing all the
characters that it would match in the absence of the "i" flag, together with
their case-variants. For example, the regular expression "[A-Z]" will match all
the letters A-Z and all the letters a-z. It will also match certain other
characters such as x212A (KELVIN SIGN), since fn:lower-case("&#x212A;") is "k".

This rule applies also to a character range used in a character class
subtraction (charClassSub): thus [A-Z-[IO]] will match characters such as "A",
"B", "a", and "b", but will not match "I", "O", "i", or "o". 

The rule also applies to a character range used as part of a negative character
group: thus [^Q] will match every character except "Q" and "q" (these being the
only case-variants of "Q" in Unicode).

3. A back-reference is compared using case-blind comparison: that is, each
character must either be the same as the corresponding character of the
previously matched string, or must be a case-variant of that character. For
example, the strings "Mum", "mom", "Dad", and "DUD" all match the regular
expression "([md])[aeiou]\1" when the "i" flag is used.

4. All other constructs are unaffected by the "i" flag. For example, "\p{Lu}"
continues to match upper-case letters only.
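
The four rules above can be sketched in Python as follows. This is an
illustration under stated assumptions, not an implementation of the XPath
regex dialect: str.lower()/str.upper() stand in for fn:lower-case and
fn:upper-case, Python's re module with re.IGNORECASE is used only as a loose
analogy for the "i" flag on an ASCII-only example, and the helper names
with_case_variants and caseblind_equal are hypothetical.

```python
import re
import string
import sys
import unicodedata


def with_case_variants(chars):
    """Rules 1 and 2: the given characters plus all their case-variants.

    Scans every code point once, so each call takes a noticeable moment.
    """
    chars = list(chars)
    lowers = {c.lower() for c in chars}
    uppers = {c.upper() for c in chars}
    return {c for c in map(chr, range(sys.maxunicode + 1))
            if c.lower() in lowers or c.upper() in uppers}


def caseblind_equal(previous, candidate):
    """Rule 3: each character is the same as, or a case-variant of, its
    counterpart in the previously matched string."""
    return len(previous) == len(candidate) and all(
        a == b or a.lower() == b.lower() or a.upper() == b.upper()
        for a, b in zip(previous, candidate))


# Rule 1: the atom "z" stands for "z" and its case-variants.
atom_z = with_case_variants("z")

# Rule 2: [A-Z] with the "i" flag, then the subtraction [A-Z-[IO]].
letters = with_case_variants(string.ascii_uppercase)
subtracted = letters - with_case_variants("IO")

# Negative group [^Q]: everything outside the expanded set for "Q".
q_set = with_case_variants("Q")
matches_not_q = lambda c: c not in q_set

# Rule 3: back-reference example, using Python's re as an analogy only.
pattern = re.compile(r"([md])[aeiou]\1", re.IGNORECASE)
matches = [s for s in ("Mum", "mom", "Dad", "DUD") if pattern.fullmatch(s)]

# Rule 4: \p{Lu} is a property test that does not depend on the flag.
is_Lu = lambda c: unicodedata.category(c) == "Lu"

print("z" in atom_z, "Z" in atom_z)
print("a" in letters, "\u212A" in letters)
print("b" in subtracted, "i" in subtracted)
print(matches_not_q("q"), matches_not_q("r"))
print(matches)
print(is_Lu("A"), is_Lu("a"))
```

Expanding a set by scanning all code points is slow but keeps the sketch close
to the wording of the rules; a real regex engine would more likely precompute
case-variant tables.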

Received on Tuesday, 27 September 2005 15:52:27 UTC