- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 14 Sep 2005 12:56:58 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From mike@saxonica.com 2005-09-14 12:56 -------

(This is a short proposal, but it's the result of a lot of work - the waste bin is full of my failed attempts. It's packed with meaning and needs to be read very carefully, with a close eye on the syntax in Schema Part 2.)

PROPOSAL

The detailed rules for the effect of the "i" flag are as follows. In these rules, one character is considered to be a *case-variant* of another character if there is a default case mapping between the two characters as defined in section 3.13 of [The Unicode Standard]. Note that the case-variants of a character under this definition are always single characters.

1. When a normal character (Char) is used as an atom, it represents the set containing that character and all its case-variants. For example, the regular expression "z" expands to "[zZ]".

2. A character range (charRange) represents the set containing all the characters that it would match in the absence of the "i" flag, together with their case-variants. For example, "[A-Z]" expands to "[A-Za-z]". This rule applies also to a character range used in a character class subtraction (charClassSub): thus [A-Z-[IO]] expands to [A-Za-z-[IOio]]. It also applies to a character range used as part of a negative character group: thus [^Q] expands to [^Qq].

3. A back-reference is compared using case-blind comparison: that is, each character must either be the same as the corresponding character of the previously matched string, or must be a case-variant of that character. For example, the strings "Mum", "mom", "Dad", and "DUD" all match the regular expression "([md])[aeiou]\1" when the "i" flag is used.

4. All other constructs are unaffected by the "i" flag. For example, "\p{Lu}" continues to match upper-case letters only.

Michael Kay
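
A rough illustration of rules 1 and 3, written in Python purely as a sketch. The function names case_variants, expand_atom and case_blind_equal are hypothetical and not part of the proposal. The case-variant test here is only an approximation: it uses Python's str.lower() and str.upper() and discards any result longer than one character, which is an assumption standing in for the default case mappings of section 3.13 rather than an exact implementation of them.

```python
import re


def case_variants(ch):
    """Approximate the single-character case-variants of ch.

    Assumption: str.lower()/str.upper() stand in for the Unicode default
    case mappings. Multi-character results (e.g. 'ß'.upper() == 'SS') are
    dropped, since case-variants are always single characters.
    """
    variants = set()
    for mapped in (ch.lower(), ch.upper()):
        if len(mapped) == 1 and mapped != ch:
            variants.add(mapped)
    return variants


def expand_atom(ch):
    """Rule 1: a normal character stands for itself plus its case-variants."""
    members = [ch] + sorted(case_variants(ch))
    if len(members) == 1:
        # No case-variants: the atom is left as it is.
        return re.escape(ch)
    return "[" + "".join(re.escape(m) for m in members) + "]"


def case_blind_equal(a, b):
    """Rule 3: compare two strings character by character, case-blind."""
    if len(a) != len(b):
        return False
    return all(
        x == y or y in case_variants(x) or x in case_variants(y)
        for x, y in zip(a, b)
    )
```

With these definitions, expand_atom("z") gives "[zZ]" as in rule 1, and case_blind_equal("M", "m") holds, which is the comparison that lets "Mum" match "([md])[aeiou]\1" under rule 3.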
Received on Wednesday, 14 September 2005 12:58:16 UTC