- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 14 Sep 2005 19:12:06 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From mike@saxonica.com 2005-09-14 19:12 -------

Use of the word "expand" was perhaps a bit careless. I only used it in examples, and by saying "A expands to B" I was merely trying to find a shorter way of saying "A with the i flag set matches the same set of strings as B without the i flag set". It wasn't intended to describe an algorithm, let alone an implementation (though I probably had one at the back of my mind).

I appreciate what you're trying to achieve, which I think I can paraphrase as "if matches(S, P, "") is true, then matches(V(S), P, "i") is true if and only if V(S) is a case-variant of S." However, I don't think your proposal achieves this, and in fact I don't think it's a good idea anyway.

I think there are some problems with your proposal. It's not true that a character range (charRange) is a character class (charClass), and it's not true that a negative character group is a character class. It is true that "[^Q]" is a charClass, but if we accept your rule 2, then I think the consequence is that [^Q] matches every character: in the absence of the "i" flag it matches "q", therefore in the presence of the "i" flag it also matches "Q". I think the meaning [^qQ] is more intuitive, and that's why I decided to move the rule down to the level of a charRange.

It would be possible to define that a charClassEsc (such as \p{Lu}) matches case-variants of its "normal" set of strings. The reason I didn't do this was again to do with complements and subtraction. If you widen \p{Lu} to include case-variants of its usual characters, do you retain the meaning that \P{Lu} is the complement of \p{Lu} (in which case it matches a smaller set of characters than it did before), or do you retain the meaning that it matches all the characters it would normally match plus their case-variants (a larger set than before)? I felt it was best to cop out here and say its meaning is unchanged.
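
To make the two arguments above concrete, here is a rough sketch in Python. It is only a toy model, not a description of any implementation: each construct is represented by the set of single characters it matches, over a made-up five-character alphabet. The names UNIVERSE, case_variants and negate are invented for this illustration, and Python's own upper()/lower() stand in for whatever definition of "case-variant" the spec ends up using.

```python
# Toy model: a regex construct is the set of single characters it matches.
# UNIVERSE is a deliberately tiny, hypothetical alphabet.
UNIVERSE = set("aAqQ1")

def case_variants(chars):
    """Add the upper- and lower-case variants of every character in chars."""
    return {v for c in chars for v in (c, c.lower(), c.upper())} & UNIVERSE

def negate(chars):
    """Complement of a character set within the toy alphabet."""
    return UNIVERSE - chars

# [^Q] with the "i" flag, applying the case rule to the whole charClass:
# "q" is matched without the flag, so "Q" gets added back in.
class_level = case_variants(negate({"Q"}))

# Applying the rule to the charRange Q first and negating afterwards,
# which gives the more intuitive [^qQ].
range_level = negate(case_variants({"Q"}))

# The \p{Lu} dilemma: two candidate meanings for \P{Lu} under the "i" flag.
lu = {c for c in UNIVERSE if c.isupper()}
not_lu_normal = negate(lu)                         # meaning without the flag
not_lu_as_complement = negate(case_variants(lu))   # complement of widened \p{Lu}
not_lu_widened = case_variants(not_lu_normal)      # normal set plus case-variants

print(sorted(class_level), sorted(range_level))
print(sorted(not_lu_normal), sorted(not_lu_as_complement), sorted(not_lu_widened))
```

In this model the charClass-level rule makes [^Q] match the whole alphabet, while the charRange-level rule excludes both "q" and "Q". For \P{Lu}, the first reading shrinks the set and the second enlarges it, which is the ambiguity that leaving charClassEsc unchanged avoids.
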
In practice, I don't think this is a big problem, because most of the character blocks already include case-variants of characters, and those that don't, like Lu and Ll, exclude them very deliberately.

Michael Kay
Received on Wednesday, 14 September 2005 19:12:15 UTC