[Bug 1850] [F&O] how do ranges work in case-insensitive mode?

From: <bugzilla@wiggum.w3.org>
Date: Wed, 14 Sep 2005 19:12:06 +0000
To: public-qt-comments@w3.org
Message-Id: <E1EFcfu-0000sN-V6@wiggum.w3.org>


------- Additional Comments From mike@saxonica.com  2005-09-14 19:12 -------
Use of the word "expand" was perhaps a bit careless. I only used it in examples,
and by saying "A expands to B" I was merely trying to find a shorter way of
saying "A with the i flag set matches the same set of strings as B without the i
flag set". It wasn't intended to describe an algorithm, let alone an
implementation (though I probably had one at the back of my mind).
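
To illustrate that reading of "expands", here is a small sketch. It uses Python's re module as a stand-in, which is an assumption: it is not the F&O regex dialect, and the pattern [a-c] is a made-up example. It checks that a pattern with the i flag and its "expanded" form without the flag accept the same single ASCII characters.

```python
import re

# Toy universe: single ASCII characters only (an assumption for this sketch).
ASCII = [chr(cp) for cp in range(128)]

def accepted(pattern, flags=0):
    """Return the set of single ASCII characters the pattern matches in full."""
    return {ch for ch in ASCII if re.fullmatch(pattern, ch, flags)}

# "A with the i flag set" versus "B without the i flag set".
with_flag = accepted("[a-c]", re.IGNORECASE)
expanded = accepted("[a-cA-C]")

# True when both forms match the same set of strings in this toy universe.
same_set = with_flag == expanded
```

This only compares the two sets of matched strings; it says nothing about how an implementation should arrive at them.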

I appreciate what you're trying to achieve, which I think I can paraphrase as
"if matches(S, P, "") is true, then matches(V(S), P, "i") is true if and only if
V(S) is a case-variant of S." However, I don't think your proposal achieves
this, and in fact I don't think it's a good idea anyway.

There are some problems with your proposal. A character range (charRange) is
not a character class (charClass), and neither is a negative character group.
It is true that "[^Q]" is a charClass, but if we accept your rule 2, then I
think the consequence is that [^Q] matches every character: in the absence of
the "i" flag it matches "q" (and every other character except "Q"), therefore
in the presence of the "i" flag it also matches "Q", the case-variant of "q".
I think the meaning [^qQ] is more intuitive, and that's why I decided to move
the rule down to the level of a charRange.
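
The difference between the two readings of [^Q] can be sketched with plain sets. This is a toy model, not a regex engine: the universe is restricted to ASCII letters, and the helper names (widen, negate) are hypothetical ones chosen for this illustration.

```python
import string

# Toy universe of characters (an assumption; real regexes range over Unicode).
ALPHABET = set(string.ascii_letters)

def widen(chars):
    """Add the upper- and lower-case variants of every character in the set."""
    return {v for ch in chars for v in (ch, ch.lower(), ch.upper())}

def negate(chars):
    """Complement of a set of characters within the toy universe."""
    return ALPHABET - chars

# Rule applied at the charClass level: evaluate [^Q] first, then add variants.
# "q" is in [^Q], so its variant "Q" comes back in.
class_level = widen(negate({"Q"}))

# Rule applied at the charRange level: widen Q to {q, Q} first, then negate.
range_level = negate(widen({"Q"}))
```

In this model class_level is the whole alphabet, while range_level is the alphabet minus "q" and "Q", which corresponds to the [^qQ] reading.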

It would be possible to define that a charClassEsc (such as \p{Lu}) matches
case-variants of its "normal" set of strings. The reason I didn't do this was
again to do with complements and subtraction. If you widen \p{Lu} to include
case-variants of its usual characters, do you retain the meaning that \P{Lu} is
the complement of \p{Lu} (in which case it matches a smaller set of characters
than it did before), or do you retain the meaning that it matches all the
characters it would normally match plus their case-variants (a larger set than
before)? I felt it was best to cop out here and say its meaning is unchanged. In
practice, I don't think this is a big problem, because most of the character
categories and blocks already include case-variants of their characters, and
those that don't, like Lu and Ll, exclude them very deliberately.
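
The \p{Lu} / \P{Lu} dilemma can be modelled the same way. The sketch below assumes a toy universe of ASCII letters and digits, and uses Python's unicodedata.category to decide membership of Lu. The variable names are hypothetical; the two "widened" results are the alternatives discussed above, not anything the spec defines.

```python
import string
import unicodedata

# Toy universe (an assumption): ASCII letters plus digits.
UNIVERSE = set(string.ascii_letters + string.digits)

def widen(chars):
    """Add the upper- and lower-case variants of every character in the set."""
    return {v for ch in chars for v in (ch, ch.lower(), ch.upper())}

# \p{Lu} and \P{Lu} with their normal meaning (no i flag).
upper = {ch for ch in UNIVERSE if unicodedata.category(ch) == "Lu"}
not_upper = UNIVERSE - upper

# Suppose the i flag widened \p{Lu} to include case-variants.
widened_upper = widen(upper)

# Option 1: \P{Lu} stays the complement of the widened \p{Lu}.
complement_of_widened = UNIVERSE - widened_upper

# Option 2: \P{Lu} is its normal set plus case-variants.
widened_complement = widen(not_upper)
```

Here complement_of_widened shrinks to the digits alone, while widened_complement grows to the whole universe; the unchanged not_upper (lower-case letters plus digits) sits strictly between the two.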

Michael Kay
Received on Wednesday, 14 September 2005 19:12:15 UTC
