- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 14 Sep 2005 19:41:05 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From holstege@mathling.com  2005-09-14 19:41 -------
If we rephrase "expands", I'm happier with your proposal, even if we touch nothing else. I'd still prefer to state some general rule rather than take it case by case, but I could live without doing so.

> I think there are some problems with your proposal. It's not true that a
> character range (charRange) is a character class (charClass), and it's not true
> that a negative character group is a character class.

Uh, yes it is. It does say so in XML Schema Part 2:

  [11] charClass     ::= charClassEsc | charClassExpr | WildcardEsc
  [12] charClassExpr ::= '[' charGroup ']'
  [13] charGroup     ::= posCharGroup | negCharGroup | charClassSub
  [23] charClassEsc  ::= ( SingleCharEsc | MultiCharEsc | catEsc | complEsc )

I can fill in posCharGroup, negCharGroup, and so on, but I think you get the idea. Everything is a charClass.

I see your point with \p{Lu} and \P{Lu}; let's think about that a bit out loud to see where we get.

Let's just say, for brevity's sake, that normally \p{Lu} denotes the set {"A","B"}. \P{Lu} = [^\p{Lu}], so sayeth Datatypes, so it denotes a set of lots and lots of single-character strings, including "a" and "b". If, instead of using the handy abbreviation \p{Lu}, we had spelled it out as [AB], denoting the set {"A","B"}, the complement would be [^AB], denoting a set containing lots and lots of single-character strings, including "a" and "b". So this is all consistent.

Under the rules of the "i" flag, if \p{Lu} is treated the same way as other character classes, it denotes the set {"A","B","a","b"}. Following the equation from Datatypes, \P{Lu} denotes a set with lots and lots of characters but not "a" or "b". If we had written out \p{Lu} as [AB], that would also have denoted the set {"A","B","a","b"}, and the complement [^AB] would also have denoted the set with lots and lots of characters but not "a" or "b".
So again, this is entirely consistent.

Suppose, however, that under the rules of the "i" flag we leave \p{Lu} and \P{Lu} alone. Then \p{Lu} denotes the set {"A","B"}, and \P{Lu} denotes the set with lots and lots of single-character strings, including "a" and "b". If, not knowing this handy abbreviation, I had written out \p{Lu} as [AB], I would denote a different set under the "i" flag: {"A","B","a","b"}. Likewise, [^AB] would denote a set that does not include "a" and "b". I find this inconsistency pretty baffling to explain, and having to special-case it here makes implementation harder. So I think we should apply the rule consistently across all character classes.
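To make the comparison concrete, here is a small Python sketch that models each character class as the set of single-character strings it denotes, the same way the argument does. It is an illustration under stated assumptions, not a spec or library API: a toy six-character universe stands in for "all characters", \p{Lu} is taken to abbreviate {"A","B"}, case-insensitivity is modeled as closing a set under simple upper/lower-case variants, and [^X] is the complement of whatever [X] denotes under the same flags. The helper names (`case_close`, `positive`, `negative`, `category`, `category_complement`) are hypothetical.

```python
# Toy model of the two readings of the "i" flag discussed above.
# Assumptions (hypothetical, not from any spec): a six-character universe
# stands in for "all characters", and \p{Lu} abbreviates {"A", "B"}.

UNIVERSE = frozenset("ABab12")
P_LU = frozenset("AB")  # what \p{Lu} "normally" denotes in this toy model


def case_close(chars):
    """Add the upper- and lower-case variant of every member of the set."""
    closed = set()
    for c in chars:
        closed.update({c, c.lower(), c.upper()})
    return frozenset(closed)


def positive(chars, i_flag):
    """Set denoted by a positive class such as [AB]."""
    return case_close(chars) if i_flag else frozenset(chars)


def negative(chars, i_flag):
    """Set denoted by [^X]: the complement of what [X] denotes."""
    return UNIVERSE - positive(chars, i_flag)


def category(chars, i_flag, special_case):
    """\\p{..}: follows the general rule, or ignores "i" if special-cased."""
    return positive(chars, i_flag and not special_case)


def category_complement(chars, i_flag, special_case):
    """\\P{..} = [^\\p{..}], as Datatypes defines it."""
    return UNIVERSE - category(chars, i_flag, special_case)


if __name__ == "__main__":
    for special in (False, True):
        same_pos = category(P_LU, True, special) == positive(P_LU, True)
        same_neg = category_complement(P_LU, True, special) == negative(P_LU, True)
        rule = "special-cased" if special else "consistent"
        print(f"{rule}: \\p{{Lu}} == [AB]: {same_pos}, \\P{{Lu}} == [^AB]: {same_neg}")
```

Running it prints `True` for both comparisons under the consistent rule and `False` for both when \p{Lu} ignores the "i" flag: the spelled-out classes [AB] and [^AB] then no longer agree with the abbreviations \p{Lu} and \P{Lu}.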
Received on Wednesday, 14 September 2005 19:41:29 UTC