- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 14 Sep 2005 22:07:26 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From mike@saxonica.com 2005-09-14 22:07 -------

Response to Mary:

I said:

* it's not true that a negative character group is a character class.

You said:

Uh, yes it is. It do say in XML Schema part 2:

[11] charClass ::= charClassEsc | charClassExpr | WildcardEsc
[12] charClassExpr ::= '[' charGroup ']'
[13] charGroup ::= posCharGroup | negCharGroup | charClassSub
[23] charClassEsc ::= ( SingleCharEsc | MultiCharEsc | catEsc | complEsc )

I can fill in the posCharGroup and negCharGroup and so on, but I think you get the idea. Everything is a charClass.

I say: oh no it isn't! A negative character group is a charGroup, and a charGroup *enclosed in square brackets* is a charClass. But a negative character group on its own, without the square brackets, is not a charClass.

As regards \P{Lu}, you can maintain either one of two invariants

(a) \P{Lu} == [^\p{Lu}]

(b) if matches("X", P, "") then matches("x", P, "i") for any regex P

but you can't maintain both.

I think your logic is flawed here:

"If we had written out \p{Lu} as [AB] that would also have denoted the set {"A","B","a","b"} and the complement [^AB] would have also denoted the set with lots and lots of characters but not "a" or "b". So again, this is entirely consistent."

You're relying here on [^AB] meaning [^ABab]. But under your proposal that's not what it means. Under your proposal [^AB] matches every character. [^AB] is a charClass, therefore rule 2 applies, which says

A character class C denotes a set of strings that contains one single-character string "x" for each character x that is either in the class or is a case-variant of some character in the class.

If I'm reading that correctly (perhaps I'm not?) you're saying "a" is in the class [^AB], therefore "A" is also in the class [^AB].
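The two readings of [^AB] under the "i" flag can be made concrete with a minimal sketch in Python over a toy five-character alphabet. This is an illustration only: the alphabet, the function names, and the use of str.upper()/str.lower() as the case-variant relation are assumptions made for this example and do not come from either proposal or from any regex library.

```python
# Toy model: a character class is just a set of characters drawn from a
# tiny alphabet. All names below are hypothetical and for illustration only.
ALPHABET = frozenset("ABab1")
LU = frozenset("AB")  # stands in for \p{Lu}, written out as [AB]


def close_under_case(chars):
    """Add every case-variant of every character in the set."""
    out = set()
    for ch in chars:
        out |= {ch, ch.upper(), ch.lower()}
    return frozenset(out) & ALPHABET


def negate_then_close(chars):
    """Treat [^...] as a charClass and apply rule 2 to the whole class:
    take the complement first, then add case-variants."""
    return close_under_case(ALPHABET - chars)


def close_then_negate(chars):
    """Read [^AB] under "i" as [^ABab]: add case-variants to the listed
    characters first, then take the complement."""
    return ALPHABET - close_under_case(chars)


print(sorted(negate_then_close(LU)))  # every character in the alphabet
print(sorted(close_then_negate(LU)))  # only "1"
```

On this toy alphabet the first reading makes [^AB] match all five characters, while the second leaves only "1", which is a smaller set than the {"a", "b", "1"} matched by [^AB] without the "i" flag.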
In my proposal I'm breaking invariant (b): I'm saying that [^AB] is a *smaller* set of characters under the "i" flag than in the absence of the "i" flag. I think that's the right thing to do. Having already broken that invariant, I'm then retaining invariant (a) with my proposed treatment of charClassEsc.

Michael Kay
Received on Wednesday, 14 September 2005 22:07:38 UTC