- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 31 Aug 2005 15:08:36 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From holstege@mathling.com 2005-08-31 15:08 -------

I suspect empiricism here is telling us less about the language spec and more about how carefully the implementors thought all the weird cases through. It would be interesting to see whether different JVMs are consistent here.

I think we can go back to first principles a bit. We say: "In case-insensitive mode, a character in the input string matches a character specified by the pattern if there is a default case mapping between the two characters as defined in section 3.13 of [The Unicode Standard]."

In the case of a character range, I would take "a character specified by the pattern" to be every character in that character range. So if there is a default case mapping between the input string character and any of them, it's a match. Likewise for negative character ranges and so on. That is, you don't mess with the pattern; you check the input string, with case folding, against the pattern as written.

So I think (* = different from the reported Java results):

  matches("D", "[A-Z]", "i") = true
  matches("d", "[A-Z]", "i") = true
* matches("D", "[A-Z-[D]]", "i") = false
* matches("d", "[A-Z-[D]]", "i") = false
  matches("D", "[^d]", "i") = false
  matches("d", "[^d]", "i") = false
  matches("D", "\p{Lu}", "i") = true
* matches("d", "\p{Lu}", "i") = true
  matches("D", "\P{Lu}", "i") = false
* matches("d", "\P{Lu}", "i") = false
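The "check the input, not the pattern" reading can be modelled in a few lines. The following is a minimal Python sketch, not any implementation's actual code. It makes three assumptions. First, Unicode default case mapping is approximated by Python's str.upper/lower/title, and multi-character results are dropped. Second, a negated class ([^...], \P{...}) is treated as the complement of the inner class's case-insensitive match. Third, subtraction removes whatever the subtracted class matches case-insensitively. Under these assumptions the ten results listed above come out as stated. The names Chars, Negated, Subtracted and matches_ci are hypothetical.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Chars:
    """A positive character class, given as a predicate over one character."""
    test: Callable[[str], bool]


@dataclass(frozen=True)
class Negated:
    """[^...] or \\P{...}: assumed to be the complement of the inner class."""
    inner: "CharClass"


@dataclass(frozen=True)
class Subtracted:
    """[base-[excluded]]: character class subtraction."""
    base: "CharClass"
    excluded: "CharClass"


CharClass = Union[Chars, Negated, Subtracted]


def case_variants(ch: str) -> set:
    # Rough stand-in for Unicode default case mappings: the character itself
    # plus its single-character upper/lower/title forms.
    forms = {ch, ch.upper(), ch.lower(), ch.title()}
    return {f for f in forms if len(f) == 1}


def matches_ci(ch: str, cls: CharClass) -> bool:
    if isinstance(cls, Chars):
        # The pattern is left as written; only the input character is case-mapped.
        return any(cls.test(v) for v in case_variants(ch))
    if isinstance(cls, Negated):
        return not matches_ci(ch, cls.inner)
    if isinstance(cls, Subtracted):
        return matches_ci(ch, cls.base) and not matches_ci(ch, cls.excluded)
    raise TypeError(f"unknown character class: {cls!r}")


A_TO_Z = Chars(lambda c: "A" <= c <= "Z")
UPPER = Chars(lambda c: unicodedata.category(c) == "Lu")  # \p{Lu}

cases = {
    "[A-Z]": A_TO_Z,
    "[A-Z-[D]]": Subtracted(A_TO_Z, Chars(lambda c: c == "D")),
    "[^d]": Negated(Chars(lambda c: c == "d")),
    r"\p{Lu}": UPPER,
    r"\P{Lu}": Negated(UPPER),
}

if __name__ == "__main__":
    for pattern, cls in cases.items():
        for ch in "Dd":
            result = str(matches_ci(ch, cls)).lower()
            print(f'matches("{ch}", "{pattern}", "i") = {result}')
```

Running the script prints one line per case in the same form as the list in the comment. The point the model makes is that "d" matches \p{Lu} (via its mapping to "D") even though the class itself is never case-folded.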
Received on Wednesday, 31 August 2005 15:08:43 UTC