- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 31 Aug 2005 15:08:36 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850
------- Additional Comments From holstege@mathling.com 2005-08-31 15:08 -------
I suspect empiricism here is telling us less about the language spec and more
about how carefully the implementors thought all the weird cases through.
It would be interesting to see if different JVMs are consistent here.
I think we can go back to first principles a bit: we say:
"In case-insensitive mode, a character in the input string matches a character
specified by the pattern if there is a default case mapping between the two
characters as defined in section 3.13 of [The Unicode Standard]."
In the case of a character range, I would take "a character specified by the
pattern" to be every character in that character range, so if there is a default
case mapping between the input string character and any of them, it's a match.
Likewise for negative character ranges and so on.
That is, you don't mess with the pattern, you check the input string with case
folding against the pattern as written. So I think (* marks a result that
differs from what was reported for Java):
matches("D", "[A-Z]", "i") = true
matches("d", "[A-Z]", "i") = true
* matches("D", "[A-Z-[D]]", "i") = false
* matches("d", "[A-Z-[D]]", "i") = false
matches("D", "[^d]", "i") = false
matches("d", "[^d]", "i") = false
matches("D", "\p{Lu}", "i") = true
* matches("d", "\p{Lu}", "i") = true
matches("D", "\P{Lu}", "i") = false
* matches("d", "\P{Lu}", "i") = false
Received on Wednesday, 31 August 2005 15:08:43 UTC