[Bug 1850] [F&O] how do ranges work in case-insensitive mode?

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From holstege@mathling.com  2005-08-31 15:08 -------
I suspect empiricism here is telling us less about the language spec and more
about how carefully the implementors thought all the weird cases through.
It would be interesting to see if different JVMs are consistent here.

I think we can go back to first principles a bit: we say
"In case-insensitive mode, a character in the input string matches a character 
specified by the pattern if there is a default case mapping between the two 
characters as defined in section 3.13 of [The Unicode Standard]."
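Here is a minimal Python sketch of the quoted character-to-character rule. It assumes that Python's `str.lower()`, `str.upper()` and `str.title()` are an acceptable stand-in for the Unicode default case mappings, and it ignores multi-character results such as "ß".upper() == "SS". `case_partners` and `chars_match` are hypothetical helper names, not part of any XPath/XQuery or library API.

```python
from __future__ import annotations


def case_partners(ch: str) -> set[str]:
    """ch plus its default lower/upper/title mappings.

    Assumption: only single-character mappings are kept; full mappings
    that expand to several characters are dropped.
    """
    partners = {ch}
    for mapped in (ch.lower(), ch.upper(), ch.title()):
        if len(mapped) == 1:
            partners.add(mapped)
    return partners


def chars_match(input_ch: str, pattern_ch: str) -> bool:
    """True if the characters are equal or one is a case mapping of the other.

    The mapping is checked in both directions, so it does not matter
    whether the input or the pattern holds the "source" character.
    """
    return (
        input_ch == pattern_ch
        or pattern_ch in case_partners(input_ch)
        or input_ch in case_partners(pattern_ch)
    )
```

Checking both directions matters for characters such as U+212A KELVIN SIGN. Its lowercase mapping is "k", but "k".upper() is plain "K", so a one-way check starting from "k" would never reach it.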

In the case of a character range, I would take "a character specified by the
pattern" to mean every character in that range, so if there is a default case
mapping between the input string character and any of them, it's a match.
Likewise for negative character ranges and so on.

That is, you don't mess with the pattern; you check the input string, with case
folding, against the pattern as written. So I think (* = differs from the
results reported for Java):
   matches("D", "[A-Z]", "i")  = true
   matches("d", "[A-Z]", "i")  = true
 * matches("D", "[A-Z-[D]]", "i")  = false
 * matches("d", "[A-Z-[D]]", "i")  = false

   matches("D", "[^d]", "i")  = false
   matches("d", "[^d]", "i")  = false

   matches("D", "\p{Lu}", "i") = true
 * matches("d", "\p{Lu}", "i") = true
   matches("D", "\P{Lu}", "i") = false
 * matches("d", "\P{Lu}", "i") = false
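The reading above can be sketched in Python as a check that reproduces each of the ten values listed. This rests on several assumptions. Classes are modelled as hypothetical membership predicates: `char_range`, `subtract`, `category`, `CharClass` and `ci_matches` are illustrative names, not an existing regex API. Case mappings are approximated with Python's `str.lower`/`upper`/`title` and followed only outward from the input character. A negated class is taken to complement the case-insensitive match of its positive form, which is one interpretation consistent with the `[^d]` and `\P{Lu}` results.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass
from typing import Callable

# Membership test for the characters a class specifies "as written".
Members = Callable[[str], bool]


def case_partners(ch: str) -> set[str]:
    """ch plus its single-character lower/upper/title mappings."""
    partners = {ch}
    for mapped in (ch.lower(), ch.upper(), ch.title()):
        if len(mapped) == 1:  # ignore multi-character full mappings
            partners.add(mapped)
    return partners


def char_range(lo: str, hi: str) -> Members:
    # Single characters compare by code point.
    return lambda c: lo <= c <= hi


def subtract(base: Members, removed: Members) -> Members:
    # Subtraction works on the pattern's own character set; no case folding here.
    return lambda c: base(c) and not removed(c)


def category(cat: str) -> Members:
    return lambda c: unicodedata.category(c) == cat


@dataclass
class CharClass:
    members: Members
    negated: bool = False  # models [^...] and \P{...}


def ci_matches(ch: str, cls: CharClass) -> bool:
    # Positive match: the input character, or one of its case mappings,
    # is among the characters the class specifies.
    hit = any(cls.members(p) for p in case_partners(ch))
    # Assumption: a negated class complements the case-insensitive match.
    return not hit if cls.negated else hit


AZ = CharClass(char_range("A", "Z"))                                          # [A-Z]
AZ_MINUS_D = CharClass(subtract(char_range("A", "Z"), char_range("D", "D")))  # [A-Z-[D]]
NOT_D = CharClass(char_range("d", "d"), negated=True)                         # [^d]
UPPER = CharClass(category("Lu"))                                             # \p{Lu}
NOT_UPPER = CharClass(category("Lu"), negated=True)                           # \P{Lu}
```

In this sketch, `ci_matches("d", AZ_MINUS_D)` is False because "D" was removed from the set before the input is folded, and "d" itself lies outside A-Z. `ci_matches("d", NOT_UPPER)` is False because "d" maps to "D", which is in \p{Lu}, and the negation is applied afterwards. One limitation is that following mappings only outward from the input can miss a pattern character whose lowercase is the input, such as U+212A KELVIN SIGN for "k".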

Received on Wednesday, 31 August 2005 15:08:43 UTC