[Bug 1850] [F&O] how do ranges work in case-insensitive mode?

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850





------- Additional Comments From mike@saxonica.com  2005-08-17 13:11 -------
Excellent point.

I think the rule that works best is to expand the range, e.g. [a-h] becomes
[abcdefgh], and then match this with the "i" flag, applying the existing rule in
the spec "a character in the input string matches a character specified by the
pattern if there is a default case mapping between the two characters as defined
in section 3.13 of [The Unicode Standard]." (Is this the same as your first
suggestion?)
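
As a minimal sketch of that rule (not Saxon's or Java's actual implementation), the range can be expanded to its members and each member compared with the input character under a case mapping. The class and method names below are hypothetical, and java.lang.Character's simple one-to-one case conversions are used here only as an approximation of the default case mappings in section 3.13.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class RangeExpansion {
    // Expand an inclusive range such as a-h into the explicit set of its members.
    public static Set<Integer> expand(int from, int to) {
        Set<Integer> members = new LinkedHashSet<>();
        for (int cp = from; cp <= to; cp++) {
            members.add(cp);
        }
        return members;
    }

    // Assumption: "a default case mapping between the two characters" is
    // approximated by the simple upper/lower conversions in java.lang.Character.
    public static boolean caseEquivalent(int a, int b) {
        return a == b
            || Character.toLowerCase(a) == b
            || Character.toUpperCase(a) == b
            || a == Character.toLowerCase(b)
            || a == Character.toUpperCase(b);
    }

    // True if the input character matches any member of the expanded range
    // when compared case-insensitively.
    public static boolean matchesRange(int input, int from, int to) {
        for (int member : expand(from, to)) {
            if (caseEquivalent(input, member)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesRange('G', 'a', 'h')); // G maps to g, which is in a-h
        System.out.println(matchesRange('J', 'a', 'h')); // j is outside a-h
    }
}
```

Under this reading, [a-h] and [A-H] accept the same characters when the "i" flag is set, because every member of one range has a case mapping to a member of the other.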

As far as I can tell by experiment, this seems to be the way it works in Java
(which is modelled on Perl). 

I'm having a bit more trouble divining the semantics for subtractions and
negative groups: at present in Saxon

  matches('G','[A-Z-[f-h]]','i')
and
  matches('G','[A-Z-[F-H]]','i')

both return true, which is a little surprising, while

  matches('G','[A-Z-[F-Hf-h]]','i')

returns false. And

  matches('G','[^G]','i')  = false
while
  matches('G','[^F-H]','i') = true

I need to do a bit more investigation to see whether it's Java that's behaving
this way, or whether it's a consequence of the way I translate XPath regex to
Java regex syntax (I use James Clark's code for this, modified to handle the
XPath extensions to Schema regex syntax).
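
One way to separate the two possibilities is to run the patterns directly against java.util.regex, bypassing any translation layer. The sketch below is only a probe, with a hypothetical class name. It assumes that an XPath subtraction such as [A-Z-[f-h]] corresponds to the Java intersection-with-negation form [A-Z&&[^f-h]], which may not be what the translator actually emits. It prints results rather than asserting them, since the case-insensitive outcomes are exactly what is in question.

```java
import java.util.regex.Pattern;

public class ClassProbe {
    // Returns true if the whole input matches the Java regex under the given flags.
    public static boolean probe(String input, String javaRegex, int flags) {
        return Pattern.compile(javaRegex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Assumed Java spellings of the XPath patterns discussed above:
        // subtraction is written as intersection with a negated class.
        String[] patterns = {
            "[A-Z&&[^f-h]]",
            "[A-Z&&[^F-H]]",
            "[A-Z&&[^F-Hf-h]]",
            "[^G]",
            "[^F-H]"
        };
        for (String p : patterns) {
            System.out.println(p
                + "  case-sensitive=" + probe("G", p, 0)
                + "  case-insensitive=" + probe("G", p, Pattern.CASE_INSENSITIVE));
        }
    }
}
```

If the case-insensitive column reproduces the results seen in Saxon, the behaviour comes from Java itself; if not, the translation step is the more likely cause.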

If anyone can do some experiments with Perl, that would be useful...

Michael Kay

Received on Wednesday, 17 August 2005 13:11:40 UTC