
[Bug 1850] [F&O] how do ranges work in case-insensitive mode?

From: <bugzilla@wiggum.w3.org>
Date: Wed, 17 Aug 2005 13:11:33 +0000
To: public-qt-comments@w3.org
Message-Id: <E1E5Nhd-00050N-0j@wiggum.w3.org>


------- Additional Comments From mike@saxonica.com  2005-08-17 13:11 -------
Excellent point.

I think the rule that works best is to expand the range, e.g. [a-h] becomes
[abcdefgh], and then match this with the "i" flag, applying the existing rule in
the spec "a character in the input string matches a character specified by the
pattern if there is a default case mapping between the two characters as defined
in section 3.13 of [The Unicode Standard]." (Is this the same as your first

As far as I can tell by experiment, this seems to be the way it works in Java
(which is modelled on Perl). 
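The proposed rule is easy to try directly against java.util.regex. A minimal sketch (the `expandRange` helper is mine, not Saxon's): expand the range into an explicit character list, then compile with CASE_INSENSITIVE and let the engine's case mapping do the rest.

```java
import java.util.regex.Pattern;

public class RangeExpansion {
    // Hypothetical helper: expand a simple range like a-h into an
    // explicit character class, [abcdefgh].
    static String expandRange(char lo, char hi) {
        StringBuilder sb = new StringBuilder("[");
        for (char c = lo; c <= hi; c++) {
            sb.append(c);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String cls = expandRange('a', 'h');           // "[abcdefgh]"
        Pattern p = Pattern.compile(cls, Pattern.CASE_INSENSITIVE);
        System.out.println(p.matcher("B").matches()); // true: B case-maps to b
        System.out.println(p.matcher("j").matches()); // false: outside the range
    }
}
```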

I'm having a bit more trouble divining the semantics for subtractions and
negative groups: at present in Saxon


both return true, which is a little surprising, while


returns false. And

  matches('G','[^G]','i')   = false
  matches('G','[^F-H]','i') = true

I need to do a bit more investigation to see whether it's Java that's behaving
this way, or whether it's a consequence of the way I translate XPath regex to
Java regex syntax (I use James Clark's code for this, modified to handle the
XPath extensions to Schema regex syntax).
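One way to separate the two suspects is to feed the same patterns straight into java.util.regex, bypassing the translation layer entirely. A sketch (the `ciMatches` helper is mine): if plain Java disagrees with what Saxon reports, the difference would point at the translation step.

```java
import java.util.regex.Pattern;

public class NegatedClassCheck {
    // Hypothetical helper: full-string match with the engine's
    // case-insensitive flag.
    static boolean ciMatches(String input, String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input).matches();
    }

    public static void main(String[] args) {
        // 'G' is excluded by [^G] with or without case folding.
        System.out.println(ciMatches("G", "[^G]"));   // false
        // 'G' sits literally inside F-H, so the complement can
        // never match it, regardless of the "i" flag.
        System.out.println(ciMatches("G", "[^F-H]")); // false
        // The interesting probes are the lower-case ones:
        System.out.println(ciMatches("g", "[^G]"));
        System.out.println(ciMatches("g", "[^F-H]"));
    }
}
```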

If anyone can do some experiments with Perl, that would be useful...

Michael Kay
Received on Wednesday, 17 August 2005 13:11:40 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:57:07 UTC