- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 31 Aug 2005 20:58:21 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850

------- Additional Comments From liam@w3.org 2005-08-31 20:58 -------

The examples from Mike Kay's comment, matches('G','[A-Z-[f-h]]','i') and matches('G','[A-Z-[F-H]]','i'), are not well-formed in Perl: each operand of "-" must be a character, not a range. Perl does not support range subtraction directly (see below). So [A-Z-[f-h]] ends up matching the literal [f-h] and nothing else, as far as I can tell.

The example matches('G','[A-Z-[F-Hf-h]]','i') is the same, matching the literal string [F-Hf-h]. (I don't think it's specified that it works this way, so I think it's a bug that Perl doesn't trap this case.)

The example matches('G','[^F-H]','i') does not match in Perl, either with or without the /i.

Note that the pattern [A-Z] might or might not match both a and z: a common collation order on Linux, at least for case-insensitive matching, is aAbBcCdD...zZ, so A-Z excludes the "a". This doesn't affect Perl by default, as it uses Unicode codepoints unless you put "use locale;" in your Perl script (see the man pages for perlre and perllocale, or run "perldoc perlre" to see them).

"G" does not match /[^G]/i in Perl.

Perl's nearest equivalent for range subtraction is the zero-width negative lookahead assertion, (?!e), which matches only if it is not immediately followed by something that matches the contained expression e. Hence /(?![f-h])[A-Z]/i matches b and w but not g or G.

I think the real question here is whether a range can introduce or exclude unexpected characters when matching is case-insensitive. I experimented, but the version of Perl I'm using doesn't like ranges in character classes if they are above codepoint 127 decimal for some reason, although it's otherwise 8-bit clean and can match explicit characters in classes.
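
The negative-lookahead substitute for range subtraction mentioned above can be sketched as follows. This is only an illustration in Python, not Perl: Python's re module accepts the same (?!...) syntax, but its behaviour is not evidence of what Perl or the XPath matches() function does. The names A_TO_Z_EXCEPT_F_TO_H and in_subtracted_range are made up for this sketch.

```python
import re

# (?![f-h]) rejects any position where the next character is f, g or h
# (either case, because of IGNORECASE); [A-Z] then consumes one letter.
# Together they approximate the subtraction "A-Z minus f-h".
A_TO_Z_EXCEPT_F_TO_H = re.compile(r"(?![f-h])[A-Z]", re.IGNORECASE)


def in_subtracted_range(ch: str) -> bool:
    """Return True if ch is a single letter A-Z (any case) outside f-h."""
    return A_TO_Z_EXCEPT_F_TO_H.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in "bwgG":
        print(ch, in_subtracted_range(ch))
    # b True
    # w True
    # g False
    # G False
```

The lookahead consumes no input, so the character class after it still decides what is matched; the assertion only vetoes the excluded letters.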
Received on Wednesday, 31 August 2005 20:58:25 UTC