- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 31 Aug 2005 20:58:21 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1850
------- Additional Comments From liam@w3.org 2005-08-31 20:58 -------
The examples from Mike Kay's comment,
matches('G','[A-Z-[f-h]]','i')
and matches('G','[A-Z-[F-H]]','i')
are not well-formed in Perl: each operand of "-" must
be a single character, not a range. Perl does not support
range subtraction directly (see below).
So, [A-Z-[f-h]] ends up matching the literal [f-h]
and nothing else, as far as I can tell.
The example
matches('G','[A-Z-[F-Hf-h]]','i')
is the same, matching the literal string [F-Hf-h].
(I don't think it's specified that it works this
way, so I think it's a bug that Perl doesn't trap
this case.)
The example
matches('G','[^F-H]','i')
does not match in Perl, either with or without the /i flag.
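
The negated-class case can be cross-checked outside Perl. The following is a minimal sketch using Python's re module as a stand-in (an assumption: it is neither Perl nor the XPath matches() function, although for this pattern it agrees with the Perl result reported here).

```python
import re

# Python's re module is a stand-in here, not Perl or XPath matches().
negated_ci = re.compile(r'[^F-H]', re.IGNORECASE)
negated_cs = re.compile(r'[^F-H]')

# With case-insensitive matching, neither case of G escapes the negated class.
upper_g_ci = negated_ci.search('G')   # None
lower_g_ci = negated_ci.search('g')   # None

# Without the flag, 'G' is still excluded because it lies inside F-H.
upper_g_cs = negated_cs.search('G')   # None

# A letter outside F-H matches in both modes.
letter_b_ci = negated_ci.search('b')
letter_b_cs = negated_cs.search('b')

print(upper_g_ci, lower_g_ci, upper_g_cs)
print(letter_b_ci is not None, letter_b_cs is not None)
```

So "G" fails to match with or without the flag, which is the behaviour described for Perl.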
Note that the pattern [A-Z] might or might not match both
a and z: a common collation order, on Linux at least, for
case-insensitive matching is aAbBcCdD...zZ, so A-Z excludes the "a".
This doesn't affect Perl by default, as it uses Unicode codepoints
unless you put
use locale;
in your Perl script (see the man pages for perlre and perllocale,
or run "perldoc perlre" to see them).
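
To make the collation point concrete, here is a small Python sketch that simulates a range test against the aAbB...zZ ordering mentioned above. The COLLATION string and the in_collated_range helper are hypothetical names made up for illustration. Real locale collation is more involved than a simple index lookup.

```python
import string

# Hypothetical collation order from the text: aAbBcC...zZ
COLLATION = ''.join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)

def in_collated_range(ch, start, end, order=COLLATION):
    """Return True if ch sorts between start and end (inclusive) in order."""
    return order.index(start) <= order.index(ch) <= order.index(end)

# Under this ordering 'a' sorts before 'A', so the range A-Z excludes it,
# while 'z' sorts just before 'Z' and is therefore included.
a_in_range = in_collated_range('a', 'A', 'Z')
z_in_range = in_collated_range('z', 'A', 'Z')

# By plain codepoint comparison, lowercase 'a' is outside A-Z as well,
# but so is every other lowercase letter.
a_by_codepoint = ord('A') <= ord('a') <= ord('Z')
z_by_codepoint = ord('A') <= ord('z') <= ord('Z')

print(a_in_range, z_in_range, a_by_codepoint, z_by_codepoint)
# prints: False True False False
```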
"G" does not match /[^G]/i in Perl
Perl's nearest equivalent for range subtraction is the
zero-width negative lookahead assertion, (?!e), which matches
only if it is not immediately followed by something that
matches the contained expression e. Hence,
/(?![f-h])[A-Z]/i
matches b and w but not g or G.
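
The same lookahead idiom can be tried in other engines that share Perl's (?!...) syntax. Below is a minimal sketch in Python's re module, assumed here to be a reasonable stand-in for this particular pattern; it is not the Perl run described above.

```python
import re

# Emulate "A-Z minus f-h" with a negative lookahead, case-insensitively.
# Python's re is a stand-in for Perl here; the (?!...) syntax is shared.
subtract = re.compile(r'(?![f-h])[A-Z]', re.IGNORECASE)

# Check each single-character subject against the whole pattern.
results = {ch: subtract.fullmatch(ch) is not None for ch in 'bwgG'}

print(results)
# prints: {'b': True, 'w': True, 'g': False, 'G': False}
```

The lookahead is evaluated under the same case-insensitive flag as the class, which is why both g and G are rejected.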
I think the real question here is whether a range can introduce
or exclude unexpected characters when matching is case-insensitive. I
experimented, but for some reason the version of Perl I'm using doesn't
like ranges in character classes if they go above codepoint 127 decimal,
although it's otherwise 8-bit clean and can match explicit characters
in classes.
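
As one data point on that question, Python's re module documents that, for Unicode patterns, [A-Z] combined with its IGNORECASE flag also matches a few non-ASCII letters, including U+212A (KELVIN SIGN) and U+017F (LATIN SMALL LETTER LONG S). The sketch below shows this. It describes Python only, and is not a claim about what Perl does or about what matches() should do.

```python
import re

# Python-only illustration: a plain ASCII range can pick up non-ASCII
# characters once case-insensitive matching is switched on.
ascii_range_ci = re.compile(r'[A-Z]', re.IGNORECASE)
ascii_range_cs = re.compile(r'[A-Z]')

kelvin = '\u212a'   # KELVIN SIGN, case-folds to 'k'
long_s = '\u017f'   # LATIN SMALL LETTER LONG S, uppercases to 'S'

kelvin_ci = ascii_range_ci.fullmatch(kelvin) is not None
long_s_ci = ascii_range_ci.fullmatch(long_s) is not None

# Without the flag, the range is limited to the codepoints A through Z.
kelvin_cs = ascii_range_cs.fullmatch(kelvin) is not None
long_s_cs = ascii_range_cs.fullmatch(long_s) is not None

print(kelvin_ci, long_s_ci, kelvin_cs, long_s_cs)
```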
Received on Wednesday, 31 August 2005 20:58:25 UTC