- From: <bugzilla@wiggum.w3.org>
- Date: Tue, 11 Mar 2008 15:44:59 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5348

------- Comment #1 from mike@saxonica.com  2008-03-11 15:44 -------

In action A-358-06 I was asked to review what Perl does about this.

There is of course no formal specification of Perl. The man page http://www.perl.com/doc/manual/html/pod/perlre.html states:

"Within the pattern, \10, \11, etc. refer back to substrings if there have been at least that many left parentheses before the backreference."

This clearly doesn't make sense. If \10 occurs after the 10th left paren, but before the right paren that matches it, then it cannot "refer back" to that substring.

The Java 5 statement is informal but more defensible:

"In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit."

But it still has the same problem as our text: you can be in the middle of subexpression 10 even though there have been 15 completed subexpressions.

In practice, with Java 5:

Pattern.matches("(X)(\\11)", "XX1")
  true: the back-reference is to subexp 1, followed by a literal "1".

Pattern.matches("(X)(2)?(3)?(4)?(5)?(6)?(7)?(8)?(9)?(10)?(Y)(\\11)", "XYY")
  true: the back-reference is to subexp 11.

Pattern.matches("(X)(2)?(3)?(4)?(5)?(6)?(7)?(8)?(9)?(10)?((Y)(\\11))", "XYX1")
  false. Here the back-reference \11 appears within the 11th subexpression. I can't find any string that matches this regex. Java seems to treat \11 as a reference to subexpression 11, which can never be matched, rather than as a reference to subexpression 1.

Pattern.matches("(X)(2)?(3)?(4)?(5)?(6)?(7)?(8)?(9)?(10)?((Y)(\\12))", "XYY")
  true. The back-reference \12 is recognized as referring to subexp 12, although it actually appears within subexp 11.
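For anyone who wants to re-run the four checks, here is a minimal sketch that bundles them into one program. The class name BackrefDemo and the PREFIX constant are illustrative only. The results above were reported for Java 5; the sketch assumes, without having verified it across JDK versions, that later releases parse and match these patterns the same way.

```java
import java.util.regex.Pattern;

// Hypothetical harness re-running the four Java checks from the comment.
// Results were reported for Java 5; later JDKs are assumed to agree.
class BackrefDemo {
    // Groups 1-10: (X) plus nine optional literal groups, so the next group is 11.
    static final String PREFIX = "(X)(2)?(3)?(4)?(5)?(6)?(7)?(8)?(9)?(10)?";

    static boolean[] run() {
        return new boolean[] {
            // Only two groups opened so far: \11 is \1 followed by a literal '1'.
            Pattern.matches("(X)(\\11)", "XX1"),
            // \11 follows the closed group 11, (Y), so it must match "Y" again.
            Pattern.matches(PREFIX + "(Y)(\\11)", "XYY"),
            // \11 sits inside group 11, which has not closed yet, so it fails.
            Pattern.matches(PREFIX + "((Y)(\\11))", "XYX1"),
            // \12 names group 12, (Y), which has closed although group 11 is still open.
            Pattern.matches(PREFIX + "((Y)(\\12))", "XYY"),
        };
    }

    public static void main(String[] args) {
        for (boolean b : run()) {
            System.out.println(b); // prints true, true, false, true
        }
    }
}
```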
So the (provisional) conclusion for Java is that it does what Perl says: it recognizes the back-reference \11 if at least 11 left parens precede it; if subexpression 11 has not yet been closed at that point, the back-reference will never match anything. I don't immediately have the ability to test what Perl does.
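The parse-time half of that conclusion can be modelled in a few lines. The sketch below is a hypothetical model of the rule quoted from the Java 5 documentation, not the JDK's own code: it counts the capturing groups opened before the backslash, always takes the first digit, and then consumes further digits only while the number does not exceed that count. Simplifying assumptions: every "(?" construct is treated as non-capturing (so named groups are not counted) and nested character classes are not supported.

```java
// Hypothetical model of the back-reference parsing rule (not the JDK source).
final class BackrefRule {
    // Capturing groups opened before index end. Simplification: every "(?"
    // is treated as non-capturing and nested character classes are ignored.
    static int openedGroups(String regex, int end) {
        int count = 0;
        boolean inClass = false;
        for (int i = 0; i < end; i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (inClass) {
                if (c == ']') inClass = false;
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(' && !(i + 1 < regex.length() && regex.charAt(i + 1) == '?')) {
                count++; // a left paren opening a capturing group
            }
        }
        return count;
    }

    // Group number referenced by the back-reference whose backslash is at pos;
    // regex.charAt(pos + 1) is assumed to be a digit 1-9.
    static int resolve(String regex, int pos) {
        int opened = openedGroups(regex, pos);
        int i = pos + 1;
        int ref = regex.charAt(i++) - '0'; // the first digit always belongs to the reference
        while (i < regex.length() && regex.charAt(i) >= '0' && regex.charAt(i) <= '9') {
            int next = ref * 10 + (regex.charAt(i) - '0');
            if (next > opened) break; // the extra digit would name a group not yet opened
            ref = next;
            i++;
        }
        return ref;
    }

    public static void main(String[] args) {
        String prefix = "(X)(2)?(3)?(4)?(5)?(6)?(7)?(8)?(9)?(10)?";
        String[] patterns = {
            "(X)(\\11)",
            prefix + "(Y)(\\11)",
            prefix + "((Y)(\\11))",
            prefix + "((Y)(\\12))",
        };
        for (String p : patterns) {
            System.out.println(p + " -> group " + resolve(p, p.indexOf('\\')));
        }
    }
}
```

Applied to the four patterns in the comment, this model resolves the references to groups 1, 11, 11 and 12. Whether the referenced group has actually closed is a separate, match-time question, which is why the third pattern parses as a reference to group 11 yet can never match.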
Received on Tuesday, 11 March 2008 15:45:17 UTC