- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 21 Dec 2006 23:35:46 +0000
- To: public-qt-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4106
Summary: [F+O] regex syntax: position of backreference
Product: XPath / XQuery / XSLT
Version: Proposed Recommendation
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: Functions and Operators
AssignedTo: ashok.malhotra@oracle.com
ReportedBy: mike@saxonica.com
QAContact: public-qt-comments@w3.org
In the syntax of regular expressions, a backreference is currently allowed as a
charClassEsc. As such it can appear either outside square brackets, for example
(abc)\1
or within square brackets
(abc)[\1]
However, it doesn't make sense to allow a backreference within square brackets,
because constructs allowed within square brackets always match a single
character (perhaps one of a set of possible characters, but never a sequence of
more than one character), while a backreference will in general match a
sequence of characters.
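The distinction can be illustrated with a small sketch using java.util.regex
(not the F+O regex dialect itself, so this is only an analogy). The class name
BackrefPositionDemo and the helper matchesWhole are made up for illustration.
It shows a backreference at atom level consuming a three-character sequence,
while a bracket expression in the same position consumes exactly one character.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the bug report.
final class BackrefPositionDemo {

    // Returns true when the regex matches the entire input string.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Backreference as an atom: \1 repeats the captured "abc".
        System.out.println(matchesWhole("(abc)\\1", "abcabc"));   // true
        System.out.println(matchesWhole("(abc)\\1", "abcabd"));   // false

        // Bracket expression: matches one character from the set, never a sequence.
        System.out.println(matchesWhole("(abc)[abc]", "abca"));   // true
        System.out.println(matchesWhole("(abc)[abc]", "abcabc")); // false
    }
}
```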
I think that backreferences should appear in the syntax at the level of an
atom:
[9] atom ::= Char | charClass | ( '(' regExp ')' ) | backReference
I have not been able to find a similar restriction documented for REs in Perl,
Java, or .NET. However, none of these languages attempts to define the syntax of
regular expressions using a BNF grammar, or to give a rigorous exposition of
the semantics. Experiments with Java suggest that "(abc)[\1]" is accepted as a
valid regular expression, but its semantics appear to be undefined: I am unable
to identify any string that it matches.
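An experiment of this kind might be sketched as follows. This is an assumed
reconstruction, not the code actually used for the report: the class name
BracketBackrefProbe, the helper probe, and the candidate strings are all
illustrative. The outcome for "(abc)[\1]" may differ between Java releases
(a given release might accept the expression or reject it at compile time),
so the sketch reports whichever happens rather than assuming a result.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe class; names and candidate inputs are illustrative.
final class BracketBackrefProbe {

    // Compiles the regex and reports whether it was rejected, or which
    // candidate (if any) it matches in full.
    static String probe(String regex, String... candidates) {
        final Pattern pattern;
        try {
            pattern = Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            return "rejected: " + e.getDescription();
        }
        for (String candidate : candidates) {
            if (pattern.matcher(candidate).matches()) {
                return "accepted, matches \"" + candidate + "\"";
            }
        }
        return "accepted, no candidate matched";
    }

    public static void main(String[] args) {
        // Candidate inputs are guesses at what "[\1]" could plausibly mean.
        System.out.println(probe("(abc)[\\1]", "abc", "abcabc", "abca", "abc1", "abc\\"));
    }
}
```

Trying only a handful of candidates cannot prove that no matching string
exists; it only fails to find one.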
Received on Thursday, 21 December 2006 23:36:00 UTC