- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 21 Dec 2006 23:35:46 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4106

           Summary: [F+O] regex syntax: position of backreference
           Product: XPath / XQuery / XSLT
           Version: Proposed Recommendation
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: mike@saxonica.com
         QAContact: public-qt-comments@w3.org

In the syntax of regular expressions, a backreference is currently allowed as a charClassEsc. As such it can appear either outside square brackets, for example

    (abc)\1

or within square brackets:

    (abc)[\1]

However, it doesn't make sense to allow a backreference within square brackets. Constructs allowed within square brackets always match a single character (perhaps one of a set of possible characters, but never a sequence of more than one character), while a backreference will in general match a sequence of characters.

I think that backreferences should appear in the syntax at the level of an atom:

    [9] atom ::= Char | charClass | ( '(' regExp ')' ) | backReference

I have not been able to find a similar restriction documented for regular expressions in Perl, Java, or .NET. However, none of these languages attempts to define the syntax of regular expressions using a BNF grammar, or to give a rigorous exposition of the semantics. Experiments with Java suggest that "(abc)[\1]" is accepted as a valid regular expression, but its semantics appear to be undefined: I am unable to identify any string that it matches.
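The contrast between a one-character class and a multi-character backreference can be illustrated with a small Java sketch, Java being the engine used in the experiments above. It relies only on `java.util.regex.Pattern.matches`. The helper `backrefInsideCharClass` is hypothetical (not part of the JDK or of any W3C specification) and only approximates the proposed restriction: it scans for an escape `\1` to `\9` while inside square brackets, tracking nesting so that XSD-style class subtraction such as `[a-z-[aeiou]]` is handled. The sketch deliberately does not compile `(abc)[\1]` with `Pattern`, because the report found the engine's treatment of that pattern unclear.

```java
import java.util.regex.Pattern;

public class BackrefDemo {

    // Hypothetical helper, not a JDK or W3C API: returns true if the regex
    // contains \1..\9 inside square brackets, i.e. the position that the
    // proposed grammar (backReference as an atom) would make illegal.
    static boolean backrefInsideCharClass(String re) {
        int depth = 0; // nesting depth of [...]; XSD allows [a-z-[aeiou]]
        for (int i = 0; i < re.length(); i++) {
            char c = re.charAt(i);
            if (c == '\\' && i + 1 < re.length()) {
                char next = re.charAt(i + 1);
                if (depth > 0 && next >= '1' && next <= '9') {
                    return true; // backreference escape inside a class
                }
                i++; // skip the escaped character, e.g. \[ or \\
            } else if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A backreference used as an atom matches a sequence ("abc" again).
        System.out.println(Pattern.matches("(abc)\\1", "abcabc"));  // true
        System.out.println(Pattern.matches("(abc)\\1", "abca"));    // false
        // A character class always consumes exactly one character.
        System.out.println(Pattern.matches("(abc)[xyz]", "abcx"));  // true
        System.out.println(Pattern.matches("(abc)[xyz]", "abcxy")); // false

        // The hypothetical check accepts (abc)\1 and flags (abc)[\1].
        System.out.println(backrefInsideCharClass("(abc)\\1"));   // false
        System.out.println(backrefInsideCharClass("(abc)[\\1]")); // true
    }
}
```

The scanner is a heuristic rather than a parser for the F+O regex grammar; a real implementation would enforce the rule directly in the `atom` production.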
Received on Thursday, 21 December 2006 23:36:00 UTC