[Bug 4106] [F+O] regex syntax: position of backreference


        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: mike@saxonica.com
In the syntax of regular expressions, a backreference is currently allowed as a
charClassEsc. As such it can appear either outside square brackets, for example 


or within square brackets


However, it doesn't make sense to allow a backreference within square brackets,
because constructs allowed within square brackets always match a single
character (perhaps one of a set of possible characters, but never a sequence of
more than one character), while a backreference will in general match a
sequence of characters.
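
As a rough illustration of that distinction, here is a minimal sketch using
Python's re module. Python is not one of the engines discussed in this report,
so this only shows how one implementation resolves the ambiguity: its
documentation states that numeric escapes inside square brackets are treated
as characters, so [\1] denotes the single character U+0001 rather than the
text captured by group 1. The helper name matches is purely illustrative.

```python
import re

# Outside brackets, \1 is a backreference: it must repeat whatever
# group 1 captured, which may be a sequence of several characters.
outside = re.compile(r"(abc)\1")

# Inside brackets, Python reads \1 as an octal escape for the single
# character U+0001, not as a reference to group 1.
inside = re.compile(r"(abc)[\1]")


def matches(pattern, text):
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


print(matches(outside, "abcabc"))   # backreference repeats "abc"
print(matches(inside, "abcabc"))    # the class cannot match "abc"
print(matches(inside, "abc\x01"))   # the class matches only U+0001
```

Other engines may reject the bracketed form outright or give it some other
meaning, which is the kind of inconsistency a grammar-level restriction would
avoid.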

I think that backreferences should appear in the syntax at the level of an
atom:

[9] atom ::= Char | charClass | ( '(' regExp ')' ) | backReference

I have not been able to find a similar restriction documented for REs in Perl,
Java, or .NET. However, none of these languages attempt to define the syntax of
regular expressions using a BNF grammar, or to give a rigorous exposition of
the semantics. Experiments with Java suggest that "(abc)[\1]" is accepted as a
valid regular expression, but its semantics appear to be undefined: I am unable
to identify any string that it matches.

Received on Thursday, 21 December 2006 23:36:00 UTC