[Bug 4106] [F+O] regex syntax: position of backreference

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4106

------- Comment #3 from mike@saxonica.com  2007-02-13 16:42 -------
The WG has approved this change in principle, subject to detailed wording.

Here is proposed text for the change to F+O.

Within section 7.6.1, the bullet point starting "Back-References are allowed"
should change to read as follows:

# Back-references are allowed outside a character class expression. A
back-reference is an additional kind of atom. The construct \n where n is a
single digit is always recognized as a back-reference; if this is followed by
further digits, these digits are taken to be part of the back-reference if and
only if the back-reference is preceded by sufficiently many capturing
subexpressions. A back-reference matches the string that was matched by the nth
capturing subexpression within the regular expression, that is, the
parenthesized subexpression whose opening left parenthesis is the nth unescaped
left parenthesis within the regular expression. The closing right parenthesis
of this subexpression must occur before the back-reference. For example, the
regular expression ('|").*\1 matches a sequence of characters delimited either
by an apostrophe at the start and end, or by a quotation mark at the start and
end.

If no string is matched by the nth capturing subexpression, the back-reference
is interpreted as matching a zero-length string.
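
As a rough illustration of the two rules above, the following sketch uses
Python's re module. This is an assumption made for illustration only: Python's
regex dialect is not the F+O syntax. In particular, Python treats a reference
to a group that did not take part in the match as a failure, so the proposed
zero-length behaviour is emulated here with a conditional pattern. The names
QUOTED, is_quoted, python_rule and proposed_rule are hypothetical.

```python
import re

# The example from the text: a quote character, any characters, then the
# same quote character again.
QUOTED = re.compile(r"""('|").*\1""")


def is_quoted(text: str) -> bool:
    """True if the whole string is delimited by a matching pair of quotes."""
    return QUOTED.fullmatch(text) is not None


# Python's own rule: \1 fails when group 1 did not participate in the match.
python_rule = re.compile(r"(a)?b\1")

# Emulation of the proposed rule: apply \1 only if group 1 matched,
# otherwise match the zero-length string.
proposed_rule = re.compile(r"(a)?b(?(1)\1)")
```

With these definitions, is_quoted accepts "'it's'" and '"hello"' but rejects a
string that opens with an apostrophe and closes with a quotation mark. For the
input "b", python_rule does not match, while proposed_rule does.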

Back-references change the following production:

[9] atom ::= Char | charClass | ( '(' regExp ')' )

to

[9] atom ::= Char | charClass | ( '(' regExp ')' ) | backReference

[9a] backReference ::= "\" [1-9][0-9]*
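
The rule for multi-digit back-references can be sketched as a small scanner.
This is a minimal sketch, not the WG's algorithm: it assumes that "sufficiently
many capturing subexpressions" means the number of unescaped left parentheses
before the back-reference, and it ignores parentheses inside character class
expressions. The function names are hypothetical.

```python
def count_open_groups(pattern: str, end: int) -> int:
    """Count unescaped '(' characters in pattern[:end]."""
    count = 0
    i = 0
    while i < end:
        if pattern[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if pattern[i] == "(":
            count += 1
        i += 1
    return count


def read_back_reference(pattern: str, pos: int) -> tuple[int, int]:
    """Read the back-reference whose backslash is at index pos.

    Returns (group number, index just past the back-reference).
    """
    if (
        pattern[pos] != "\\"
        or pos + 1 >= len(pattern)
        or pattern[pos + 1] not in "123456789"
    ):
        raise ValueError("not a back-reference")
    groups = count_open_groups(pattern, pos)
    number = int(pattern[pos + 1])  # the first digit is always consumed
    i = pos + 2
    # Take a further digit only while the resulting number still refers to
    # a group that opens before the back-reference.
    while i < len(pattern) and pattern[i] in "0123456789":
        candidate = number * 10 + int(pattern[i])
        if candidate > groups:
            break
        number = candidate
        i += 1
    return number, i
```

Under these assumptions, in the pattern (a)\10 the scanner reads \1 followed by
the literal character 0, whereas after ten parenthesized groups the same \10
is read as a reference to group 10.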

Note: within a character class expression, "\" followed by a digit is an error.
Some other regular expression languages interpret this as an octal character
reference.
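
Python's re module is one example of such a language, shown here only for
contrast with the proposed F+O rule. Inside a character class it treats \1 as
an octal escape for the character U+0001 rather than as an error. The name
octal_class is hypothetical.

```python
import re

# In Python, [\1] is a character class containing the single character
# U+0001 (octal escape 1), not a back-reference and not an error.
octal_class = re.compile(r"[\1]")

matches_control_char = octal_class.fullmatch("\x01") is not None
matches_digit_one = octal_class.fullmatch("1") is not None
```

Here matches_control_char is True and matches_digit_one is False.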

Received on Tuesday, 13 February 2007 16:43:11 UTC