[Bug 5348] [F&O] Back-references: "sufficiently many"

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5348

           Summary: [F&O] Back-references: "sufficiently many"
           Product: XPath / XQuery / XSLT
           Version: Recommendation
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: mike@saxonica.com
        ReportedBy: mike@saxonica.com
         QAContact: public-qt-comments@w3.org


In the specification for back-references in regular expressions (repeated
unchanged in Erratum E4), we use the phrase:

<quote>
The construct \n where n is a single digit is always recognized as a
back-reference; if this is followed by further digits, these digits are taken
to be part of the back-reference if and only if the back-reference is preceded
by sufficiently many capturing subexpressions.
</quote>

So what happens if the regular expression uses \11, and it is preceded by 12
capturing subexpressions, but there is no subexpression 11 because the closing
paren for group 11 has not yet been encountered? That is:

(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11(12)(13)\11)

Is \11 interpreted as a reference to the non-existent group 11, or as a
reference to group 1 followed by the digit 1?

I think it should be the latter. This involves changing the text to:

...these digits are taken to be part of the back-reference if and only if the
back-reference is preceded by a capturing subexpression with the relevant
number (so \12 is treated as a reference to captured subexpression 12 if the
back-reference is preceded by the closing parenthesis that matches the 12th
opening parenthesis).

The error condition described in erratum E4 as:

<quote>
The regular expression is invalid if this subexpression does not exist or if
its closing right parenthesis occurs after the back-reference.
</quote>

can then occur only for a single-digit back-reference.
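A minimal Python sketch of the proposed rule follows, together with the single-digit error case. It makes simplifying assumptions that are not in the specification: every unescaped "(" opens a capturing group, and character classes are not handled. Where several digits follow the backslash, it picks the longest number that names an already-closed group, which is one reading of the proposed wording. The names closed_groups and read_backref are hypothetical and are not taken from any processor.

```python
DIGITS = "0123456789"


def closed_groups(regex: str, end: int) -> set[int]:
    """Numbers of capturing groups whose ')' appears before index `end`.

    Simplifying assumption: every unescaped '(' opens a capturing group;
    character classes such as [()] are not handled.
    """
    closed: set[int] = set()
    open_stack: list[int] = []  # group numbers still awaiting their ')'
    count = 0                   # groups opened so far, numbered left to right
    i = 0
    while i < end:
        ch = regex[i]
        if ch == "\\":
            i += 2              # skip the escaped character
            continue
        if ch == "(":
            count += 1
            open_stack.append(count)
        elif ch == ")" and open_stack:
            closed.add(open_stack.pop())
        i += 1
    return closed


def read_backref(regex: str, pos: int) -> tuple[int, int]:
    """Parse the back-reference whose backslash is at regex[pos].

    Returns (group number, index just past the back-reference).
    """
    start = pos + 1
    end = start
    while end < len(regex) and regex[end] in DIGITS:
        end += 1
    digits = regex[start:end]
    if not digits:
        raise ValueError("not a back-reference")
    closed = closed_groups(regex, pos)
    # Further digits belong to the reference only if the longer number
    # names a group already closed; prefer the longest such number.
    for length in range(len(digits), 1, -1):
        n = int(digits[:length])
        if n in closed:
            return n, start + length
    # A single digit is always a back-reference, so the error condition
    # can only arise here: that group must already be closed.
    n = int(digits[0])
    if n not in closed:
        raise ValueError(f"group {n} is not closed before its back-reference")
    return n, start + 1


example = r"(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11(12)(13)\11)"
print(read_backref(example, example.index("\\"))[0])  # → 1 (group 1, then digit 1)

closed_11 = r"(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)\11"
print(read_backref(closed_11, closed_11.index("\\"))[0])  # → 11
```

Under this reading, a multi-digit sequence never causes an error by itself: if no longer number qualifies, the parser falls back to the first digit, and only that single-digit reference can be invalid.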

Editorially, it might be appropriate to reorder the sentences in the resulting
paragraph.

Received on Saturday, 5 January 2008 11:00:24 UTC