[Bug 1851] New: [F&O] back references to a group that was captured 0 times?

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1851

           Summary: [F&O] back references to a group that was captured 0
                    times?
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Platform: PC
        OS/Version: Windows 2000
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: fred.zemke@oracle.com
         QAContact: public-qt-comments@w3.org


Section 7.6.1.1 first bullet after Rule [4] describes the semantics of
sub-expressions (frequently called capture groups in other literature on
regular expressions) in fn:replace().  The last sentence describes what 
is captured by a parenthesized sub-group with a quantifier such as "*",
namely the last substring that matched.
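
For illustration only, the "last substring that matched" convention can
be observed in Python's re module. Python is a stand-in here, not the
F&O regex dialect, so this sketch shows the convention rather than
normative XPath behavior; the variable names are made up for the example.

```python
import re

# Python's re is used purely as a stand-in engine; it is not the
# F&O regex dialect and is not normative for XPath.
m = re.fullmatch(r"(a|b)*", "ab")

# The group iterates twice (first "a", then "b");
# only the substring from the last iteration is retained.
last_capture = m.group(1)
print(last_capture)
```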

Presumably this sentence also applies to the next bullet, about back 
references, in which case this sentence is applicable to more than just
fn:replace.  This should be clarified.

Assuming that this sentence does apply to back references, then I salute
you, because you have specified something that POSIX and Perl neglected
to specify.

But there is still something to be specified: what if the back reference
refers to a repeating group that matched zero times?  Example:
pattern "(c)*b\1", string to be searched "abc".  The (c)* must match
the zero-length string immediately before "b" in the searched string.
Then the "b" in the pattern matches the "b" in the searched string.
Now, what does the back reference \1 match?  The sentence that I cited
does not answer this question, since there is no "last" match when
the group did not match at all.

My informants tell me that the intended behavior is that \1 has no match;
therefore fn:matches would return false for this example.
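
As a point of comparison, and not as evidence of what F&O intends,
Python's re module happens to behave the way described above: a back
reference to a group that never participated in the match fails. Other
engines differ (ECMAScript, for instance, lets such a back reference
match the empty string), which is why the case needs to be specified.
A minimal sketch, with a made-up variable name:

```python
import re

# Stand-in engine: Python's re, not the F&O regex dialect.
# Wherever "b" can match, (c)* has iterated zero times, so group 1
# never participates and the back reference \1 cannot succeed.
unset_group = re.search(r"(c)*b\1", "abc")
print(unset_group)
```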

On the other hand, consider the pattern "((c)*)b\1" with the same string
to be searched.  In this example, (c)* matches zero times, ((c)*) 
matches the zero-length string one time, and consequently \1 is tasked
with matching the zero-length string, which it can do immediately 
following "b" in the searched string.  Consequently fn:matches would 
return true.
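
The same contrast can be sketched in Python's re module, again only as
a stand-in for the F&O dialect. The outer group participates once and
captures the zero-length string, while the inner group never
participates; the variable names are chosen for the example.

```python
import re

# Stand-in engine: Python's re, not the F&O regex dialect.
nested = re.search(r"((c)*)b\1", "abc")

# Group 1 is the outer ((c)*), group 2 is the inner (c).
outer, inner = nested.group(1), nested.group(2)

# outer is the zero-length string, inner is unset (None), and the
# overall match covers just the "b" in "abc".
print(repr(outer), inner, nested.span())
```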

Received on Tuesday, 16 August 2005 19:51:17 UTC