[Bug 1851] [F&O] back references to a group that was captured 0 times? from bugzilla@wiggum.w3.org on 2005-08-17 (public-qt-comments@w3.org from August 2005)

From: <bugzilla@wiggum.w3.org>
Date: Wed, 17 Aug 2005 13:36:50 +0000
To: public-qt-comments@w3.org
Cc:
Message-Id: <E1E5O66-0005xw-MD@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1851





------- Additional Comments From mike@saxonica.com  2005-08-17 13:36 -------
First a meta-comment: as you observe, many specifications of regular expression
semantics are amazingly informal, and we already do very well compared with
other languages such as Perl and Java. It's good to get the specification
precise, but there may be a point at which it's better to leave things a little
fuzzy at the edges to allow implementors to do whatever the underlying library
does. If users are managing to write Perl and Java without precise guarantees of
behaviour in edge cases, perhaps this isn't a problem we need to solve. There's
also a danger that if we overspecify, some of the implementors who are less
concerned about 100% conformance may simply ignore us.

You're right that captured subgroups now apply not only to replace(), but also
to back-references (and also to xsl:analyze-string in XSLT).

My expectation is that the captured substring for a subexpression that's matched
zero times is the zero-length string. In XSLT we say this quite explicitly (see
http://www.w3.org/TR/xslt20/#regex-group). This also applies to cases such as

((a)|(b))

where only one of the two inner groups takes part in any given match; the other
one captures the zero-length string.
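
As a rough illustration of the kind of library variation at issue, here is a
minimal sketch in Python (chosen only for brevity; Python is an assumption of
this example, not something the bug discussion covers). The helper name
captured_groups is hypothetical. It shows a library that reports a
non-participating group as "no value" rather than as the zero-length string,
and how an implementation built on such a library might normalize the result.

```python
import re

# The pattern from the message: exactly one of the inner groups (2 or 3)
# takes part in any given match.
PATTERN = re.compile(r"((a)|(b))")


def captured_groups(match):
    """Return all groups, mapping non-participating groups to ""."""
    # Hypothetical helper: Python reports such groups as None by default,
    # so we ask for "" instead to mirror the zero-length-string rule.
    return list(match.groups(default=""))


m = PATTERN.fullmatch("a")
raw = m.groups()                 # ('a', 'a', None): group 3 never matched
normalized = captured_groups(m)  # ['a', 'a', '']

# In replacement strings, Python 3.5 and later already substitutes the
# empty string for a group that did not participate in the match.
replaced = PATTERN.sub(r"[\2|\3]", "ab")  # '[a|][|b]'
```

So in this one library the replacement case already behaves as described above,
while direct access to the group needs a small adjustment. Other libraries may
well differ on both points.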

Michael Kay

Received on Wednesday, 17 August 2005 13:37:05 UTC