[F/O] backreferences in regexen

Hi

I'd like to re-raise a request I expressed in May [post1] regarding the 
addition of backreferences in regexen. It has been discussed previously 
so I expect no replies, but merely wanted it to be known that I still 
think that XSLT 2 / XQuery 1.0 / XPath 2.0 F/O should have this feature.

My research in May [langs] did show that the feature is widely available 
in programming languages and in existing regular expression libraries.
Among the popular languages which support backreferences in regexen  are 
Ruby, Perl, Python and Java, an ancient version of sed already supported 
this feature [sed], and the PCRE lib provides this feature for C++ etc 
so it can be used to implement regexen in many languages and apps.

F/O adds "^" and "$" to the regex set of WXS [additions] which are very 
useful, it should also add "\[number]".

Some developers (if not most) agree with me about the usefulness of the 
feature:

[mk]
"I agree this can be a handy feature [...]"

... and some also agree that it is very popular in addition to being 
very useful:

[todd]
"I agree with all of his arguments - back references are easy to 
implement, are available in every regex library I have used over the 
past several years, and extremely useful. Some tasks become 
substantially more difficult without back references.
[...]
I believe many real-world users will be unpleasantly surprised if back 
references are not supported."

Tobi

[post1]
http://lists.w3.org/Archives/Public/public-qt-comments/2003May/0288.html
(more sample output is at
http://www.pinkjuice.com/howto/vimxml/setup.xml#catalogs)

[langs]
http://lists.w3.org/Archives/Public/public-qt-comments/2003May/0298.html

[sed]
http://lists.w3.org/Archives/Public/public-qt-comments/2003May/0307.html

[additions]
http://www.w3.org/TR/xquery-operators/#regex-syntax
"The regular expression syntax and semantics for these functions are 
identical to those defined in [XML Schema Part 2: Datatypes] with the 
following additions:
[...]
Two meta-characters, ^ and $ are added. In string mode, the 
metacharacter ^ matches the start of the entire string, while $ matches 
the end of the entire string.
[...]"

[mk]
http://lists.w3.org/Archives/Public/public-qt-comments/2003May/0294.html

[todd]
http://lists.w3.org/Archives/Public/public-qt-comments/2003Jun/0125.html


-- 
http://www.pinkjuice.com/

Received on Tuesday, 19 August 2003 04:11:54 UTC