Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft) from David Carlisle on 2002-01-07 (www-xml-query-comments@w3.org from January 2002)

From: David Carlisle <davidc@nag.co.uk>
Date: Mon, 7 Jan 2002 09:58:12 GMT
To: xsl-list@lists.mulberrytech.com
CC: www-xml-query-comments@w3.org, Jeni Tennison <jeni@jenitennison.com>
Message-Id: <200201070958.JAA31758@penguin.nag.co.uk>

> Most regular expression languages don't find overlapping matches, do
> they? It seems to add a lot of extra complexity if they do.

No, but then they don't return a list of all matches either.
In emacs, for example, I regexp-search to the first occurrence and can
then choose to restart the search from the end of the found text, from
its beginning, or from anywhere else. In xpath as currently spec'd I'm
forced to find all the non-overlapping occurrences in the entire text,
even if I only want to find the first, make a replacement, and then
start searching again in the new, partly edited text.
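
To make the difference concrete, here is a rough sketch in Python (not
xpath, and not anything from the draft) of the two things a "search from
a position" primitive allows: restarting just after the start of each
match to pick up overlapping occurrences, and rescanning the edited text
after every single replacement. The function names are made up for
illustration.

```python
import re

def overlapping_starts(pattern, text):
    """Start offsets of all matches, restarting one character
    after the start of each match so overlaps are found too."""
    regex = re.compile(pattern)
    starts = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        starts.append(m.start())
        pos = m.start() + 1
    return starts

def edit_and_rescan(pattern, replacement, text):
    """Replace the first match, then search the edited text again
    from the beginning, until nothing matches.
    Assumes the replacement never re-creates a match forever."""
    while True:
        m = re.search(pattern, text)
        if m is None:
            return text
        text = text[:m.start()] + replacement + text[m.end():]
```

A "find all non-overlapping matches" function would report only two
occurrences of "aa" in "aaaa"; the first function above reports three.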


> In the description of xf:replace() it says:
> 
>   The value of $repval may use the standard regular expression syntax
>   of "$N"

Oops, I just missed that (there are a lot of documents to skim over :-)

However, I don't think that this is particularly useful (see below).

> I don't think that the xf:match() function needs to return the
> positions of the subexpressions, or the subexpressions themselves,
> because that functionality could be achieved via xf:replace(). For
> example, to find out what string was matched by the first
> subexpression you could just use "$1" as the replace value.

Looking at why one needs regexps in an XML query language, it is usually
to impose structure on input that is otherwise unstructured (as far as
XML is concerned), i.e. to "UP TRANSLATE" in omnimark parlance.

Here's a snippet of an omnimark script I had lying around:

TRANSLATE "'" (letter+ ) => found-text "'"
     OUTPUT "<e>%x(found-text)</e>"


this changes 'abc' to <e>abc</e> 

You could of course do something similar with perl.
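
For comparison, the same string-level rewrite sketched in Python rather
than perl (the function name is hypothetical, and "letter+" is assumed to
mean ASCII letters only):

```python
import re

def mark_quoted(text):
    """Rewrite 'abc' as <e>abc</e>, purely as string substitution.
    The result is still a flat string, not a tree of nodes."""
    return re.sub(r"'([A-Za-z]+)'", r"<e>\1</e>", text)
```

Note that the markup here is just characters in the output string.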


However you could not do this with xf:replace.

In perl and omnimark you can add XML markup as a string, as their
underlying data structures are not as tree oriented as Xpath's.

In Xpath you can't do that. So a replace function that only lets you
replace one set of unstructured input by some more unstructured output
is not particularly useful.

If however the match function returned the sequence of substrings
matched or equivalently a sequence of the match positions, then the
string could be broken up and nodes added as required.
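
As an illustration only, here is how that might look in Python with
ElementTree standing in for the xpath data model: the match positions
are used to cut the string up, and each matched substring becomes a real
element node rather than markup characters. The function name, the
default pattern and the "p" wrapper element are all assumptions made for
this sketch.

```python
import re
import xml.etree.ElementTree as ET

def wrap_matches(text, pattern=r"'([A-Za-z]+)'", tag="e", parent_tag="p"):
    """Build a parent element whose children are the matched
    substrings, with the unmatched text kept between them."""
    parent = ET.Element(parent_tag)
    last = None   # most recently created child element
    pos = 0       # end offset of the previous match
    for m in re.finditer(pattern, text):
        gap = text[pos:m.start()]
        if last is None:
            parent.text = gap      # text before the first child
        else:
            last.tail = gap        # text between two children
        last = ET.SubElement(parent, tag)
        last.text = m.group(1)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        parent.text = rest
    else:
        last.tail = rest
    return parent
```

Serialising wrap_matches("say 'abc' and 'de'.") gives
<p>say <e>abc</e> and <e>de</e>.</p>, but the point is that the result
is a tree that further path expressions could navigate.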

Actually it might be interesting (and more in the xpath style) to allow
omnimark style named variable binding (the found-text in the above)
within the search string, which would then be accessed by a normal
xpath variable reference, $found-text, in any functions triggered by the
replacement code.
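
Python's named groups give a rough feel for this, though it is only an
analogy and not a proposal for xpath syntax. In the sketch below the
name is bound inside the pattern and handed to the handler as an
ordinary named argument. Python group names cannot contain a hyphen, so
found_text stands in for found-text; the function names are invented
for the example.

```python
import re

def apply_bindings(pattern, text, handler):
    """Call handler once per match, passing each named group
    in the pattern as a keyword argument."""
    return [handler(**m.groupdict()) for m in re.finditer(pattern, text)]

def describe(found_text):
    # found_text is bound by the (?P<found_text>...) group in the pattern
    return "matched " + found_text
```

For example, apply_bindings(r"'(?P<found_text>[A-Za-z]+)'", "'abc' 'de'",
describe) calls describe once for each quoted word.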

David

Received on Monday, 7 January 2002 04:59:10 UTC