[Bug 3949] New regex function to match and return a list of captured strings

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3949

------- Comment #2 from rslite@rdslink.ro  2006-11-05 16:50 -------
(In reply to comment #1)
> In what way doesn't fn:tokenize() work for you?
> 
> http://www.w3.org/TR/xpath-functions/#func-tokenize
> 

fn:tokenize() returns the substrings that are not matched by the regex. I need
the groups that _are_ matched. Maybe I should have given an example in the
first place: I am processing some data in XSLT and need to extract the birth
and death years and places for some people. The problem is that the data comes
in different formats. Some examples are:
(1900)
(1900 place 1)
(1900 - 2000)
(1900 place 1 - 2000)
(1900 place 1 - 2000 place 2)
(1900 - 2000 place 2)
So the information consists of the years of birth and death and the places of
birth and death. Right now I have to use four fn:replace() calls to obtain it,
which makes the code verbose (either repeat the expression for each replace()
or store it in a variable and use it for all four). Also, if the expression
does not match, nothing is replaced and you get the original string back.
Sometimes I needed an empty string instead, so I had to add a matches() test
and do the replace only when the regex is sure to match the string.
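
As an illustration only, the guard described above can be sketched in Python,
whose re.sub() likewise returns the input unchanged when nothing matches. The
helper name extract_or_empty and the BIRTH_YEAR pattern are assumptions made
for this sketch; they are not part of XPath or of the original stylesheet.

```python
import re

# Hypothetical helper mirroring "matches() first, then replace()".
def extract_or_empty(text, pattern, group_ref):
    """Return the referenced group, or '' when the pattern does not match."""
    if re.search(pattern, text) is None:
        return ''
    return re.sub(pattern, group_ref, text)

# Assumed pattern that isolates only the birth year.
BIRTH_YEAR = r'^\((\d{4}).*\)$'

# Without the guard, a non-matching input comes back untouched.
unguarded = re.sub(BIRTH_YEAR, r'\1', 'unknown')

# With the guard, the caller gets an empty string instead.
guarded = extract_or_empty('unknown', BIRTH_YEAR, r'\1')
year = extract_or_empty('(1900 place 1)', BIRTH_YEAR, r'\1')

print(repr(unguarded), repr(guarded), repr(year))
```

Extracting all four fields this way needs four such calls, each repeating or
sharing the pattern.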

It would be much easier if a single regex could be used in a function called,
for example, match-groups(., '^\((\d{4})(.*?)-?(\d{4})?(.*?)\)$'), which would
return all the matched groups, empty or not. For the last example it should
return ('1900', ' ', '2000', ' place 2'). That way all the information is
available at once, in a simple way.
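
A rough Python sketch of the requested behaviour, for illustration only:
match_groups is a hypothetical stand-in for the proposed function, not an
existing XPath or XSLT function. One caveat: in a backtracking engine such as
Python's re, the pattern quoted above lets the last lazy group absorb
' - 2000 place 2', so the sketch assumes a slightly stricter pattern
(LIFESPAN, also hypothetical) that ties the hyphen to the death year.

```python
import re

# Hypothetical stand-in for the proposed match-groups().
def match_groups(text, pattern):
    """Return all captured groups of the first match as a tuple of strings.

    Groups that did not participate come back as '', and a failed match
    yields an empty tuple, so the input string is never returned.
    """
    m = re.search(pattern, text)
    return m.groups(default='') if m else ()

# Assumed pattern: the hyphen, death year and death place form one optional
# non-capturing group, so they are matched together or not at all.
LIFESPAN = r'^\((\d{4})(.*?)(?:-\s*(\d{4})(.*))?\)$'

samples = [
    '(1900)',
    '(1900 place 1)',
    '(1900 - 2000)',
    '(1900 place 1 - 2000)',
    '(1900 place 1 - 2000 place 2)',
    '(1900 - 2000 place 2)',
]

for s in samples:
    print(s, match_groups(s, LIFESPAN))
```

With this pattern the last sample yields ('1900', ' ', '2000', ' place 2'),
and a string that does not match at all yields an empty tuple rather than the
original input.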

I hope this is clearer.
Best regards,
Radu

Received on Sunday, 5 November 2006 16:50:25 UTC