Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

[Resent to xslt-editors because of a typo in the address, sorry.]

Date: 7 Jan 2002 13:40:27 +0000
From: David Carlisle <davidc@nag.co.uk>
To:  xsl-list@lists.mulberrytech.com, www-xml-query-comments@w3.org,
	jeni@jenitennison.com, xslt-editors@w3.org
In-reply-to: message from David Carlisle on 7 Jan 2002 09:58:10 +0000
Subject: Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)


I said (on www-xml-query-comments)

> Here's a snippet of an omnimark script I had lying around:
> 
> TRANSLATE "'" (letter+ ) => found-text "'"
>      OUTPUT "<e>%x(found-text)</e>"
> 
> 
> this changes 'abc' to <e>abc</e> 
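
For anyone without omnimark to hand, a rough Python analogue of that
rule might look like the sketch below. It is only an approximation:
mark_quoted is a made-up name, and I am assuming that [A-Za-z]+ is close
enough to omnimark's letter+ for the purpose of illustration.

```python
import re

def mark_quoted(text):
    # Assumed analogue of: TRANSLATE "'" (letter+) => found-text "'"
    # The parenthesised group plays the role of found-text.
    return re.sub(r"'([A-Za-z]+)'", r"<e>\1</e>", text)
```

Note that this is exactly the plain "replace match" case, which is the
least interesting use of such a rule.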


For those not familiar with omnimark, maybe I should say a bit more
about how regexps work in that system.

The typical use is to set up a number of these TRANSLATE clauses,
which work more or less exactly like templates in xslt, except that they
match on regexps in a string rather than on node paths as in xpath
patterns. But the point is that once a match is found you are not
limited to a "replace match" but can, as in a template, execute
arbitrary code. That code has available the "current context", i.e.
where in the string the match was found, together with any local
variables that the regexp binds to subexpressions of the match.

This allows one to input a plain text file and treat it in more or less
the same way as an XML file, but with the "templates" triggered by
regexps rather than elements. The system simultaneously searches for
any regexp in the currently active TRANSLATE clauses, and fires the
appropriate templates.
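
To make the idea concrete, here is a minimal Python sketch of
regexp-triggered "templates". It is not omnimark's actual algorithm:
apply_rules, quoted_word, number and the rules list are all hypothetical
names, and choosing the first rule in list order when several match at
the same position is my own simplifying assumption.

```python
import re

def apply_rules(text, rules):
    """Scan text left to right; at each position fire the first rule whose
    pattern matches there, otherwise copy one character through."""
    compiled = [(re.compile(p), action) for p, action in rules]
    out = []
    pos = 0
    while pos < len(text):
        for pattern, action in compiled:
            m = pattern.match(text, pos)
            if m and m.end() > pos:  # ignore empty matches so the scan advances
                # The action receives the match object, so it can see where
                # the match was found and any bound subexpressions, and it
                # may run arbitrary code rather than just substitute.
                out.append(action(m))
                pos = m.end()
                break
        else:
            # No rule matched here: default behaviour is to copy the text.
            out.append(text[pos])
            pos += 1
    return "".join(out)

def quoted_word(m):
    # Uses a named subexpression, in the spirit of found-text.
    return "<e>%s</e>" % m.group("found_text")

def number(m):
    # Uses the "current context": the offset at which the match started.
    return "<n at='%d'>%s</n>" % (m.start(), m.group(0))

# Hypothetical rule set: (pattern, action) pairs, tried in order.
rules = [
    (r"'(?P<found_text>[A-Za-z]+)'", quoted_word),
    (r"[0-9]+", number),
]
```

With these rules, apply_rules("see 'abc' and 42", rules) wraps the
quoted word and the number and copies everything else through
unchanged, which is the behaviour I would want from a template-like
regexp facility.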

In fact in omnimark one can use these TRANSLATE rules intermixed with
ELEMENT rules (which are direct analogues of xslt templates), and
the "apply templates" behaviour just seamlessly drops from one to the
other. In xslt terminology, apply-templates on a text node applies these
TRANSLATE rules, although the default behaviour is just to copy text
rather than to apply templates, so unless regexp matching is explicitly
turned on, text just gets copied.

I'm not particularly pushing omnimark semantics here, but it is an
existing XML query and construction language with well integrated regexp
support. These days I only use omnimark in preference to XSLT when the
input documents are so sparsely marked up that I need the regexp support.
I think that is why there has been an often repeated request for regexp
support in xpath/xslt 2, but unless I misunderstand the current spec
the suggested functionality is nowhere near powerful enough for this
kind of use.

David

I added the xslt-editors to the cc as I seem to have mainly phrased this
comment in xslt template terms, although the same issues must apply to
xquery as well.




Received on Monday, 7 January 2002 09:34:51 UTC