Re: XQuery/Reg Exes from Liam Quin on 2007-08-09 (public-powderwg@w3.org from August 2007)

From: Liam Quin <liam@w3.org>
Date: Thu, 9 Aug 2007 15:08:00 -0400
To: Phil Archer <parcher@icra.org>
Cc: Public POWDER <public-powderwg@w3.org>
Message-ID: <20070809190800.GA13065@w3.org>

On Thu, Aug 09, 2007 at 12:34:02PM +0100, Phil Archer wrote:
> [...]
> With that done, I wonder what you would recommend we do in the next 
> draft of our doc in which we currently come down on the side of using 
> Perl REs. Actually, I can pretty much guess what you'll recommend - use 
> XML schema REs as modified by [2] :-) BUT... a couple of questions if I may:
> 
> 1. In XML Schema, by default, an RE matches from the beginning to the 
> end of the string. i.e. the ^ and $ meta characters, familiar from Perl, 
>  are implicit. [2] adds these meta characters back in, but, almost 
> certainly as a result of a lack of brain power on my part, it's not 
> clear to me whether these are now required or optional. In other words, 
> if in POWDER we said that we're going to use XML REs, would the RE
> 
> example.com/
> 
> match the string
> 
> http://www.example.com/?

In XML Schema the ^ and $ are always there, implicitly; but in XPath 2,
and hence XSLT 2 and XQuery, they are not there, so yes, it would match.
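
To make the anchoring difference concrete, here is a minimal sketch. It uses Python's re module purely as a stand-in (an assumption for illustration; it is neither an XML Schema nor an XPath 2 engine): re.fullmatch approximates the implicit ^...$ of XML Schema, and re.search approximates the unanchored behaviour of XPath 2.

```python
import re

# The RE and the string from the question above.
# Note that the unescaped "." matches any character, not just a dot.
pattern = r"example.com/"
url = "http://www.example.com/"

# Unanchored (XPath 2 / Perl style): the RE may match anywhere in the string.
unanchored = re.search(pattern, url) is not None

# Implicitly anchored (XML Schema style): the RE must match the whole string.
anchored = re.fullmatch(pattern, url) is not None

print(unanchored, anchored)
```

Under those assumptions the unanchored match succeeds and the whole-string match fails, which is the distinction described above.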

> 2. If, as I suspect, ^ and $ are required to anchor REs at the beginning 
> and end, is there now any significant difference between a Perl Regular 
> Expression and an XML Schema RE, as amended by [2]? Indeed, is there any 
> difference at all?
Yes, tons of differences.

Perl allows arbitrary code to be executed as part of pattern matching.
It has negative zero-width look-ahead and look-behind assertions.

Perl has non-capturing parentheses, (?:xxx), which group the same way
as (...) but don't affect $1, $2 (\1, \2).
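
As a rough illustration of these Perl-style features, the sketch below uses Python's re module, which accepts the same (?:...) and (?!...) syntax. The sample strings are made up for the example, and Python is only a stand-in here, not a statement about any XML regular expression engine.

```python
import re

date = "2007-08-09"  # hypothetical sample input

# Capturing groups: every (...) is numbered, so there are three groups.
capturing = re.search(r"(\d+)-(\d+)-(\d+)", date).groups()

# Non-capturing group: (?:...) still groups, but is skipped in the numbering,
# so the day becomes group 2 instead of group 3.
non_capturing = re.search(r"(\d+)-(?:\d+)-(\d+)", date).groups()

# Negative zero-width look-ahead: match "foo" only when not followed by "bar".
lookahead = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

print(capturing, non_capturing, lookahead)
```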

See the Perl documentation on regular expressions (perlre).

> The group's preference is to use the W3C Rec but we need to be sure that 
> POWDER implementations aren't going to come a cropper because of an 
> unusual feature of XML REs that are likely to return a different result 
> from that which a typical programmer used to working with Perl REs would 
> expect.

There are some things people particularly want that we don't have -- most
notably a function to return a list of matched items, as in Perl's
    my @results = ($input =~ m/(\d+)\s+(\d+)/);
We also don't have the Perl "x" flag, unfortunately, which allows
"extended" expression syntax in which whitespace and comments can
be used for layout.
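
For readers who have not met these two Perl features, here is a hedged sketch of what they look like, again using Python's re module as a stand-in (re.VERBOSE is Python's counterpart to Perl's "x" flag). The input string is invented for the example.

```python
import re

line = "640   480"  # hypothetical input with two numbers

# A list of the matched items, roughly what the Perl list-context match returns.
results = list(re.search(r"(\d+)\s+(\d+)", line).groups())

# The same expression in "extended" syntax: whitespace in the pattern is
# ignored and comments are allowed, so the RE can be laid out for reading.
extended = re.compile(
    r"""
    (\d+)   # first number
    \s+     # separating whitespace
    (\d+)   # second number
    """,
    re.VERBOSE,
)
extended_results = list(extended.search(line).groups())

print(results, extended_results)
```

Both forms are the same expression and return the same two items; only the layout of the pattern differs.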

But I don't think there are places where our regular expressions
have a radically different (from Perl) and surprising interpretation
for the same syntax.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/

Received on Thursday, 9 August 2007 19:08:04 UTC