Re: XQuery/Reg Exes from Phil Archer on 2007-09-06 (public-powderwg@w3.org from September 2007)

From: Phil Archer <parcher@icra.org>
Date: Thu, 06 Sep 2007 10:43:13 +0100
To: Liam Quin <liam@w3.org>
CC: Public POWDER <public-powderwg@w3.org>
Message-ID: <46DFCBB1.8020604@icra.org>

Liam,

Now well and truly back after the summer break, I'm trying to write this 
up and would be very grateful if you could answer two quick questions 
please:

1. Is it fair to say: "The XQuery 1.0/XPath 2.0 Regular Expression 
avoids some of the more advanced RE features that can require excessive 
processing which is not appropriate or needed when establishing whether 
a candidate resource is or is not an element of a Resource Set."

(i.e. all we're ever doing is matching an RE against a URI to find out 
if there is or is not a match so we'd never need the Perl list function, 
for example.)

2. What is the identifier for an XPath 2.0 RE? We're defining an RDF 
property that has an RE as its range but if we just use XML Schema that 
won't include the modifications in XPath. So perhaps it should be 
http://www.w3.org/2005/xpath-functions#regex-syntax ?
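
A minimal sketch of the point in question 1, that only a yes/no answer is ever needed when matching an RE against a URI. It assumes Python's `re` module as a stand-in for an XPath 2.0 regex engine (the two dialects differ in detail), and the helper name `in_resource_set` is hypothetical:

```python
import re


def in_resource_set(uri: str, pattern: str) -> bool:
    """Hypothetical helper: True if the pattern matches somewhere in the URI."""
    # Only the boolean outcome is used. Captured groups, which are what
    # Perl's list-context match returns, are never inspected.
    return re.search(pattern, uri) is not None


print(in_resource_set("http://www.example.com/", r"example\.com/"))  # True
print(in_resource_set("http://www.example.org/", r"example\.com/"))  # False
```

Because nothing beyond the boolean is consumed, a regex dialect without a "list of matches" function loses nothing for this use.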

Thanks for your help.

Phil.

Liam Quin wrote:
> On Thu, Aug 09, 2007 at 12:34:02PM +0100, Phil Archer wrote:
>> [...]
>> With that done, I wonder what you would recommend we do in the next 
>> draft of our doc in which we currently come down on the side of using 
>> Perl REs. Actually, I can pretty much guess what you'll recommend - use 
>> XML schema REs as modified by [2] :-) BUT... a couple of questions if I may:
>>
>> 1. In XML Schema, by default, an RE matches from the beginning to the 
>> end of the string. i.e. the ^ and $ meta characters, familiar from Perl, 
>> are implicit. [2] adds these meta characters back in, but, almost 
>> certainly as a result of a lack of brain power on my part, it's not 
>> clear to me whether these are now required or optional. In other words, 
>> if in POWDER we said that we're going to use XML REs, would the RE
>>
>> example.com/
>>
>> match the string
>>
>> http://www.example.com/?
> 
> In XML Schema the ^ and $ are always there, implicitly; but in XPath 2,
> and hence XSLT 2 and XQuery, they are not there, so yes, it would match.
> 
>> 2. If, as I suspect, ^ and $ are required to anchor REs at the beginning 
>> and end, is there now any significant difference between a Perl Regular 
>> Expression and an XML Schema RE, as amended by [2]? Indeed, is there any 
>> difference at all?
> Yes, tons of differences.
> 
> Perl allows arbitrary code to be executed as part of pattern matching.
> It has negative zero-width look-ahead and look-behind assertions.
> 
> Perl has non-capturing parentheses, (?: xxx ), which are the same as
> (...) but don't affect $1, $2 (\1, \2)
> 
> See the documentation on perl regular expressions.
> 
>> The group's preference is to use the W3C Rec but we need to be sure that 
>> POWDER implementations aren't going to come a cropper because of an 
>> unusual feature of XML REs that is likely to return a different result 
>> from that which a typical programmer used to working with Perl REs would 
>> expect.
> 
> There are some things people particularly want that we don't have -- most
> notably a function to return a list of matched items, as in Perl's
>     my @results = ($input =~ m/(\d+)\s+(\d+)/);
> We also don't have the Perl "x" flag, unfortunately, which allows
> "extended" expression syntax in which whitespace and comments can
> be used for layout.
> 
> But I don't think there are places where our regular expressions
> have a radically different (from Perl) and surprising interpretation
> for the same syntax.
> 
> Liam
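
The anchoring difference described in the quoted reply can be sketched as follows. This is an approximation only: Python's `re` module stands in for both dialects, with `re.fullmatch` imitating XML Schema's implicit `^` and `$`, and `re.search` imitating the unanchored behaviour of XPath 2.0. The variable names are illustrative.

```python
import re

PATTERN = "example.com/"          # the RE from the question, as written
URI = "http://www.example.com/"   # the candidate string from the question

# XML Schema style: the pattern must cover the whole string,
# as if ^ and $ were always present.
schema_style = re.fullmatch(PATTERN, URI) is not None

# XPath 2.0 style: no implicit anchors, so a match anywhere counts.
xpath_style = re.search(PATTERN, URI) is not None

# Writing the anchors explicitly restores whole-string matching.
explicitly_anchored = re.search("^" + PATTERN + "$", URI) is not None

print(schema_style)         # False
print(xpath_style)          # True
print(explicitly_anchored)  # False
```

Note that the unescaped `.` in the pattern matches any character, so a POWDER author who means a literal dot would write `example\.com/`.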
Received on Thursday, 6 September 2007 09:43:34 UTC