XQuery/Reg Exes

Hi Liam,

(note: the public POWDER list is cc'd here for the archive)

We've not met but I seem to recall being in the same restaurant as you 
and a bunch of other W3C types last time we were in Cannes Mandelieu.

I'm writing to dig a little deeper into regular expression syntax in 
XQuery and XPath.

Background: we, that is the POWDER WG, are working on a Rec Track doc 
to define groups of resources, and RegExes come into the story [1]. When 
we looked at REs we had a lengthy discussion about greedy and non-greedy 
matching (Perl supports non-greedy matching, XML Schema doesn't), about 
how many implementations support, and how many people know how to use, 
Perl REs vs. XML Schema REs, and about the default matching behaviour 
without the ^ and $ characters. I see from your group's work at [2] that 
you've wrestled with the same issues and have added those features to 
the XML Schema syntax, which is terrific.
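To make the greedy/non-greedy point concrete, here is a minimal sketch using Python's re module. Python is only a convenient stand-in here (an assumption, not what POWDER or [2] specifies), and the URI is a hypothetical example. A greedy quantifier takes as much as it can; the reluctant form *? stops at the first point where the rest of the pattern can match.

```python
import re

URI = "http://www.example.com/a/b"  # hypothetical URI for illustration

# Greedy: '.*' consumes as much as possible, backtracking only to the last '/'.
greedy = re.match(r"http://(.*)/", URI)

# Reluctant (non-greedy): '.*?' consumes as little as possible, stopping at the first '/'.
reluctant = re.match(r"http://(.*?)/", URI)

print(greedy.group(1))     # www.example.com/a
print(reluctant.group(1))  # www.example.com
```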

With that done, I wonder what you would recommend we do in the next 
draft of our doc, in which we currently come down on the side of using 
Perl REs. Actually, I can pretty much guess what you'll recommend: use 
XML Schema REs as modified by [2] :-) BUT... a few questions if I may:

1. In XML Schema, an RE by default matches from the beginning to the 
end of the string, i.e. the ^ and $ metacharacters familiar from Perl 
are implicit. [2] adds these metacharacters back in but, almost 
certainly as a result of a lack of brain power on my part, it's not 
clear to me whether they are now required or optional. In other words, 
if in POWDER we said that we're going to use XML REs, would the RE

example.com/

match the string

http://www.example.com/ ?

If ^ and $ are implied and therefore not required, it won't. If they are 
required, i.e. a pattern without them can match anywhere in the string, 
then it will.
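The two readings can be sketched with Python's re module, used purely as a stand-in (an assumption; neither POWDER nor [2] is defined in terms of Python). Here re.fullmatch approximates the implicit whole-string anchoring of XML Schema patterns, and re.search approximates Perl's default of matching anywhere in the string. The helper names are hypothetical.

```python
import re

PATTERN = r"example.com/"
URI = "http://www.example.com/"

def matches_anchored(pattern: str, s: str) -> bool:
    # Hypothetical helper: implicit ^...$, so the pattern must cover the
    # whole string, as with an XML Schema pattern facet.
    return re.fullmatch(pattern, s) is not None

def matches_unanchored(pattern: str, s: str) -> bool:
    # Hypothetical helper: Perl-style, succeeds if the pattern matches
    # anywhere in the string.
    return re.search(pattern, s) is not None

print(matches_anchored(PATTERN, URI))    # False
print(matches_unanchored(PATTERN, URI))  # True
# With explicit anchors, unanchored matching behaves like anchored matching:
print(matches_unanchored(r"^example.com/$", URI))  # False
```

Note that the unescaped . in example.com/ matches any character; example\.com/ is the stricter pattern for a literal dot.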

2. If, as I suspect, ^ and $ are required to anchor an RE at the 
beginning and end of the string, is there now any significant difference 
between a Perl regular expression and an XML Schema RE as amended by 
[2]? Indeed, is there any difference at all?

3. If there are differences, is there a document you can point to that 
elucidates them, please?

The group's preference is to use the W3C Rec, but we need to be sure that 
POWDER implementations aren't going to come a cropper because of an 
unusual feature of XML REs that is likely to return a different result 
from the one a typical programmer used to working with Perl REs would 
expect. Our use cases are pretty simple in this regard (we're just 
writing REs to match against URIs), so the REs themselves are likely to 
be simple too.

Thanks

Phil.

[1] http://www.w3.org/TR/2007/WD-powder-grouping-20070709/#reMatch
[2] http://www.w3.org/TR/xpath-functions/#regex-syntax

-- 
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
w. http://www.fosi.org/people/philarcher/


Received on Thursday, 9 August 2007 11:34:14 UTC