- From: Phil Archer <parcher@icra.org>
- Date: Thu, 09 Aug 2007 12:34:02 +0100
- To: liam@w3.org
- CC: Public POWDER <public-powderwg@w3.org>
Hi Liam,

(Note the public POWDER list cc'd here for the archive.)

We've not met, but I seem to recall being in the same restaurant as you and a bunch of other W3C types the last time we were in Cannes Mandelieu.

I'm writing to dig a little deeper into regular expression syntax in XQuery and XPath. Background: we, that is the POWDER WG, are working on a Rec Track document to define groups of resources, and RegExes come into the story [1]. When we looked at REs we had a lengthy discussion about greedy and non-greedy matches (Perl supports non-greedy matching, XML Schema doesn't), the number of implementations and the number of people who know how to use Perl REs vs. XML Schema REs, default matching behaviour without ^ and $ characters, etc.

I see from your group's work at [2] that you've wrestled with the same issues and have added those features to XML Schema, which is terrific. With that done, I wonder what you would recommend we do in the next draft of our document, in which we currently come down on the side of using Perl REs. Actually, I can pretty much guess what you'll recommend: use XML Schema REs as modified by [2] :-) BUT... a few questions, if I may:

1. In XML Schema, by default, an RE matches from the beginning to the end of the string, i.e. the ^ and $ metacharacters, familiar from Perl, are implicit. [2] adds these metacharacters back in but, almost certainly as a result of a lack of brain power on my part, it's not clear to me whether they are now required or optional. In other words, if in POWDER we said that we're going to use XML REs, would the RE example.com/ match the string http://www.example.com/? If ^ and $ are implied and therefore not required, it won't. If they are required, i.e. position is not otherwise important, then it will.

2. If, as I suspect, ^ and $ are required to anchor REs at the beginning and end, is there now any significant difference between a Perl regular expression and an XML Schema RE as amended by [2]? Indeed, is there any difference at all?

3. If there are differences, is there a document you can point to that elucidates them, please?

The group's preference is to use the W3C Rec, but we need to be sure that POWDER implementations aren't going to come a cropper because of an unusual feature of XML REs that is likely to return a different result from the one a typical programmer used to working with Perl REs would expect. Our use cases are pretty simple in this regard (we're just writing REs to match against URIs), so the REs are likely to be pretty simple too.

Thanks

Phil.

[1] http://www.w3.org/TR/2007/WD-powder-grouping-20070709/#reMatch
[2] http://www.w3.org/TR/xpath-functions/#regex-syntax

--
Phil Archer
Chief Technical Officer, Family Online Safety Institute
w. http://www.fosi.org/people/philarcher/

Already labelled with ICRA? It's time to raise the bar on child protection standards by ensuring your site is ICRAchecked. See http://checked.icra.org/ for more info.
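The two readings contrasted in question 1 (^ and $ implied, versus ^ and $ required for anchoring) can be illustrated with a small sketch. It assumes Python's `re` module as a stand-in for both styles: `re.fullmatch` behaves as if ^ and $ were implicit, and `re.search` matches anywhere in the string unless explicit anchors are used. This is not an XML Schema or XPath regex engine, and the helper names `matches_whole` and `matches_substring` are hypothetical.

```python
import re

# Illustrative stand-in only: Python's re module, not an XML Schema/XPath engine.
# The pattern is copied from the question; its unescaped "." matches any
# character, including a literal dot.
PATTERN = r"example.com/"
URI = "http://www.example.com/"


def matches_whole(pattern: str, text: str) -> bool:
    """Reading 1: ^ and $ implied, so the entire text must match."""
    return re.fullmatch(pattern, text) is not None


def matches_substring(pattern: str, text: str) -> bool:
    """Reading 2: anchoring only via explicit ^ and $, so any substring may match."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(matches_whole(PATTERN, URI))      # False: the "http://www." prefix is not covered
    print(matches_substring(PATTERN, URI))  # True: "example.com/" occurs inside the URI
    # With explicit anchors, the unanchored style matches only the whole string.
    print(matches_substring(r"^http://www\.example\.com/$", URI))  # True
```

Under the first reading the pattern would need to cover the whole URI. Under the second, example.com/ matches as a substring, and authors who want whole-string matching must add ^ and $ themselves.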
Received on Thursday, 9 August 2007 11:34:14 UTC