- From: Liam Quin <liam@w3.org>
- Date: Thu, 9 Aug 2007 15:08:00 -0400
- To: Phil Archer <parcher@icra.org>
- Cc: Public POWDER <public-powderwg@w3.org>
On Thu, Aug 09, 2007 at 12:34:02PM +0100, Phil Archer wrote:
> [...]
> With that done, I wonder what you would recommend we do in the next
> draft of our doc in which we currently come down on the side of using
> Perl REs. Actually, I can pretty much guess what you'll recommend - use
> XML schema REs as modified by [2] :-) BUT... a couple of questions if I may:
>
> 1. In XML Schema, by default, an RE matches from the beginning to the
> end of the string. i.e. the ^ and $ meta characters, familiar from Perl,
> are implicit. [2] adds these meta characters back in, but, almost
> certainly as a result of a lack of brain power on my part, it's not
> clear to me whether these are now required or optional. In other words,
> if in POWDER we said that we're going to use XML REs, would the RE
>
> example.com/
>
> match the string
>
> http://www.example.com/?

In XML Schema the ^ and $ are always there, implicitly; but in XPath 2,
and hence XSLT 2 and XQuery, they are not there, so yes, it would match.

> 2. If, as I suspect, ^ and $ are required to anchor REs at the beginning
> and end, is there now any significant difference between a Perl Regular
> Expression and an XML Schema RE, as amended by [2]? Indeed, is there any
> difference at all?

Yes, tons of differences. Perl allows arbitrary code to be executed as
part of pattern matching. It has negative zero-width look-ahead and
look-behind assertions. Perl has non-capturing parentheses, (?: xxx ),
which are the same as (...) but don't affect $1, $2 (\1, \2).

See the documentation on Perl regular expressions.

> The group's preference is to use the W3C Rec but we need to be sure that
> POWDER implementations aren't going to come a cropper because of an
> unusual feature of XML REs that are likely to return a different result
> from that which a typical programmer used to working with Perl REs would
> expect.
There are some things people particularly want that we don't have --
most notably a function to return a list of matched items, as in Perl's

    my @results = ($input =~ m/(\d+)\s+(\d+)/);

We also don't have the Perl "x" flag, unfortunately, which allows
"extended" expression syntax in which whitespace and comments can be
used for layout.

But I don't think there are places where our regular expressions have
a radically different (from Perl) and surprising interpretation for the
same syntax.

Liam

--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/
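
The anchoring question and the Perl features mentioned in this message can be tried out in a rough sketch. It uses Python's re module purely as a stand-in: that is a Perl-style engine, not an XML Schema or XPath 2 implementation. The assumption is that re.fullmatch approximates XML Schema's implicit ^ and $, and that re.search approximates the unanchored XPath 2 behaviour. All variable names and the sample strings other than the URL are illustrative only.

```python
import re

PATTERN = r"example.com/"
URL = "http://www.example.com/"

# Unanchored search (the XPath 2 / Perl default): the pattern may match
# anywhere inside the string.
unanchored = re.search(PATTERN, URL) is not None

# Whole-string match, imitating XML Schema's implicit ^ and $.
anchored = re.fullmatch(PATTERN, URL) is not None

# A list of captured items, in the spirit of Perl's
#   my @results = ($input =~ m/(\d+)\s+(\d+)/);
captures = re.search(r"(\d+)\s+(\d+)", "width 640   480 px").groups()

# Non-capturing parentheses (?:...) group without taking a capture number,
# so the host name below is still group 1.
non_capturing = re.search(r"(?:https?)://(\w+)", "http://example/").groups()

# "Extended" layout, comparable to Perl's x flag: whitespace and comments
# inside the pattern are ignored.
verbose = re.compile(r"""
    (\d+)   # first number
    \s+     # separator
    (\d+)   # second number
""", re.VERBOSE)
verbose_captures = verbose.search("10 20").groups()

print(unanchored, anchored, captures, non_capturing, verbose_captures)
```

Under these assumptions the same pattern succeeds when searched for and fails when it has to cover the whole string, which is the difference between the XPath 2 and XML Schema conventions described above.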
Received on Thursday, 9 August 2007 19:08:04 UTC