Re: Behavior of $ in regular expressions

From: Michael Kay <mike@saxonica.com>
Date: Tue, 25 Sep 2012 09:46:57 +0100
Message-ID: <50616F81.70707@saxonica.com>
To: public-qt-comments@w3.org

This issue came up earlier this year (which might account for the 
presence of the tests): see

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16809

We decided to leave the spec as written (for XPath 2.0) on the grounds that

(a) it states quite unambiguously what is required

(b) about half of the implementations we surveyed do what the spec says

In deciding "edge cases" concerning the semantics of regular expressions 
we do sometimes take into account the behaviour of well-known regular 
expression libraries such as PCRE. Surprisingly often, it's impossible 
to find a clear specification and we have to resort to empirical 
investigation; and in such cases it's not uncommon to discover that 
different regex libraries differ from each other.
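As a minimal sketch of the kind of empirical check described above, the snippet below runs the input from FOTS tests fn-matches-41 and fn-matches-42 through Python's re module. Python is used here only as one convenient Perl-style engine; it is not one of the surveyed implementations referred to in this thread. In Python, '$' without MULTILINE also matches just before a final newline, while '\Z' matches only at the absolute end of the string. The XPath 2.0 spec requires the second behaviour for '$' when the 'm' flag is absent.

```python
import re

# Input used by fn-matches-41 and fn-matches-42:
# 'Mary' followed by a single newline (codepoint 10).
subject = "Mary" + chr(10)

# Perl-style convention: without MULTILINE, '$' matches at the very end
# of the string OR just before a newline at the end of the string.
perl_style = re.search(r"Mary$", subject) is not None

# DOTALL (roughly comparable to the XPath 's' flag) only changes what '.'
# matches; it does not change where '$' is allowed to match.
perl_style_dotall = re.search(r"Mary$", subject, re.DOTALL) is not None

# '\Z' in Python matches only at the absolute end of the string, which
# corresponds to what XPath 2.0 specifies for '$' outside multi-line mode.
end_only = re.search(r"Mary\Z", subject) is not None

print(perl_style, perl_style_dotall, end_only)  # True True False
```

So a Perl-style engine reports a match for both test cases, whereas the end-of-string-only semantics yield the "false" result the tests expect.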

Of course, this is an area where some vendors make a deliberate choice 
not to conform to the spec as written, but that's their choice, and the 
presence of these tests ensures that it's not one to make lightly.

(personal response)
Michael Kay
Saxonica


On 25/09/2012 00:51, Paul J. Lucas wrote:
> The following tests from the FOTS:
>
> 	fn-matches-41:
> 	fn:matches( concat( 'Mary', codepoints-to-string(10) ), 'Mary$' )
>
> 	fn-matches-42:
> 	fn:matches( concat( 'Mary', codepoints-to-string(10) ), 'Mary$', 's' )
>
> are listed as returning "false" for the correct answer.  Why?  This doesn't work the way, for example, ICU's regular expressions work; see:
>
> 	http://stackoverflow.com/questions/12566811/
>
> - Paul
>
>
>
Received on Tuesday, 25 September 2012 08:49:12 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:45:49 UTC