
Re: SPARQL 1.1 REGEX Multichar escape syntax

From: Paul Gearon <gearon@ieee.org>
Date: Wed, 22 May 2013 16:22:04 -0400
Message-ID: <CAGZNPFnb7HRgi+ie4DOdg5sBbRMBZ73zSkM8aaUq9B-pHheQsg@mail.gmail.com>
To: Jeremy J Carroll <jjc@syapse.com>
Cc: "semantic-web@w3.org" <semantic-web@w3.org>
On Wed, May 22, 2013 at 3:24 PM, Jeremy J Carroll <jjc@syapse.com> wrote:

>     filter(regex(?o,'\Wlymp','i'))


>     filter(regex(?o,'\\Wlymp','i'))

> They seem to accept e.g. "\t" rather than "\\t" as a tab.
> Is this me misreading the specs, or is it an implementation bug shared
> between everything I have tried so far … :(
> My reading of the spec is that
> SPARQL REGEX defers to
> http://www.w3.org/TR/xpath-functions/#func-matches
> which for the \W defers to
> http://www.w3.org/TR/xmlschema-2/#dt-regex
> which includes
> http://www.w3.org/TR/xmlschema-2/#nt-charClass
> http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc

Interesting question. I think the "\W" is undefined.

The second parameter of the regex is a SPARQL string, meaning that the
SPARQL parser will read it before providing the string as a parameter to
the regex. SPARQL accepts only the escape sequences defined by the ECHAR
production of its grammar: \t, \n, \r, \b, \f, \", \' and \\.

"\W" is not an accepted escape sequence, though "\t" is. For a parser to
leave a sequence starting with "\" untouched in a string would be very
strange (IMO), since the accepted way to achieve a '\' character is with
the sequence "\\". Looking at the spec, I don't see anything indicating
what should happen to a character preceded by a backslash that is not a
known escape character. There may be parsers which leave a "\W" alone in
the string, but I expect that this will be implementation dependent.
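A rough Python sketch of that first, parser-level step follows. It is an illustration rather than any implementation's actual code. It assumes a strict parser that rejects anything outside ECHAR, since that is one possible behaviour for "\W". It also ignores the separate \u/\U codepoint escapes. The function name is hypothetical.

```python
# Minimal sketch of SPARQL string-literal unescaping (ECHAR rule only).
# Assumption: unknown escapes such as \W are rejected; real parsers may
# instead keep or drop the backslash. \u / \U codepoint escapes are not handled.

ECHAR = {
    "t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}


def unescape_sparql_string(body: str) -> str:
    """Convert the text between a literal's quotes into the string value."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        nxt = body[i + 1]
        if nxt not in ECHAR:
            # \W, \d, \s ... are regex escapes, not SPARQL string escapes.
            raise ValueError(f"unsupported escape sequence \\{nxt}")
        out.append(ECHAR[nxt])
        i += 2
    return "".join(out)


print(unescape_sparql_string(r"\\Wlymp"))    # \Wlymp
print(repr(unescape_sparql_string(r"a\tb")))  # 'a\tb' (a real tab character)
```

Under this strict reading, '\Wlymp' fails at parse time, and '\t' becomes a tab before regex ever sees it.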

To correctly get a sequence of "\W" into the string that is provided to
regex you will need a "\\W" as you have found.
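When a query is generated from code, the same rule applies in reverse: escape the regex pattern as a SPARQL literal before embedding it. Below is a small sketch under that assumption. The helper name is hypothetical. Python's re module only approximates XPath/XSD regex semantics; for example, XSD's \W also matches "_", which Python's does not. The two sample strings are chosen where both agree.

```python
import re

# Hypothetical helper: quote a string (e.g. a regex pattern) as a
# single-quoted SPARQL literal by escaping backslash, quotes and control chars.
_ESCAPES = {"\\": "\\\\", "'": "\\'", '"': '\\"', "\t": "\\t",
            "\n": "\\n", "\r": "\\r", "\b": "\\b", "\f": "\\f"}


def sparql_literal(value: str) -> str:
    return "'" + "".join(_ESCAPES.get(c, c) for c in value) + "'"


pattern = r"\Wlymp"                 # what regex() should receive
literal = sparql_literal(pattern)   # what must appear in the query text
query = f"filter(regex(?o,{literal},'i'))"
print(query)  # filter(regex(?o,'\\Wlymp','i'))

# Approximate the 'i' flag with re.IGNORECASE:
print(bool(re.search(pattern, "pre-LYMPH", re.IGNORECASE)))  # True
print(bool(re.search(pattern, "Olympic", re.IGNORECASE)))    # False
```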

Paul Gearon
Received on Wednesday, 22 May 2013 20:22:37 UTC
