Re: SPARQL 1.1 REGEX Multichar escape syntax

On Wed, May 22, 2013 at 3:24 PM, Jeremy J Carroll <jjc@syapse.com> wrote:

>     filter(regex(?o,'\Wlymp','i'))
>

vs.


>     filter(regex(?o,'\\Wlymp','i'))
>



> They seem to accept e.g. "\t" rather than "\\t" as a tab.
>
> Is this me misreading the specs, or is it an implementation bug shared
> between everything I have tried so far … :(
>
> My reading of the spec is that
> SPARQL REGEX defers to
> http://www.w3.org/TR/xpath-functions/#func-matches
> which for the \W defers to
> http://www.w3.org/TR/xmlschema-2/#dt-regex
> which includes
> http://www.w3.org/TR/xmlschema-2/#nt-charClass
> http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
>

Interesting question. I think the handling of "\W" inside a SPARQL string is undefined.

The second parameter of the regex is a SPARQL string, meaning that the
SPARQL parser will read it before providing the string as a parameter to
the regex. SPARQL accepts escape sequences according to:
http://www.w3.org/TR/sparql11-query/#grammarEscapes

"\W" is not an accepted escape sequence, though "\t" is. For a parser to
leave a sequence starting with "\" untouched in a string would be very
strange (IMO), since the accepted way to achieve a '\' character is with
the sequence "\\". Looking at the spec, I don't see anything indicating
what should happen to a character preceded by a backslash that is not a
known escape character. There may be parsers which leave a "\W" alone in
the string, but I expect that this will be implementation dependent.

To get the two-character sequence "\W" into the string that is passed to
regex, you need to write "\\W", as you have found.
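
To make the two layers concrete, here is a rough sketch in Python of what I
mean. It is not taken from any real SPARQL engine: the function name
`unescape` is made up, returning None for an unrecognised escape is just my
stand-in for "implementation dependent", and Python's `re` module is only an
approximation of the XPath/XSD regex dialect. The one part that comes from the
spec is the set of escape characters in the table.

```python
import re

# Escapes listed in the SPARQL 1.1 ECHAR production: \t \b \n \r \f \\ \" \'
ECHAR = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    "\\": "\\", '"': '"', "'": "'",
}

def unescape(body):
    """Decode the body of a SPARQL string literal (quotes already removed).

    Returns None when a backslash is followed by a character that is not in
    ECHAR. That choice is an assumption of this sketch; a real parser might
    raise an error or pass the sequence through instead.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # A backslash must be followed by one of the known escape characters.
        if i + 1 >= len(body) or body[i + 1] not in ECHAR:
            return None
        out.append(ECHAR[body[i + 1]])
        i += 2
    return "".join(out)

# What the query author typed between the quotes (raw strings keep the backslashes).
doubled = unescape(r"\\Wlymp")   # parser yields backslash-W, which regex then sees
single = unescape(r"\Wlymp")     # no such escape, so this sketch gives up

# Only the doubled form hands a usable pattern to the regex layer.
match = re.search(doubled, "the Lymph node", re.IGNORECASE)
```

The point is only that the string is decoded once by the query parser before
regex ever looks at it, so a backslash meant for regex has to survive that
first pass.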

Regards,
Paul Gearon

Received on Wednesday, 22 May 2013 20:22:37 UTC