Re: SPARQL 1.1 REGEX Multichar escape syntax

Oh thanks, you've convinced me that it was my "misreading the specs" rather than an implementation issue.

In SPARQL REGEX the conventional "\b" escape should map to "(^|\\W)" or "($|\\W)", with the first \ escaping the second backslash, according to
http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#grammarEscapes
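
To make the two layers of escaping concrete, here is a rough Python sketch. It is an illustration only: sparql_unescape is a hypothetical helper, not part of any SPARQL library, it ignores \uXXXX codepoint escapes, and Python's re module is only a stand-in for the XSD/XPath regex engine. Rejecting unknown escapes such as \W is just this sketch's choice; as discussed in the quoted message below, real parsers may behave differently.

```python
import re

# The ECHAR escapes from the SPARQL 1.1 grammar: \t \b \n \r \f \\ \" \'
SPARQL_ECHAR = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    "\\": "\\", '"': '"', "'": "'",
}

def sparql_unescape(literal_body: str) -> str:
    """Hypothetical helper: decode the text between the quotes of a SPARQL string."""
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch == "\\":
            if i + 1 >= len(literal_body):
                raise ValueError("dangling backslash")
            nxt = literal_body[i + 1]
            if nxt not in SPARQL_ECHAR:
                # Assumption of this sketch: treat unknown escapes as errors.
                raise ValueError(f"not a SPARQL escape: \\{nxt}")
            out.append(SPARQL_ECHAR[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Raw Python strings hold exactly what the query author types between the quotes.
pattern = sparql_unescape(r"\\Wlymp")       # the regex engine receives \Wlymp
boundary = sparql_unescape(r"(^|\\W)lymp")  # stand-in for a leading \b

# Python's re is used here only as an approximation of the XSD regex engine.
has_match = re.search(pattern, "the lymph node", re.IGNORECASE) is not None
```

The point is that the SPARQL parser consumes one backslash before the regex engine ever sees the pattern, so the doubled form in the query text is what delivers \W to regex().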

Jeremy J Carroll
Principal Architect
Syapse, Inc.



On May 22, 2013, at 1:22 PM, Paul Gearon <gearon@ieee.org> wrote:

> On Wed, May 22, 2013 at 3:24 PM, Jeremy J Carroll <jjc@syapse.com> wrote:
>     filter(regex(?o,'\Wlymp','i'))
> 
> vs.
>  
>     filter(regex(?o,'\\Wlymp','i'))
> 
>  
> They seem to accept e.g. "\t" rather than "\\t" as a tab.
> 
> Is this me misreading the specs, or is it an implementation bug shared between everything I have tried so far … :(
> 
> My reading of the spec is that 
> SPARQL REGEX defers to
> http://www.w3.org/TR/xpath-functions/#func-matches
> which for the \W defers to 
> http://www.w3.org/TR/xmlschema-2/#dt-regex
> which includes 
> http://www.w3.org/TR/xmlschema-2/#nt-charClass
> http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
> 
> Interesting question. I think the "\W" is undefined.
> 
> The second parameter of the regex is a SPARQL string, meaning that the SPARQL parser will read it before providing the string as a parameter to the regex. SPARQL accepts escape sequences according to:
> http://www.w3.org/TR/sparql11-query/#grammarEscapes
> 
> "\W" is not an accepted escape sequence, though "\t" is. For a parser to leave a sequence starting with "\" untouched in a string would be very strange (IMO), since the accepted way to achieve a '\' character is with the sequence "\\". Looking at the spec, I don't see anything indicating what should happen to a character preceded by a backslash that is not a known escape character. There may be parsers which leave a "\W" alone in the string, but I expect that this will be implementation dependent.
> 
> To correctly get a sequence of "\W" into the string that is provided to regex you will need a "\\W" as you have found.
> 
> Regards,
> Paul Gearon

Received on Thursday, 23 May 2013 17:52:23 UTC