- From: Jeremy J Carroll <jjc@syapse.com>
- Date: Thu, 23 May 2013 10:51:50 -0700
- To: Paul Gearon <gearon@ieee.org>
- Cc: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-Id: <F90835A7-5E2D-4575-AA4A-09F5D99EFAEF@syapse.com>
Oh thanks, you've convinced me that it was my "misreading the specs" rather than an implementation issue.

In SPARQL REGEX the conventional "\b" escape should map to "(^|\\W)" or "($|\\W)", with the first \ escaping the second backslash according to
http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#grammarEscapes

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On May 22, 2013, at 1:22 PM, Paul Gearon <gearon@ieee.org> wrote:

> On Wed, May 22, 2013 at 3:24 PM, Jeremy J Carroll <jjc@syapse.com> wrote:
> filter(regex(?o,'\Wlymp','i'))
>
> vs.
>
> filter(regex(?o,'\\Wlymp','i'))
>
> They seem to accept e.g. "\t" rather than "\\t" as a tab.
>
> Is this me misreading the specs, or is it an implementation bug shared between everything I have tried so far … :(
>
> My reading of the spec is that
> SPARQL REGEX defers to
> http://www.w3.org/TR/xpath-functions/#func-matches
> which for the \W defers to
> http://www.w3.org/TR/xmlschema-2/#dt-regex
> which includes
> http://www.w3.org/TR/xmlschema-2/#nt-charClass
> http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
>
> Interesting question. I think the "\W" is undefined.
>
> The second parameter of the regex is a SPARQL string, meaning that the SPARQL parser will read it before providing the string as a parameter to the regex. SPARQL accepts escape sequences according to:
> http://www.w3.org/TR/sparql11-query/#grammarEscapes
>
> "\W" is not an accepted escape sequence, though "\t" is. For a parser to leave a sequence starting with "\" untouched in a string would be very strange (IMO), since the accepted way to achieve a '\' character is with the sequence "\\". Looking at the spec, I don't see anything indicating what should happen to a character preceded by a backslash that is not a known escape character. There may be parsers which leave a "\W" alone in the string, but I expect that this will be implementation dependent.
>
> To correctly get a sequence of "\W" into the string that is provided to regex you will need a "\\W" as you have found.
>
> Regards,
> Paul Gearon
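
The two-stage reading described in this thread (the SPARQL parser unescapes the string literal first, and only then does the regex engine see it) can be sketched roughly as follows. This is only an illustration under stated assumptions: `unescape_sparql_string` is a hypothetical helper, not part of any SPARQL library; the escape table follows the ECHAR production in the SPARQL 1.1 grammar; rejecting an unknown escape such as "\W" is one possible choice, since the thread notes that real parsers may behave differently; and Python's `re` module merely stands in for the XPath/XML Schema regex dialect that SPARQL REGEX actually uses.

```python
import re

# Escapes allowed by the SPARQL 1.1 ECHAR production: '\' [tbnrf\"']
ECHAR = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    "\\": "\\", '"': '"', "'": "'",
}


def unescape_sparql_string(body: str) -> str:
    """Hypothetical helper: turn the body of a SPARQL string literal
    (the text between the quotes) into the string the regex will receive.

    Assumption: an unknown escape such as \\W is rejected. The thread
    points out that some parsers may instead pass it through unchanged.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ECHAR:
                raise ValueError(f"unknown escape at offset {i} in {body!r}")
            out.append(ECHAR[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# '\\Wlymp' in the query text reaches the regex engine as \Wlymp
pattern = unescape_sparql_string(r"\\Wlymp")
match = re.search(pattern, "swollen lymph node", re.IGNORECASE)

# "\b" in a SPARQL string is a backspace character, not a word boundary,
# so a boundary has to be spelled out, e.g. as '(^|\\W)' in the query text
boundary = unescape_sparql_string(r"(^|\\W)lymp")
```

Under the strict reading sketched here, the single-backslash form '\Wlymp' raises an error instead of matching, while a lenient parser would hand "\Wlymp" to the regex engine unchanged.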
Received on Thursday, 23 May 2013 17:52:23 UTC