- From: Paul Gearon <gearon@ieee.org>
- Date: Wed, 22 May 2013 16:22:04 -0400
- To: Jeremy J Carroll <jjc@syapse.com>
- Cc: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <CAGZNPFnb7HRgi+ie4DOdg5sBbRMBZ73zSkM8aaUq9B-pHheQsg@mail.gmail.com>
On Wed, May 22, 2013 at 3:24 PM, Jeremy J Carroll <jjc@syapse.com> wrote:

> filter(regex(?o,'\Wlymp','i'))
> vs.
> filter(regex(?o,'\\Wlymp','i'))
>
> They seem to accept e.g. "\t" rather than "\\t" as a tab.
>
> Is this me misreading the specs, or is it an implementation bug shared
> between everything I have tried so far … :(
>
> My reading of the spec is that
> SPARQL REGEX defers to
> http://www.w3.org/TR/xpath-functions/#func-matches
> which for the \W defers to
> http://www.w3.org/TR/xmlschema-2/#dt-regex
> which includes
> http://www.w3.org/TR/xmlschema-2/#nt-charClass
> http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
>

Interesting question. I think the "\W" is undefined.

The second parameter of the regex is a SPARQL string, meaning that the SPARQL parser will read it before providing the string as a parameter to the regex. SPARQL accepts escape sequences according to:

http://www.w3.org/TR/sparql11-query/#grammarEscapes

"\W" is not an accepted escape sequence, though "\t" is. For a parser to leave a sequence starting with "\" untouched in a string would be very strange (IMO), since the accepted way to achieve a '\' character is with the sequence "\\".

Looking at the spec, I don't see anything indicating what should happen to a character preceded by a backslash that is not a known escape character. There may be parsers which leave a "\W" alone in the string, but I expect that this will be implementation dependent. To correctly get a sequence of "\W" into the string that is provided to regex you will need a "\\W" as you have found.

Regards,
Paul Gearon
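
P.S. The two-stage reading described above (string unescaping first, regex second) can be sketched in Python. This is only an illustration under stated assumptions: unescape_sparql is a hypothetical helper, not part of any SPARQL library; it handles only the backslash escapes listed in the grammar section linked above (not \u codepoint escapes); and Python's re module stands in for the XPath regex engine, whose \W may differ in detail.

```python
import re

# Backslash escapes permitted inside a SPARQL string literal.
SPARQL_ESCAPES = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    "\\": "\\", '"': '"', "'": "'",
}


def unescape_sparql(body, strict=True):
    """Hypothetical helper: decode the text between the quotes of a SPARQL string.

    strict=True rejects unknown escapes such as \\W; strict=False mimics a
    lenient parser that leaves the backslash and the character in place.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        nxt = body[i + 1]
        if nxt in SPARQL_ESCAPES:
            out.append(SPARQL_ESCAPES[nxt])
        elif strict:
            raise ValueError("unknown escape sequence: \\" + nxt)
        else:
            out.append("\\" + nxt)  # implementation-dependent behaviour
        i += 2
    return "".join(out)


# '\\Wlymp' in the query text becomes \Wlymp by the time regex sees it.
pattern = unescape_sparql(r"\\Wlymp")
matched = re.search(pattern, "the (Lymph node)", re.IGNORECASE) is not None
```

With strict=True the single-backslash form '\Wlymp' is rejected outright, while strict=False passes it through unchanged, which is the implementation-dependent behaviour the query author appears to be seeing.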
Received on Wednesday, 22 May 2013 20:22:37 UTC