
SPARQL 1.1 REGEX Multichar escape syntax

From: Jeremy J Carroll <jjc@syapse.com>
Date: Wed, 22 May 2013 12:24:31 -0700
Message-Id: <A87FBE90-1EB2-4D93-8B23-637CFCFA7E89@syapse.com>
To: "semantic-web@w3.org" <semantic-web@w3.org>


I am trying to follow the advice of:
http://stackoverflow.com/questions/2397574/how-to-find-a-word-within-text-using-xslt-2-0-and-regex-which-doesnt-have-b-w

and to apply it to SPARQL REGEX. Specifically, the suggestion is to use "(^|\W)" or "($|\W)" instead of "\b", which XPath regular expressions do not support.
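As a rough illustration of that suggestion, here is a minimal Python sketch. It assumes Python's `re` module as a stand-in for XPath regular expressions; `re` does support `\b`, but the sketch deliberately avoids it to mirror the workaround. The helper `whole_word_pattern` and the label "Lymph nodes of neck" are hypothetical.

```python
import re

def whole_word_pattern(word: str) -> str:
    r"""Build a pattern matching `word` as a whole word without using \b.

    "(^|\W)" stands in for a word start and "($|\W)" for a word end.
    Hypothetical helper; re.escape is Python-specific escaping.
    """
    return r"(^|\W)" + re.escape(word) + r"($|\W)"

pattern = whole_word_pattern("node")
# "node" at the end of the string satisfies "($|\W)".
print(bool(re.search(pattern, "Accessory cervical lymph node", re.IGNORECASE)))
# "nodes" does not: the "s" after "node" is a word character.
print(bool(re.search(pattern, "Lymph nodes of neck", re.IGNORECASE)))
```

Note that the anchors consume the neighbouring non-word character, so the match span is one character wider than the word itself.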

I am trying to match literals such as
"Accessory cervical lymph node"
which contain a word starting with "lymp", but not literals such as
"Endolymphatic duct of right membranous labyrinth"
which do not.
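The intended behaviour on these two literals can be checked with a small Python sketch, again using `re` only as an approximation of XPath regex semantics (the two engines do not define `\W` identically).

```python
import re

# "lymp" at the start of a word: either at the start of the string
# or immediately after a non-word character.
WORD_START_LYMP = r"(^|\W)lymp"

labels = [
    "Accessory cervical lymph node",                     # "lymph" starts a word
    "Endolymphatic duct of right membranous labyrinth",  # "lymp" only occurs mid-word
]

matches = [label for label in labels if re.search(WORD_START_LYMP, label, re.IGNORECASE)]
print(matches)  # ['Accessory cervical lymph node']
```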

My reading of the specs is that the SPARQL I want is, e.g.:

prefix skos: <http://www.w3.org/2004/02/skos/core#>    
   
select distinct ?o 
where {     
    ?s skos:prefLabel|skos:altLabel ?o.     
    ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
    filter(regex(?o,'\Wlymp','i'))
}
LIMIT 10

(ignoring, for now, the case where the matching word is the first word of the literal, which '\W' alone does not cover)
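To make the initial-word case concrete, here is a Python sketch with a hypothetical label "Lymph node of neck" (not taken from the FMA data) in which the word of interest comes first. A pattern starting with `\W` needs some character before the "L", so it misses this label; the `(^|\W)` alternative from the advice above also accepts the start of the string.

```python
import re

label = "Lymph node of neck"  # hypothetical label; the matching word is first

# Requires a preceding non-word character, so it cannot match at position 0.
without_anchor = re.search(r"\Wlymp", label, re.IGNORECASE)
# Also accepts the start of the string.
with_anchor = re.search(r"(^|\W)lymp", label, re.IGNORECASE)

print(without_anchor is None)   # True
print(with_anchor is not None)  # True
```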

However, the systems I have tried so far (bigdata, bigOWLIM and dydra) all require an additional backslash, i.e. they want:

prefix skos: <http://www.w3.org/2004/02/skos/core#>    
   
select distinct ?o 
where {     
    ?s skos:prefLabel|skos:altLabel ?o.     
    ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
    filter(regex(?o,'\\Wlymp','i'))
}
LIMIT 10
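One layer worth keeping in mind when reading the specs is that the regex pattern is first written as a SPARQL string literal, and the SPARQL 1.1 grammar defines its own escapes for string literals (`ECHAR ::= '\' [tbnrf\"']`). The Python sketch below decodes the body of a single-quoted literal under that rule only. The function name `decode_sparql_string_body` is hypothetical, and `\u`/`\U` codepoint escapes (which SPARQL processes separately) are assumed to have been handled already.

```python
# Escapes permitted inside SPARQL 1.1 string literals (grammar rule ECHAR).
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}

def decode_sparql_string_body(body: str) -> str:
    """Decode the text between the quotes of a SPARQL string literal.

    Hypothetical helper: only ECHAR escapes are handled; \\u and \\U
    codepoint escapes are assumed to have been processed already.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by one of the ECHAR characters.
            if i + 1 >= len(body) or body[i + 1] not in ECHAR:
                raise ValueError(f"invalid escape at position {i}: {body[i:i + 2]!r}")
            out.append(ECHAR[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# The query text '\\Wlymp' decodes to the regex source \Wlymp.
print(decode_sparql_string_body(r"\\Wlymp"))  # \Wlymp

# The query text '\Wlymp' contains "\W", which ECHAR does not list.
try:
    decode_sparql_string_body(r"\Wlymp")
except ValueError as err:
    print(err)
```

Under this rule, the doubled form yields exactly the two characters `\W` that the XML Schema regex grammar expects.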


Yet they seem to accept "\t" (rather than "\\t") as a tab.
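Both spellings of a tab can end up matching a tab character, because they reach the regex engine in different forms. A minimal Python illustration, assuming the string-literal layer has already turned `\t` into an actual tab and `\\t` into a backslash followed by `t`. Python's `re`, like XML Schema regexes (which list `\t` as a single-character escape), reads the latter as a tab.

```python
import re

text = "lymph\tnode"

literal_tab_pattern = "\t"   # an actual tab character in the pattern
escaped_tab_pattern = r"\t"  # backslash + "t", read by the regex engine as a tab

print(bool(re.search(literal_tab_pattern, text)))  # True
print(bool(re.search(escaped_tab_pattern, text)))  # True
```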

Am I misreading the specs, or is this an implementation bug shared by every system I have tried so far? :(

My reading of the spec is that 
SPARQL REGEX defers to
http://www.w3.org/TR/xpath-functions/#func-matches
which, for \W, defers to
http://www.w3.org/TR/xmlschema-2/#dt-regex
which includes 
http://www.w3.org/TR/xmlschema-2/#nt-charClass
http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc

defining \W (with only one \)
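The XML Schema definition referenced above gives `\w` as `[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]` (every character except punctuation, separators and "other" characters), with `\W` as its complement. A Python sketch of that definition using `unicodedata` follows; the function names are hypothetical. It also shows that this differs from Python's own `\w`, for example on `_` (punctuation, so XSD `\W`) and `+` (a math symbol, so XSD `\w`).

```python
import re
import unicodedata

def is_xsd_word_char(ch: str) -> bool:
    """XSD \\w: any character whose Unicode general category is not P*, Z* or C*."""
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")

def is_xsd_non_word_char(ch: str) -> bool:
    """XSD \\W: the complement of \\w."""
    return not is_xsd_word_char(ch)

for ch in ["l", " ", "-", "_", "+"]:
    python_w = re.fullmatch(r"\w", ch) is not None
    print(repr(ch), "XSD \\W:", is_xsd_non_word_char(ch), "| Python \\w:", python_w)
```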


Jeremy J Carroll
Principal Architect
Syapse, Inc.




Received on Wednesday, 22 May 2013 19:25:04 UTC
