- From: Jeremy J Carroll <jjc@syapse.com>
- Date: Wed, 22 May 2013 12:24:31 -0700
- To: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-Id: <A87FBE90-1EB2-4D93-8B23-637CFCFA7E89@syapse.com>
I am trying to follow the advice of:
http://stackoverflow.com/questions/2397574/how-to-find-a-word-within-text-using-xslt-2-0-and-regex-which-doesnt-have-b-w
and apply it to SPARQL REGEX. Specifically, the suggestion is to use "(^|\W)" or "($|\W)" instead of "\b".
I am trying to match literals like
"Accessory cervical lymph node"
that contain a word starting with "lymp", but not literals like
"Endolymphatic duct of right membranous labyrinth"
that do not.
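To make the intended matching behaviour concrete, here is a minimal sketch in Python rather than SPARQL. It assumes Python's `re` module treats \W closely enough to the XML Schema definition for these ASCII labels. The third label and the helper name are hypothetical, added only to exercise the start-of-string branch.

```python
import re

# "(^|\W)" stands in for "\b": the match must begin at the start of the
# string or directly after a non-word character.
WORD_START_LYMP = re.compile(r"(^|\W)lymp", re.IGNORECASE)

labels = [
    "Accessory cervical lymph node",
    "Endolymphatic duct of right membranous labyrinth",
    "Lymphatic vessel",  # hypothetical label, not taken from the vocabulary
]


def has_word_starting_with_lymp(label: str) -> bool:
    """Return True if some word in the label starts with 'lymp' (any case)."""
    return WORD_START_LYMP.search(label) is not None


matches = [label for label in labels if has_word_starting_with_lymp(label)]
print(matches)
```

Only the first and third labels are selected. "Endolymphatic" is rejected because the character before "lymp" is a word character.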
My reading of the specs is that the SPARQL I want is, e.g.:
prefix skos: <http://www.w3.org/2004/02/skos/core#>
select distinct ?o
where {
  ?s skos:prefLabel|skos:altLabel ?o .
  ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
  filter(regex(?o,'\Wlymp','i'))
}
LIMIT 10
(ignoring, for now, the case where the matching word is the first word of the literal)
However, the systems I have tried so far (bigdata, bigOWLIM and dydra) all require an additional \, wanting instead:
prefix skos: <http://www.w3.org/2004/02/skos/core#>
select distinct ?o
where {
  ?s skos:prefLabel|skos:altLabel ?o .
  ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
  filter(regex(?o,'\\Wlymp','i'))
}
LIMIT 10
They seem to accept e.g. "\t" rather than "\\t" as a tab.
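As an analogy only, the observed behaviour resembles what happens in a language such as Python, where the string literal is unescaped first and the regex engine sees only the result. Whether SPARQL specifies the same two-step layering is exactly what I am unsure about, so the sketch below describes Python and should not be read as a statement about the SPARQL spec.

```python
import re

# Step 1: string-literal escape processing (before any regex is involved).
doubled = '\\Wlymp'   # two backslashes in the source, one in the value
raw = r'\Wlymp'       # raw literal: no string-level escape processing
tab_literal = '\t'    # string-level escape: a single TAB character

print(doubled == raw)      # both values are the six characters \ W l y m p
print(len(doubled))        # 6
print(len(tab_literal))    # 1

# Step 2: the regex engine only ever sees the already-unescaped value.
print(bool(re.search(doubled, "cervical lymph node", re.IGNORECASE)))
print(bool(re.search(tab_literal, "a\tb")))
```

In this model "\t" works as a tab because the string layer already turned it into a TAB character, whereas \W has no meaning at the string layer and has to be protected with a second backslash.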
Am I misreading the specs, or is this an implementation bug shared by everything I have tried so far … :(
My reading of the spec is that SPARQL REGEX defers to
http://www.w3.org/TR/xpath-functions/#func-matches
which, for \W, defers to
http://www.w3.org/TR/xmlschema-2/#dt-regex
which includes
http://www.w3.org/TR/xmlschema-2/#nt-charClass
http://www.w3.org/TR/xmlschema-2/#nt-charClassEsc
http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
defining \W (with only one \).
Jeremy J Carroll
Principal Architect
Syapse, Inc.
Received on Wednesday, 22 May 2013 19:25:04 UTC