'standardizing' on one or more predicates for text search in SPARQL?

Many SPARQL engines support a magic (computed, functional) predicate 
that relates a literal subject (?o, if you will) to a text search 
string.

See http://esw.w3.org/topic/SPARQL/Extensions/Computed_Properties for 
links to some examples.

Right now, different implementations use different predicates. As far as 
I can tell:

ARQ (Jena): http://jena.hpl.hp.com/ARQ/property#
Virtuoso: bif:contains (though I can't tell what namespace the bif: 
prefix corresponds to)
Glitter (Open Anzo): http://openanzo.org/predicates/textmatch
AllegroGraph: http://franz.com/ns/allegrograph/2.2/textindex/match
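To make the differences concrete, here is a rough Python sketch that 
renders the same "?o <predicate> search-string" pattern for each engine 
listed above. It is an illustration under assumptions, not any engine's 
documented usage: the ARQ local name textMatch is a guess (the list only 
gives the namespace), bif:contains stays a prefixed name because its 
namespace IRI isn't known here, and the query shape and helper names are 
hypothetical. It follows the literal-subject description above; an 
engine may bind something else in the subject position. The search 
string is passed through untouched, since its syntax is exactly what 
question 1 asks about.

```python
# Sketch only: one text-search pattern rendered per engine.
# Assumptions: the ARQ local name "textMatch", the SELECT shape,
# and the helper names are hypothetical, not taken from any engine's docs.

TEXT_PREDICATES = {
    # Only the namespace is known; "textMatch" is an assumed local name.
    "ARQ": "<http://jena.hpl.hp.com/ARQ/property#textMatch>",
    # Kept as a prefixed name: the IRI behind "bif:" is unknown here.
    "Virtuoso": "bif:contains",
    "Glitter": "<http://openanzo.org/predicates/textmatch>",
    "AllegroGraph": "<http://franz.com/ns/allegrograph/2.2/textindex/match>",
}


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL double-quoted string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def text_search_query(engine: str, search: str) -> str:
    """Select triples whose object literal ?o matches `search`.

    The search string is not interpreted: Lucene syntax or otherwise,
    it is whatever the target engine expects.
    """
    predicate = TEXT_PREDICATES[engine]
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  ?o {predicate} {sparql_string(search)} .\n"
        "}"
    )


if __name__ == "__main__":
    for name in TEXT_PREDICATES:
        print(f"# {name}")
        print(text_search_query(name, 'semantic AND "web"'))
```

Only the predicate changes between engines in this sketch; whether the 
same search string means the same thing to each of them is still open.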

A few questions:

1) What search syntax do these predicates accept? For example, the 
object of Glitter's textmatch is a Lucene search string. I think (but am 
not sure) that ARQ's is the same; I don't know about the others.

2) Do we have any hope of reconciling these to promote more 
interoperable queries of this sort? At a minimum, are implementors 
willing to support all four of these predicates (and perhaps others) 
interchangeably?

3) Is there any value in coining an "implementation-independent" URI for 
text search and adding support for it to existing implementations?

4) Do existing implementations compile simple invocations of the SPARQL 
regex filter function into uses of text-search indexes? Is regex(...) 
the best way to perform SPARQL text-match queries both interoperably 
_and_ efficiently? (This question has come up with the SPARQL queries 
in the recent Berlin benchmark.)
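A minimal sketch of what a "simple invocation" might mean, assuming it 
is a regex() call whose pattern contains no metacharacters and whose only 
flag, if any, is "i". Such a filter is just a substring test, so its 
literal could be handed to a text index. The metacharacter set, the flag 
rule, and the function names are assumptions for illustration. Because a 
word-based index matches tokens rather than arbitrary substrings, index 
hits are re-checked here; the index can also miss a substring buried 
inside a longer word, so whether that trade-off is acceptable is up to 
each engine. The case-insensitive check uses Python's casefold(), which 
only approximates the XPath "i" flag.

```python
# Sketch only: spot regex() filters that reduce to a plain substring test.
# The metacharacter set, allowed flags, and function names are assumptions.

from typing import List, Optional

# Characters with special meaning in XPath regular expressions
# (the syntax SPARQL's regex() uses), plus the ^ and $ anchors.
REGEX_METACHARACTERS = set(".^$*+?()[]{}|\\")


def literal_from_regex(pattern: str, flags: str = "") -> Optional[str]:
    """Return the literal text a pattern matches, or None if it isn't one.

    regex(?o, "Berlin") is a case-sensitive substring test and
    regex(?o, "berlin", "i") a case-insensitive one. Empty patterns,
    metacharacters, and flags other than "i" fall back to the normal filter.
    """
    if not pattern or set(flags) - {"i"}:
        return None
    if any(ch in REGEX_METACHARACTERS for ch in pattern):
        return None
    return pattern


def recheck(candidates: List[str], literal: str, case_insensitive: bool) -> List[str]:
    """Keep only index hits that really contain `literal` as a substring.

    A word-based index may return literals the regex would reject
    (e.g. after stemming), so hits are re-tested here.
    """
    if case_insensitive:
        needle = literal.casefold()
        return [c for c in candidates if needle in c.casefold()]
    return [c for c in candidates if literal in c]
```

An engine would try the index only when literal_from_regex returns a 
string, and otherwise evaluate the filter as usual.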


From my point of view as an implementor, I'd be happy to support other 
predicates and/or an agreed-upon implementation-neutral predicate in 
Glitter, though I'd want to be clear on the syntax of the search string 
itself. Glitter doesn't currently compile regex(...) into 
anzo:textmatch, but I've been intending to add that support in light 
of the Berlin query benchmark suite.

Lee

Received on Saturday, 16 August 2008 20:51:11 UTC