- From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
- Date: Sat, 3 Mar 2007 23:17:09 -0500 (EST)
- To: public-rdf-dawg-comments@w3.org
DAWG WG. I've been wrestling with a very common RDF query pattern for RDF
datasets (as defined formally by SPARQL) where the query is ".. using a
variable to range over the IRIs naming graphs." The example in 9.3.1
Accessing Graph Names is such a scenario:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?src ?bobNick
WHERE
{
GRAPH ?src
{ ?x foaf:mbox <mailto:bob@work.example> .
?x foaf:nick ?bobNick
}
}
A procedural evaluation requires an explicit iteration over the IRIs of
all the named graphs in the dataset. The evaluation semantics outlined in
[SPARQL ALGEBRA] for the GRAPH operator suggests this approach as does
[SEMANTICS OF SPARQL]
Alternatively, if the underlying RDF triple store required the explicit
use of on or more RDF graph IRIs when resolving patterns (there are many
that do), this would (at the very least) require that the list of all IRIs
in a dataset be cached or handy for such scenarios. This 'function' is
defined as name(D) in [SEMANTICS OF SPARQL] and doesn't scale well as the
number of named graphs increases.
Consider a declarative approach instead. This would require some
(semi?)formal 'built-in' [CWM-VOCAB] relation between a triple pattern and
the set of IRIs of the named graph(s) it is asserted in within the
dataset: nGraph:assertedIn. It also can be considered the inverse of
log:includes [CWM-VOCAB] and 'somewhat' (niavely?) similar to the
epistemic operator K [CLOSED WORLD REASONING].
For RDF triple stores which support this function (in their APIs) a
declarative evaluation can resolve the original example query by
evaluating the BGPs only once across the dataset, returning valid
substitutions for the GRAPH variable in addition to the solutions for
other variables in the patterns.
So the Graph(..) evaluation semantics [SPARQL ALGEBRA] could be rewritten
as:
eval(D(G), Graph(var,P)) = eval(D(G),DECL_GRAPH_BINDINGS(var,P))
where
DECL_GRAPH_BINDINGS replaces every BGP operator with one which includes
substitutions for var that satisfy nGraph:assertedIn for all the triples
patterns within.
RDF datasets targeted by queries which produce bindings for graph
names which are later used to restrict subsequent graph graph patterns are
excellent candidates [1] for optimizing the evaluation of GRAPH patterns
and the Graph(..) operator
The query example from 9.3.3 Restricting Possible Graph IRIs [SPARQL] is an example:
PREFIX data: <http://example.org/foaf/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?mbox ?nick ?ppd
WHERE
{
GRAPH data:aliceFoaf
{
?alice foaf:mbox <mailto:alice@work.example> ;
foaf:knows ?whom .
?whom foaf:mbox ?mbox ;
rdfs:seeAlso ?ppd .
?ppd a foaf:PersonalProfileDocument .
} .
GRAPH ?ppd
{
?w foaf:mbox ?mbox ;
foaf:nick ?nick
}
}
Any well linked, semantic web of named RDF graphs (a FOAF network) is also a candidate. This query optimization opportunity scales well with
the size of such datasets. Any guidance on semantics for evaluating GRAPH
operators declaratively (for RDF triples stores which want to support this
'natively') would be much appreciated.
[1] http://copia.ogbuji.net/blog/2006-07-14/querying-named-rdf-graph-aggregate
[2] http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html
[3] http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf
[4] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
[5] http://www.mindswap.org/2005/OWLWorkshop/sub12.pdf
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org
Received on Sunday, 4 March 2007 04:17:20 UTC