Declarative GRAPH evaluation / operators? from Chimezie Ogbuji on 2007-03-04 (public-rdf-dawg-comments@w3.org from March 2007)

From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Date: Sat, 3 Mar 2007 23:17:09 -0500 (EST)
To: public-rdf-dawg-comments@w3.org
Message-ID: <Pine.GSO.4.60.0703032310430.18887@joplin.bio.ri.ccf.org>
DAWG WG.  I've been wrestling with a very common RDF query pattern for RDF 
datasets (as defined formally by SPARQL) where the query is ".. using a 
variable to range over the IRIs naming graphs."  The example in 9.3.1 
Accessing Graph Names is such a scenario:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?src ?bobNick
WHERE
   {
     GRAPH ?src
     { ?x foaf:mbox <mailto:bob@work.example> .
       ?x foaf:nick ?bobNick
     }
   }

A procedural evaluation requires an explicit iteration over the IRIs of 
all the named graphs in the dataset.  The evaluation semantics outlined in 
[SPARQL ALGEBRA] for the GRAPH operator suggests this approach as does 
[SEMANTICS OF SPARQL]

Alternatively, if the underlying RDF triple store required the explicit 
use of on or more RDF graph IRIs when resolving patterns (there are many 
that do), this would (at the very least) require that the list of all IRIs 
in a dataset be cached or handy for such scenarios.  This 'function' is 
defined as  name(D) in [SEMANTICS OF SPARQL] and doesn't scale well as the 
number of named graphs increases.

Consider a declarative approach instead.  This would require some 
(semi?)formal 'built-in' [CWM-VOCAB] relation between a triple pattern and 
the set of IRIs of the named graph(s) it is asserted in within the 
dataset: nGraph:assertedIn. It also can be considered the inverse of 
log:includes [CWM-VOCAB] and 'somewhat' (niavely?) similar to the 
epistemic operator K [CLOSED WORLD REASONING].

For RDF triple stores which support this function (in their APIs) a 
declarative evaluation can resolve the original example query by 
evaluating the BGPs only once across the dataset, returning valid 
substitutions for the GRAPH variable in addition to the solutions for 
other variables in the patterns.

So the Graph(..) evaluation semantics [SPARQL ALGEBRA] could be rewritten 
as:

eval(D(G), Graph(var,P)) = eval(D(G),DECL_GRAPH_BINDINGS(var,P))

where

DECL_GRAPH_BINDINGS replaces every BGP operator with one which includes 
substitutions for var that satisfy nGraph:assertedIn for all the triples 
patterns within.

RDF datasets targeted by queries which produce bindings for graph 
names which are later used to restrict subsequent graph graph patterns are 
excellent candidates [1] for optimizing the evaluation of GRAPH patterns 
and the Graph(..) operator

The query example from 9.3.3 Restricting Possible Graph IRIs [SPARQL] is an example:

PREFIX  data:  <http://example.org/foaf/>
PREFIX  foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?mbox ?nick ?ppd
WHERE
{
   GRAPH data:aliceFoaf
   {
     ?alice foaf:mbox <mailto:alice@work.example> ;
            foaf:knows ?whom .
     ?whom  foaf:mbox ?mbox ;
            rdfs:seeAlso ?ppd .
     ?ppd  a foaf:PersonalProfileDocument .
   } .
   GRAPH ?ppd
   {
       ?w foaf:mbox ?mbox ;
          foaf:nick ?nick
   }
}

Any well linked, semantic web of named RDF graphs (a FOAF network) is also a candidate.  This query optimization opportunity scales well with 
the size of such datasets.  Any guidance on semantics for evaluating GRAPH 
operators declaratively (for RDF triples stores which want to support this 
'natively') would be much appreciated.

[1] http://copia.ogbuji.net/blog/2006-07-14/querying-named-rdf-graph-aggregate
[2] http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html
[3] http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf
[4] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
[5] http://www.mindswap.org/2005/OWLWorkshop/sub12.pdf

Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org
Received on Sunday, 4 March 2007 04:17:20 UTC