- From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
- Date: Sat, 3 Mar 2007 23:17:09 -0500 (EST)
- To: public-rdf-dawg-comments@w3.org
DAWG WG. I've been wrestling with a very common RDF query pattern for RDF datasets (as defined formally by SPARQL) where the query is ".. using a variable to range over the IRIs naming graphs." The example in 9.3.1 Accessing Graph Names is such a scenario: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?src ?bobNick WHERE { GRAPH ?src { ?x foaf:mbox <mailto:bob@work.example> . ?x foaf:nick ?bobNick } } A procedural evaluation requires an explicit iteration over the IRIs of all the named graphs in the dataset. The evaluation semantics outlined in [SPARQL ALGEBRA] for the GRAPH operator suggests this approach as does [SEMANTICS OF SPARQL] Alternatively, if the underlying RDF triple store required the explicit use of on or more RDF graph IRIs when resolving patterns (there are many that do), this would (at the very least) require that the list of all IRIs in a dataset be cached or handy for such scenarios. This 'function' is defined as name(D) in [SEMANTICS OF SPARQL] and doesn't scale well as the number of named graphs increases. Consider a declarative approach instead. This would require some (semi?)formal 'built-in' [CWM-VOCAB] relation between a triple pattern and the set of IRIs of the named graph(s) it is asserted in within the dataset: nGraph:assertedIn. It also can be considered the inverse of log:includes [CWM-VOCAB] and 'somewhat' (niavely?) similar to the epistemic operator K [CLOSED WORLD REASONING]. For RDF triple stores which support this function (in their APIs) a declarative evaluation can resolve the original example query by evaluating the BGPs only once across the dataset, returning valid substitutions for the GRAPH variable in addition to the solutions for other variables in the patterns. So the Graph(..) evaluation semantics [SPARQL ALGEBRA] could be rewritten as: eval(D(G), Graph(var,P)) = eval(D(G),DECL_GRAPH_BINDINGS(var,P)) where DECL_GRAPH_BINDINGS replaces every BGP operator with one which includes substitutions for var that satisfy nGraph:assertedIn for all the triples patterns within. RDF datasets targeted by queries which produce bindings for graph names which are later used to restrict subsequent graph graph patterns are excellent candidates [1] for optimizing the evaluation of GRAPH patterns and the Graph(..) operator The query example from 9.3.3 Restricting Possible Graph IRIs [SPARQL] is an example: PREFIX data: <http://example.org/foaf/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?mbox ?nick ?ppd WHERE { GRAPH data:aliceFoaf { ?alice foaf:mbox <mailto:alice@work.example> ; foaf:knows ?whom . ?whom foaf:mbox ?mbox ; rdfs:seeAlso ?ppd . ?ppd a foaf:PersonalProfileDocument . } . GRAPH ?ppd { ?w foaf:mbox ?mbox ; foaf:nick ?nick } } Any well linked, semantic web of named RDF graphs (a FOAF network) is also a candidate. This query optimization opportunity scales well with the size of such datasets. Any guidance on semantics for evaluating GRAPH operators declaratively (for RDF triples stores which want to support this 'natively') would be much appreciated. [1] http://copia.ogbuji.net/blog/2006-07-14/querying-named-rdf-graph-aggregate [2] http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html [3] http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf [4] http://www.w3.org/2000/10/swap/doc/CwmBuiltins [5] http://www.mindswap.org/2005/OWLWorkshop/sub12.pdf Chimezie Ogbuji Lead Systems Analyst Thoracic and Cardiovascular Surgery Cleveland Clinic Foundation 9500 Euclid Avenue/ W26 Cleveland, Ohio 44195 Office: (216)444-8593 ogbujic@ccf.org
Received on Sunday, 4 March 2007 04:17:20 UTC