- From: Richard Newman <r.newman@reading.ac.uk>
- Date: Sun, 4 Mar 2007 12:11:44 -0800
- To: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
- Cc: public-rdf-dawg-comments@w3.org
Hi Chimezie, I'll answer your email in two parts :) > Any well linked, semantic web of named RDF graphs (a FOAF network) > is also a candidate. This query optimization opportunity scales > well with the size of such datasets. Any guidance on semantics for > evaluating GRAPH operators declaratively (for RDF triples stores > which want to support this 'natively') would be much appreciated. Here's my point of view. A broad swathe of triple stores include a quad field in a triple, such that you can do a pattern retrieval including a graph: (?s ?p ?o ?g). The SPARQL GRAPH clause expands naturally into quads: every contained triple pattern becomes a quad pattern, where the graph place is filled by the graph or variable mentioned in the clause. For example: > GRAPH ?src > { ?x foaf:mbox <mailto:bob@work.example> . > ?x foaf:nick ?bobNick > } => ( ?x foaf:mbox <mailto:bob@work.example> ?src ) ( ?x foaf:nick ?bobNick ?src ) For AllegroGraph, this was all I did; you can plan and execute these patterns, and the right bindings are generated. I've done literally nothing else, and those GRAPH queries work. Furthermore, the graph part is indexed, just like the rest, so these patterns are fast... and there's nothing procedural about it. Now, back to the start of the email! > A procedural evaluation requires an explicit iteration over the > IRIs of all the named graphs in the dataset. The evaluation > semantics outlined in [SPARQL ALGEBRA] for the GRAPH operator > suggests this approach as does [SEMANTICS OF SPARQL] > > Alternatively, if the underlying RDF triple store required the > explicit use of on or more RDF graph IRIs when resolving patterns > (there are many that do), this would (at the very least) require > that the list of all IRIs in a dataset be cached or handy for such > scenarios. This 'function' is defined as name(D) in [SEMANTICS OF > SPARQL] and doesn't scale well as the number of named graphs > increases. These problems arise because the index (or lack thereof) of graph URIs is not related to any indices for subject, predicate, and object places. E.g., if you could query on g just as easily as on s, p, o, and combinations of them, this problem just doesn't exist. In other words: you could just as easily say that evaluating a triple pattern requires an explicit iteration over the subjects, predicates, and objects in a dataset. Implementers solve this through indexing. > Consider a declarative approach instead. This would require some > (semi?)formal 'built-in' [CWM-VOCAB] relation between a triple > pattern and the set of IRIs of the named graph(s) it is asserted in > within the dataset: nGraph:assertedIn. It also can be considered > the inverse of log:includes [CWM-VOCAB] and 'somewhat' (niavely?) > similar to the epistemic operator K [CLOSED WORLD REASONING]. ... and now you need a way to uniquely identify triple patterns. You might as well directly store their containing graph instead of (or as well as) an identifier. > For RDF triple stores which support this function (in their APIs) a > declarative evaluation can resolve the original example query by > evaluating the BGPs only once across the dataset, returning valid > substitutions for the GRAPH variable in addition to the solutions > for other variables in the patterns. As I mentioned above, there are other ways to reach this goal. -R
Received on Sunday, 4 March 2007 20:12:05 UTC