Re: Declarative GRAPH evaluation / operators? from Richard Newman on 2007-03-04 (public-rdf-dawg-comments@w3.org from March 2007)

From: Richard Newman <r.newman@reading.ac.uk>
Date: Sun, 4 Mar 2007 12:11:44 -0800
To: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Cc: public-rdf-dawg-comments@w3.org
Message-Id: <FBC86D21-4873-4BC2-B202-E37A4CCB4A68@reading.ac.uk>

Hi Chimezie,

   I'll answer your email in two parts :)

> Any well linked, semantic web of named RDF graphs (a FOAF network)  
> is also a candidate.  This query optimization opportunity scales  
> well with the size of such datasets.  Any guidance on semantics for  
> evaluating GRAPH operators declaratively (for RDF triples stores  
> which want to support this 'natively') would be much appreciated.

Here's my point of view.

A broad swathe of triple stores include a quad field in a triple,  
such that you can do a pattern retrieval including a graph:

   (?s ?p ?o ?g).

The SPARQL GRAPH clause expands naturally into quads: every contained  
triple pattern becomes a quad pattern, where the graph place is  
filled by the graph or variable mentioned in the clause.

For example:

>     GRAPH ?src
>     { ?x foaf:mbox <mailto:bob@work.example> .
>       ?x foaf:nick ?bobNick
>     }

=>
( ?x foaf:mbox <mailto:bob@work.example> ?src )
( ?x foaf:nick ?bobNick ?src )

For AllegroGraph, this was all I did; you can plan and execute these  
patterns, and the right bindings are generated. I've done literally  
nothing else, and those GRAPH queries work. Furthermore, the graph  
part is indexed, just like the rest, so these patterns are fast...  
and there's nothing procedural about it.

Now, back to the start of the email!

> A procedural evaluation requires an explicit iteration over the  
> IRIs of all the named graphs in the dataset.  The evaluation  
> semantics outlined in [SPARQL ALGEBRA] for the GRAPH operator  
> suggests this approach as does [SEMANTICS OF SPARQL]
>
> Alternatively, if the underlying RDF triple store required the  
> explicit use of on or more RDF graph IRIs when resolving patterns  
> (there are many that do), this would (at the very least) require  
> that the list of all IRIs in a dataset be cached or handy for such  
> scenarios.  This 'function' is defined as  name(D) in [SEMANTICS OF  
> SPARQL] and doesn't scale well as the number of named graphs  
> increases.

These problems arise because the index (or lack thereof) of graph  
URIs is not related to any indices for subject, predicate, and object  
places. E.g., if you could query on g just as easily as on s, p, o,  
and combinations of them, this problem just doesn't exist.

In other words: you could just as easily say that evaluating a triple  
pattern requires an explicit iteration over the subjects, predicates,  
and objects in a dataset. Implementers solve this through indexing.

> Consider a declarative approach instead.  This would require some  
> (semi?)formal 'built-in' [CWM-VOCAB] relation between a triple  
> pattern and the set of IRIs of the named graph(s) it is asserted in  
> within the dataset: nGraph:assertedIn. It also can be considered  
> the inverse of log:includes [CWM-VOCAB] and 'somewhat' (niavely?)  
> similar to the epistemic operator K [CLOSED WORLD REASONING].

... and now you need a way to uniquely identify triple patterns. You  
might as well directly store their containing graph instead of (or as  
well as) an identifier.

> For RDF triple stores which support this function (in their APIs) a  
> declarative evaluation can resolve the original example query by  
> evaluating the BGPs only once across the dataset, returning valid  
> substitutions for the GRAPH variable in addition to the solutions  
> for other variables in the patterns.

As I mentioned above, there are other ways to reach this goal.

-R

Received on Sunday, 4 March 2007 20:12:05 UTC