Querying "all graphs" (was: Re: Graph to retrieve DESCRIBE result from)

Chimezie Ogbuji wrote:
> On 3/22/09 3:24 AM, "Lee Feigenbaum" <lee@thefigtrees.net> wrote:
>> As far as I can tell, if the RDF dataset for the query includes all of
>> the graphs that the engine knows about as named graphs, then this query
>> will work.
>>
>> Different implementations have different ways of expressing "query
>> against a dataset consisting of all named graphs you know about". Some
>> implementations do this by default when no other dataset is specified;
>> others use a "magic" URI to indicate "all graphs".
> 
> I think the fact that multiple implementations support this capability in
> different ways is evidence that this behavior is needed and should be
> standardized.
>  
>> It's been suggested multiple times in the past that SPARQL have a "FROM
>> NAMED *" construct that would explicitly ask that an engine query over
>> all known graphs.
>>
>> The problem that I have always had with this suggestion is that the very
>> concept of "all known graphs" is not one that the SPARQL specification
>> defines (nor do I really have any idea how one would begin to define it).
> 
> Unless I'm missing something, this would be straightforward.

One of us is missing something :-)

> SPARQL
> evaluation is always defined with respect to a dataset:
> 
> eval(D(G), graph pattern)

Right, the dataset (the universe of graphs against which a query is 
executed) is specified in one of three ways:

1) via the protocol (default-graph-uri and named-graph-uri in an HTTP 
request, or often via an API)

2) via the query itself (via one or more FROM and FROM NAMED clauses)

3) implementation-defined
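
To make option 1 concrete, here is a minimal sketch of how a client might 
fix the dataset through the protocol. The parameter names 
default-graph-uri and named-graph-uri come from the SPARQL Protocol; the 
endpoint and graph IRIs are made up for illustration, and the helper name 
build_request_url is my own.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "http://example.org/sparql"

def build_request_url(query, default_graphs=(), named_graphs=()):
    """Build a SPARQL Protocol GET URL that fixes the dataset for the query."""
    params = [("query", query)]
    # Each graph IRI is sent as its own repeated parameter.
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    return ENDPOINT + "?" + urlencode(params)

# Hypothetical graph IRIs: the dataset has two named graphs and no
# explicitly requested default graph.
url = build_request_url(
    "SELECT ?g ?s WHERE { GRAPH ?g { ?s ?p ?o } }",
    named_graphs=["http://example.org/g1", "http://example.org/g2"],
)
print(url)
```

Note that nothing here can say "all graphs": the client has to enumerate 
the graph IRIs itself, or fall back on option 3.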

> The evaluation of GRAPH ?someVar effectively gives us a mechanism for
> determining "all known named graphs" for *the* dataset relevant to the
> query.

Right, but this just scopes part of the query to the dataset, which has 
already been defined as above. Unless I misunderstand, the feature in 
question is a way for a SPARQL user to specify that they want to query 
against "all the graphs that the engine could possibly query". In other 
words, this isn't a matter of specifying which subset of the dataset 
should be queried; rather, it's a matter of specifying which dataset 
should be queried.
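
A toy illustration of the distinction, assuming a store modelled as a 
plain dict from graph IRI to a set of triples (the IRIs and the helper 
names make_dataset and graph_names are hypothetical): GRAPH ?g can only 
range over whatever named graphs were placed in the dataset beforehand.

```python
# Toy store: graph IRI -> set of (s, p, o) triples. All names are hypothetical.
STORE = {
    "http://example.org/g1": {("ex:a", "ex:knows", "ex:b")},
    "http://example.org/g2": {("ex:b", "ex:knows", "ex:c")},
    "http://example.org/g3": {("ex:c", "ex:knows", "ex:d")},
}

def make_dataset(named_iris):
    """Choose the dataset: only the listed graphs become named graphs."""
    return {iri: STORE[iri] for iri in named_iris}

def graph_names(dataset):
    """The graph IRIs that a GRAPH ?g pattern could bind ?g to."""
    return sorted(dataset)

# Dataset chosen explicitly (as with FROM NAMED or named-graph-uri).
explicit = make_dataset(["http://example.org/g1"])
# Dataset chosen by an implementation-defined "everything I know" rule.
everything = make_dataset(STORE)

print(graph_names(explicit))    # only g1 is visible to GRAPH ?g
print(graph_names(everything))  # g1, g2 and g3 are visible
```

The second call is the step SPARQL has no syntax for: the query language 
never sees STORE, only the dataset built from it.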

> eval(D(G), Graph(var,P)) =
>      Let R be the empty multiset
>      foreach IRI i in D
>         R := Union(R, Join( eval(D(D[i]), P) , Ω(?var->i) ) )
>      the result is R
> 
> The 'foreach IRI i in D' is the key part that allows us to enumerate the
> named graphs in a dataset.
> 
> So a FROM NAMED * would just be a shorthand for a similar mechanism but
> without returning mappings to i.  Essentially, it is a dataset compile-time
> modification that replaces the default graph (for the purpose of the current
> query evaluation only) with the union / merge of all the named graphs in
> the dataset.  Since basic graph patterns match the active graph and the
> default graph of an RDF dataset is the active graph (if one isn't specified
> by a parent GRAPH pattern), then the result is as expected: the query ranges
> over all known graphs.

This sounds like a different feature to me: a way to treat the named 
graphs in a dataset as a single graph. But I don't see a need for that, 
because the default graph already serves that purpose. You already need 
some way to define the named graphs in your dataset as containing "all 
graphs", so you could just as easily define the default graph as 
containing "all graphs".

> Of course, this can be achieved in practice via explicit use of a top-level
> GRAPH ?var.

Well, that's a bit different since everything inside GRAPH ?var { ... } 
needs to match against a single graph from the named graph part of the 
dataset.
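
Here is a rough sketch of that difference, assuming graphs are plain sets 
of triples with no blank nodes (so a merge is just set union) and that 
the GRAPH variable does not also occur inside the pattern. The function 
names and the ex: data are invented for illustration; eval_graph follows 
the foreach loop quoted above, and this is not how any particular engine 
implements it.

```python
def match_triple(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep any value it was already bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def eval_bgp(graph, patterns):
    """Match every triple pattern against one graph (the active graph)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions

def eval_graph(named_graphs, var, patterns):
    """Graph(var, P): evaluate P once per named graph, binding var to its IRI."""
    results = []
    for iri, graph in named_graphs.items():
        results += [{**mu, var: iri} for mu in eval_bgp(graph, patterns)]
    return results

# Hypothetical data: a two-step path split across two named graphs.
NAMED = {
    "http://example.org/g1": {("ex:a", "ex:knows", "ex:b")},
    "http://example.org/g2": {("ex:b", "ex:knows", "ex:c")},
}
PATTERNS = [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")]

merged = set().union(*NAMED.values())
print(eval_bgp(merged, PATTERNS))         # one solution: the path spans g1 and g2
print(eval_graph(NAMED, "?g", PATTERNS))  # empty: no single graph holds both triples
```

So a top-level GRAPH ?var only gives the same answers as a merged default 
graph when every solution happens to fit inside one named graph.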

Does this make any sense?

Happy to continue discussion over email or on IRC,
Lee

>> Nevertheless, if an advocate of this feature can explain what it would
>> mean and would like to champion it as a potential feature for the group
>> to consider working on, please go ahead!
> 
> I know this feature is critical for using SPARQL as the basis for
> population-based querying and would be glad to add the feature (though it
> seems currently already mentioned in one or two other features).
> 
> ----------------------
> Chimezie (chee-meh) Thomas-Ogbuji (oh-bu-gee)
> Heart and Vascular Institute (Clinical Investigations)
> Cleveland Clinic (ogbujic@ccf.org)
> Ph.D. Student Case Western Reserve University
> (chimezie.thomas-ogbuji@case.edu)

Received on Tuesday, 24 March 2009 13:43:29 UTC