RE: Querying "all graphs" from Seaborne, Andy on 2009-03-30 (public-rdf-dawg@w3.org from January to March 2009)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 30 Mar 2009 16:40:03 +0000
To: Lee Feigenbaum <lee@thefigtrees.net>, Chimezie Ogbuji <ogbujic@ccf.org>
CC: Kjetil Kjernsmo <Kjetil.Kjernsmo@computas.com>, 'RDF Data Access Working Group' <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3628DBCC2A2@GVW1118EXC.americas.hpqcorp.net>


> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Lee Feigenbaum
> Sent: 30 March 2009 17:17
> To: Chimezie Ogbuji
> Cc: Kjetil Kjernsmo; 'RDF Data Access Working Group'
> Subject: Re: Querying "all graphs"
> 
> Chimezie Ogbuji wrote:
> > I think I'm going to break this out into a separate feature request and try
> > to better articulate the problem and suggested solutions before we run out
> > of steam in our current feature review.
> 
> That would be great - reading below, I think we're still not
> communicating well with each other, so rather than keep at it, I'd like
> to see this feature request articulated so I can understand it better.
> 
> For now, I see 3 potential distinct features here:
> 
> 1/ A way for users to explicitly write a query that defines the default
> graph component of the RDF dataset as comprising "the RDF merge of all
> graphs that the SPARQL engine knows about". This is what's been referred
> to at times in the past as "FROM *", and is the one that I am
> sympathetic to but have trouble imagining how it would be specified.
> 
> 2/ A way for users to explicitly specify that the named graphs component
> of the RDF dataset should be merged and used as the default graph of the
> RDF dataset. I don't really understand what this would gain, since to
> get to this point you'd already have needed to somehow specify (e.g. via
> FROM NAMED) the relevant graphs that should be named graphs in the RDF
> dataset, and you can use that same mechanism (e.g. FROM) to stick those
> graphs' contents into the default graph part of the RDF dataset.
> 
> 3/ A way for users to refer to RDF datasets by name. I wrote about how
> we deal with this in Open Anzo (via "named datasets") here:
> http://www.thefigtrees.net/lee/blog/2009/03/named_graphs_in_open_anzo.html

> I'm pretty happy with this approach but don't personally think it's ripe
> for standardization.
> 
> Lee


A couple of Jena stores can expose the RDF merge of all the named graphs in the dataset.  This can be accessed via a URI (there's no reason why a graph can't be in the dataset in different ways under different names) or the engine can be told to make the default graph the RDF merge of the named graphs.  This is a property of the SPARQL service being offered.

The computed graph is a strict RDF graph - the computed merge suppresses duplicate triples.

In fact the "real" default graph is also available via a named graph with a particular URI even if the query default graph is the RDF merge of named graphs.

E.g. see:

http://jena.hpl.hp.com/wiki/TDB/Datasets
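
To make the behaviour described above concrete, here is a minimal sketch in Python. It is a toy model, not Jena's implementation: the class `Dataset`, the flag `union_as_default`, and the two graph names `UNION_GRAPH` and `DEFAULT_GRAPH` are hypothetical placeholders, and the RDF merge is simplified to a plain set union of triples (which is only right when no blank nodes are involved).

```python
# Toy model of a dataset whose query default graph can be the merge of
# its named graphs. A graph is a set of (subject, predicate, object) triples.
# Assumption: no blank nodes, so the RDF merge reduces to set union.

UNION_GRAPH = "urn:example:union-graph"      # hypothetical special name
DEFAULT_GRAPH = "urn:example:default-graph"  # hypothetical special name


class Dataset:
    def __init__(self, default, named, union_as_default=False):
        self.real_default = frozenset(default)
        self.named = {name: frozenset(g) for name, g in named.items()}
        # Service-side configuration, not something the query controls.
        self.union_as_default = union_as_default

    def union_graph(self):
        """Merge of all named graphs; duplicates collapse because it is a set."""
        merged = set()
        for g in self.named.values():
            merged |= g
        return frozenset(merged)

    def default_graph(self):
        """The graph a query sees as its default graph."""
        return self.union_graph() if self.union_as_default else self.real_default

    def graph(self, name):
        """Look up a graph by name, including the two special names."""
        if name == UNION_GRAPH:
            return self.union_graph()
        if name == DEFAULT_GRAPH:
            return self.real_default
        return self.named[name]


t1 = ("ex:a", "ex:p", "ex:b")
t2 = ("ex:b", "ex:p", "ex:c")
t3 = ("ex:x", "ex:p", "ex:y")

# t1 appears in both named graphs; t3 lives only in the real default graph.
ds = Dataset(default=[t3],
             named={"ex:g1": [t1, t2], "ex:g2": [t1]},
             union_as_default=True)
```

With `union_as_default=True`, `ds.default_graph()` holds `t1` and `t2` once each, while `t3` remains reachable only through `ds.graph(DEFAULT_GRAPH)`.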


I'm not sure this is the best way to do it - it is a way to do it without needing to change SPARQL.

What I see as missing, relative to what I understand Chimezie to be describing, is the ability for the client to ask the service to reconfigure itself for this mode.  At the moment, it is purely a feature of what the service offers.

So it covers 2 (via configuration of the service or query processor) and 3 but not 1.  It's more for the case where FROM/FROM NAMED are not being used and it's about what the service offers.

 Andy


> 
> >> Right, but this just scopes part of the query to the dataset, which has
> >> already been defined as above. Unless I misunderstand, the feature in
> >> question is how does a SPARQL user specify that they want to query
> >> against "all the graphs that the engine could possibly query".
> >
> > The feature I had in mind was "how does the user specify that they want to
> > query against all the named graphs of the specified dataset" preferably as
> > a default graph. So, it is exactly about specifying which subset of the
> > dataset should be queried and how to carve up such subsets and (possibly)
> > refer to them by name.
> >
> >> This sounds like a different feature to me: this sounds like asking for
> >> some way to treat the named graphs in a data set as a single graph. But
> >> I don't see any reason for that since you can just use the default graph
> >> for that - since you already needed to have some way to define the named
> >> graphs in your data set as containing "all graphs", you could just as
> >> easily define the default graph as containing "all graphs".
> >
> > Right, but currently if this is not specified by the user (in some way,
> > currently FROM ... is the only way) then the server can provide anything
> > for the default graph (including an empty graph, which is what the tests in
> > the latest test suite sanction).
> >
> > The motivation here is that an empty default graph is not quite as useful
> > as a default graph that (either by default or by specific instruction from
> > the user) is instead the merge of all the named graphs and it would be nice if
> > there was an explicit way to specify this w/out relying on the
> > application's behavior which could differ between systems.
> >
> >> Well, that's a bit different since everything inside GRAPH ?var { ... }
> >> needs to match against a single graph from the named graph part of the
> >> data set.
> >
> > Yes, I realize that now.  Which means that even this would not work as a
> > workaround for the need described above (unless the desired behavior
> > is to match against a single graph from the named graph).
> >
Received on Monday, 30 March 2009 16:41:15 UTC