Re: Querying only the default graph from the data store from Barry Bishop on 2012-09-05 (public-rdf-dawg-comments@w3.org from September 2012)

From: Barry Bishop <barry.bishop@ontotext.com>
Date: Wed, 05 Sep 2012 19:49:20 +0200
To: "Polleres, Axel" <axel.polleres@siemens.com>
CC: "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Message-ID: <504790A0.3070702@ontotext.com>

Hello Axel,

Thanks for taking the time to reply. I realise this thread is somewhat 
out of place given the status/progress of the WG.

Your reply does address my initial post. It does not resolve it, but 
this is perhaps not the time. However, for the purpose of clarity I will 
make further comments inline:

On 05/09/12 04:11, Polleres, Axel wrote:
> Hi Barry,
>
> This is in response to http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html
>
>> The working draft does not specify how the RDF dataset is constructed
>> when no FROM and FROM NAMED clauses are present in the SPARQL query.
>>
>> Implementations are therefore able to construct the dataset differently,
>> e.g.
>> a. dataset default graph contains only the data store's default graph
>> b. dataset default graph contains the RDF merge of all graphs in the
>> data store
> It is correct that how the concrete default dataset of a SPARQL endpoint is constructed is left open to implementations. Since different endpoints and implementations support different behaviours in this regard (e.g. in some implementations the default graph of the default dataset is the union of all named graphs whereas in others this is not the case), the working group does not feel that there is a unique standard behavior to be advocated this time around.

I feel this is a shame, as two different implementations can produce 
different output from the simplest of queries, e.g. SELECT * { ?s ?p ?o }

However, this is a separate issue.
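
As a rough illustration of that divergence, here is a minimal Python sketch. It is a toy model, not the behaviour of any particular store: a graph is a set of triples, and the names `store`, `dataset_default_only`, `dataset_union_of_all` and `select_all`, as well as the sample data, are hypothetical. It evaluates the equivalent of SELECT * { ?s ?p ?o } against the dataset's default graph under options (a) and (b) quoted above:

```python
# Toy model: a graph is a set of (s, p, o) triples.
# All names and data here are hypothetical, for illustration only.
store = {
    "default": {("ex:a", "ex:p", "ex:b")},
    "named": {
        "ex:g1": {("ex:c", "ex:p", "ex:d")},
        "ex:g2": {("ex:a", "ex:p", "ex:b"), ("ex:e", "ex:p", "ex:f")},
    },
}


def dataset_default_only(store):
    """Option (a): the dataset's default graph is the store's default graph."""
    return set(store["default"])


def dataset_union_of_all(store):
    """Option (b): the dataset's default graph merges every graph in the store.

    A plain set union is used here, which ignores blank-node renaming.
    """
    merged = set(store["default"])
    for triples in store["named"].values():
        merged |= triples
    return merged


def select_all(default_graph):
    """Stand-in for SELECT * { ?s ?p ?o } with no GRAPH keyword."""
    return sorted(default_graph)


rows_a = select_all(dataset_default_only(store))
rows_b = select_all(dataset_union_of_all(store))
print(len(rows_a), len(rows_b))
```

The same query text gives a different number of solutions depending only on how the implementation chose to build the dataset.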

>
>> As soon as a single FROM or FROM NAMED clause is used then the data
>> store's default graph is excluded from the query's dataset.
>>
>> Which means that there is no portable way to define a SPARQL query so
>> that it executes only against the default graph in the data store - or
>> even against a combination of the default graph and one or more named
>> graphs.
> Please note that a) querying the default graph in the datastore is the standard behavior when no explicit FROM or FROM NAMED clauses are given. b) the combination of querying named graphs and the default graph of the endpoint's default dataset is supported via GRAPH graph patterns.

a) This is rather inconsistent. Above you say that the construction of 
the default RDF dataset (when no FROM/FROM NAMED clauses are given) is 
not defined, but here you say constructing it using the default graph 
only is the 'standard behaviour'. One of the motivations for this post 
is that there are good reasons not to have only the default graph in the 
'default dataset', e.g. you wouldn't be able to do this to find out the 
graph names when presented with an unknown endpoint:

SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }

Anyway, the point here is that there is no *portable* way to query just 
the default graph.

b) yes, but you can't query the RDF merge of the default graph and a 
named graph in the same way as you can with two named graphs, e.g. FROM 
ex:g1 FROM ex:g2. Instead one would need to use a union of a triple 
pattern and a graph pattern, 
which for complex queries becomes cumbersome. Put another way, any 
combination of named graphs can be merged and explored with query triple 
patterns, but this can't be done with any combination of named graphs 
and the default graph.


>
> See also examples below.
>
>> This is a problem that often confuses users of RDF data stores
>> and is likely to lead to implementations that provide their own specific
>> means to achieve this, e.g. http://www.openrdf.org/issues/browse/SES-850
>>
>> Inspired by the update language's use of the 'DEFAULT' keyword for graph
>> manipulation, I suggest an extension to the query language that allows
>> "FROM DEFAULT" to be used, e.g.
>>
>> SELECT *
>> FROM DEFAULT
>> WHERE { ..... }
>>
>> => dataset contains a default graph made up of the data store's default
>> graph only
> Please note that this is the standard behaviour when no FROM clause is given, i.e. this corresponds to
>
> SELECT *
> WHERE { ..... }       <--- (no use of GRAPH keyword)

I don't think this is "standard behaviour", rather it is common 
behaviour. It cannot be standard when the construction of the dataset 
is implementation dependent when no FROM clause is given.

>
>> This construct can be used with any number of FROM <uri> or FROM NAMED
>> <uri> clauses, e.g.
>>
>> SELECT *
>> FROM DEFAULT
>> FROM <http://example.com#g1>
>> WHERE { ..... }
>>
>> => dataset contains a default graph made up of the data store's default
>> graph merged with the contents of the data store's g1 graph
>> This would be a fairly trivial change for existing SPARQL processor
>> implementations, but would provide a big improvement in
>> functionality/flexibility by allowing a data store's default graph to be
>> used/queried/merged in the same way as any of its named graphs.
> Note that similar to the example above, you can query the default graph and named graphs within the default dataset in a data store side by side by using GRAPH graph patterns, i.e.
>
>   SELECT *
>   WHERE
>   {
>     .....                              <-- (no use of GRAPH) matches the default graph
>     GRAPH <http://ex.com#g1> { .... }  <-- matches named graph g1 (assuming g1 is a named graph in the default dataset)
>   }

Consider an application that needs to execute queries over various 
subsets of a database's contents, where the subsets are defined using 
various combinations of named graphs. It would certainly be useful to 
have standard queries which only required the appropriate "FROM g1 FROM 
g2 etc" prepended. This is easy to do, unless one of the graphs is the 
default graph.
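
A minimal Python sketch of that kind of application follows. The helper names `from_clause` and `with_graphs` and the `DEFAULT` sentinel are hypothetical, and "FROM DEFAULT" is only the extension proposed in this thread, not SPARQL 1.1 syntax, so the second query below would be rejected by a conforming parser:

```python
# Hypothetical helpers: prepend FROM clauses to a fixed query body.
DEFAULT = object()  # sentinel standing for the store's default graph


def from_clause(graph):
    # "FROM DEFAULT" is the syntax proposed in this thread, NOT SPARQL 1.1.
    if graph is DEFAULT:
        return "FROM DEFAULT"
    return f"FROM <{graph}>"


def with_graphs(select_clause, where_clause, graphs):
    """Build a query whose default graph is the merge of the given graphs."""
    lines = [select_clause]
    lines.extend(from_clause(g) for g in graphs)
    lines.append(where_clause)
    return "\n".join(lines)


# Works today: any combination of named graphs.
named_only = with_graphs(
    "SELECT *",
    "WHERE { ?s ?p ?o }",
    ["http://example.com#g1", "http://example.com#g2"],
)

# Only expressible with the proposed extension: default graph plus g1.
with_default = with_graphs(
    "SELECT *",
    "WHERE { ?s ?p ?o }",
    [DEFAULT, "http://example.com#g1"],
)

print(named_only)
print(with_default)
```

Without something like the sentinel case, the store's default graph has no name that could be placed in the list of graphs, so the query body itself has to be rewritten instead.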

>
> Finally, note that it is not possible in SPARQL 1.1 to construct a *new* dataset composed of *parts* of the default dataset of an endpoint plus possible external graphs; such a feature is currently not foreseen among the features addressed in this round of SPARQL, but has been suggested before [1].
>
> The features being worked on in this round of standardization have been decided in a voting process at the beginning of the WG and are documented in the following document: http://www.w3.org/TR/sparql-features/
>
> Additionally, a list of work items and features postponed to a future working group is being collected by the group in a dedicated wiki page [2] which also contains the features discussed in the beginning of the WG which have not been considered for this round [3].

Yes, I will be more timely next time and will endeavour to progress this 
topic in the proper way. My apologies for the 'noise'.

Regards,
barry

>
> Among this list, the feature "Composite Datasets" [1] might partially capture what you have in mind and a future WG might possibly work out the details of such feature.
>
> We'd kindly ask you to confirm by a reply to this list that this addresses your comment.
>
> Axel Polleres, on behalf of the SPARQL WG
>
> 1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
> 2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
> 3. http://www.w3.org/2009/sparql/wiki/Category:Features
Received on Wednesday, 5 September 2012 17:49:46 UTC