Re: Querying only the default graph from the data store from Barry Bishop on 2012-09-07 (public-rdf-dawg-comments@w3.org from September 2012)

From: Barry Bishop <barry.bishop@ontotext.com>
Date: Fri, 07 Sep 2012 13:44:22 +0200
To: "Polleres, Axel" <axel.polleres@siemens.com>
CC: "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Message-ID: <5049DE16.6050802@ontotext.com>

Hello Axel,

On 05/09/12 21:14, Polleres, Axel wrote:
> Thanks Barry,
>
> Since you confirm that the response addresses your comment, please consider this reply informal (chair-hat off).
>
>> I feel this is a shame, as two different implementations can
>> produce different output from the simplest of queries, e.g.
>> SELECT * { ?s ?p ?o }
> I personally find this quite normal... different endpoints
> respond differently to such a query since they refer to different default datasets, i.e.
> naturally when I query dbpedia.org I query a different dataset than data.semanticweb.org, etc.

Well, dbpedia.org and data.semanticweb.org sparql endpoints make 
different data available, so I suppose you would naturally get different 
results to the same query. However, this is not what I was getting at. 
In fact, I'm not sure I have managed to get my point across at all. 
Perhaps another hypothetical example:

Suppose you run a development team that builds an application that 
interacts with some public sparql endpoint, say http://xyz.org/sparql - 
then one day xyz.org start to have scalability problems and decide to 
upgrade their RDF database to some expensive new thing. Both old and new 
RDF databases are fully compliant with W3C, but after they upgrade your 
application is completely broken only because the two database 
implementations construct their RDF dataset differently when no FROM 
clauses are given. I am sure you wouldn't find it so natural in this case.

There are some workarounds as you say, but not in all cases. When you 
are using someone else's database and don't get to decide how they 
partition their data into separate graphs, then you can be completely 
stuck. As fabulous as the query language is (and I do think it is a 
tremendous achievement), this ambiguity over constructing a dataset when 
there are no FROMs is a bit of a hole.
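
For illustration, the GRAPH-based workaround for combining the default 
graph with one named graph looks roughly like the sketch below 
(<http://example.com#g1> is a made-up graph name). Note that it only 
gives the intended result on an endpoint whose dataset default graph is 
exactly the data store's default graph, which is the very thing that is 
not guaranteed:

     SELECT ?s ?p ?o
     WHERE {
       { ?s ?p ?o }
       UNION
       { GRAPH <http://example.com#g1> { ?s ?p ?o } }
     }

For a query with several triple patterns, each pattern needs its own 
UNION of this kind if matches may come from either graph, which soon 
gets unwieldy.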

>
> Notably, I'd like to also point you to another document within the SPARQL1.1 specification,
> i.e. the service-description document at
> http://www.w3.org/TR/sparql11-service-description/
> which provides means to describe which graphs compose the default
> dataset of a particular service endpoint.
> Particularly, the property
>   http://www.w3.org/TR/sparql11-service-description/#sd-defaultDataset
> is intended to provide a description of the default dataset that an endpoint uses.
> Note also that the service description vocabulary is extensible, and what we specify now is only a core, but other vocabulary can be used to extend this (e.g. VoID)

All well and good, if this feature is actually provided by an endpoint. 
However, it requires quite a lot of programming for a client to work all 
this out and re-write queries accordingly. And actually, it still 
doesn't help - e.g. if the endpoint you want to use constructs the 
dataset as an RDF merge of all graphs (when no FROM clauses are given [I 
need to find an abbreviation for this]) and you only want to query the 
default graph, then you just can't do it. There is no way to tell such 
an endpoint that you only want the default graph using the query language.

The problem is basically that the default graph is special - because it 
doesn't have an identifier it can not be used in the same way as named 
graphs....

... in the query language. However, in the update language the 
appropriate syntax has already been created and would be the perfect 
complement to the query language, e.g. if I can do this:

     CLEAR DEFAULT

why can't I do this:

     SELECT *
     FROM DEFAULT
     {...}

and specify absolutely unambiguously that I want my query to execute 
*only* over the default graph in the database. No matter how an 
implementation constructs its dataset when no FROM clauses are given, 
this syntax should always work in the expected way.

Since I am rambling on, the related keywords from the update language 
would also be very useful, e.g. one can clear all graphs like this:

     CLEAR ALL

so why not be able to do this:

     SELECT *
     FROM ALL
     {...}

This would help in the opposite case, when an implementation constructs 
the dataset using only the default graph (when no FROM clauses are 
given). In this situation, it is not possible to query for the graph 
names (using select distinct ?g {graph ?g {?s ?p ?o}}), so the above 
would say: "please merge all graphs for input to my query, even though I 
don't know what their names are and have no way of finding out (using 
the query language)".
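
As a rough sketch of what can be done today: on an endpoint that does 
expose its named graphs in the dataset, something close to FROM ALL can 
be written as follows. On an endpoint that puts only the default graph 
in the dataset, the GRAPH branch simply matches nothing, which is the 
problem described above:

     SELECT DISTINCT ?s ?p ?o
     WHERE {
       { ?s ?p ?o }
       UNION
       { GRAPH ?g { ?s ?p ?o } }
     }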

These things might not seem important, but they are life and death to 
application programmers. Right now, building an application that needs 
to interact with a sparql endpoint that is only known at runtime is 
fraught with difficulties, not the least of which is that if your 
application is required to query data only from the default graph, then 
there is no way to write a query that is guaranteed to do this on all 
(W3C compliant) sparql endpoints.

Which I still feel is a bit of a shame.

barry


>
> As for the rest of your response, we seem to agree that what you're aiming at
> is rather a new feature than something this working group can address within its current
> charter and resources.
>
> Best regards,
> Axel
>
>> -----Original Message-----
>> From: Barry Bishop [mailto:barry.bishop@ontotext.com]
>> Sent: Mittwoch, 05. September 2012 19:49
>> To: Polleres, Axel
>> Cc: public-rdf-dawg-comments@w3.org
>> Subject: Re: Querying only the default graph from the data store
>>
>> Hello Axel,
>>
>> Thanks for taking the time to reply. I realise this thread is
>> somewhat out of place given the status/progress of the WG.
>>
>> Your reply does address my initial post. It does not resolve
>> it, but this is perhaps not the time. However, for the
>> purpose of clarity I will make further comments inline:
>>
>> On 05/09/12 04:11, Polleres, Axel wrote:
>>> Hi Barry,
>>>
>>> This is in response to
>>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html
>>>
>>>> The working draft does not specify how the RDF dataset is constructed
>>>> when no FROM and FROM NAMED clauses are present in the SPARQL query.
>>>> Implementations are therefore able to construct the dataset
>>>> differently, e.g.
>>>> a. dataset default graph contains only the data store's default graph
>>>> b. dataset default graph contains the RDF merge of all graphs in the
>>>> data store
>>> It is correct that how the concrete default dataset of a SPARQL
>>> endpoint is constructed is left open to implementations. Since
>>> different endpoints and implementations support different behaviours
>>> in this regard (e.g. in some implementations the default graph of the
>>> default dataset is the union of all named graphs whereas in others
>>> this is not the case), the working group does not feel that there is
>>> a unique standard behavior to be advocated this time around.
>>
>> I feel this is a shame, as two different implementations can
>> produce different output from the simplest of queries, e.g.
>> SELECT * { ?s ?p ?o }
>>
>> However, this is a separate issue.
>>
>>>> As soon as a single FROM or FROM NAMED clause is used then the data
>>>> store's default graph is excluded from the query's dataset.
>>>>
>>>> Which means that there is no portable way to define a SPARQL query so
>>>> that it executes only against the default graph in the data store -
>>>> or even against a combination of the default graph and one or more
>>>> named graphs.
>>> Please note that a) querying the default graph in the datastore is
>>> the standard behavior when no explicit FROM or FROM NAMED clauses are
>>> given. b) the combination of querying named graphs and the default
>>> graph of the endpoint's default dataset is supported via GRAPH graph
>>> patterns.
>>
>> a) This is rather inconsistent. Above you say that the
>> construction of the default RDF dataset (when no FROM/FROM
>> NAMED clauses are given) is not defined, but here you say
>> constructing it using the default graph only is the 'standard
>> behaviour'. One of the motivations for this post is that
>> there are good reasons not to have only the default graph in
>> the 'default dataset', e.g. you wouldn't be able to do this
>> to find out the graph names when presented with an unknown endpoint:
>>
>> SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }
>>
>> Anyway, the point here is that there is no *portable* way to
>> query just the default graph.
>>
>> b) yes, but you can't query the RDF merge of the default
>> graph and a named graph in the same way with two named
>> graphs, e.g. FROM ex:g1 FROM ex:g2. Instead one would need to
>> use a triple and graph pattern union, which for complex
>> queries becomes cumbersome. Put another way, any combination
>> of named graphs can be merged and explored with query triple
>> patterns, but this can't be done with any combination of
>> named graphs and the default graph.
>>
>>
>>> See also examples below.
>>>
>>>> This is a problem that often confuses users of RDF data stores and is
>>>> likely to lead to implementations that provide their own specific
>>>> means to achieve this, e.g.
>>>> http://www.openrdf.org/issues/browse/SES-850
>>>>
>>>> Inspired by the update language's use of the 'DEFAULT' keyword for
>>>> graph manipulation, I suggest an extension to the query language that
>>>> allows "FROM DEFAULT" to be used, e.g.
>>>>
>>>> SELECT *
>>>> FROM DEFAULT
>>>> WHERE { ..... }
>>>>
>>>> => dataset contains a default graph made up of the data store's
>>>> default graph only
>>> Please note that this is the standard behaviour when no FROM clause is
>>> given, i.e. this corresponds to
>>>
>>> SELECT *
>>> WHERE { ..... }       <--- (no use of GRAPH keyword)
>> I don't think this is "standard behaviour", rather it is
>> common behaviour. It can not be standard when the
>> construction of the dataset is implementation dependent when
>> no FROM clause is given.
>>
>>>> This construct can be used with any number of FROM <uri> or FROM NAMED
>>>> <uri> clauses, e.g.
>>>>
>>>> SELECT *
>>>> FROM DEFAULT
>>>> FROM <http://example.com#g1>
>>>> WHERE { ..... }
>>>>
>>>> => dataset contains a default graph made up of the data store's default
>>>> graph merged with the contents of the data store's g1 graph
>>>>
>>>> This would be a fairly trivial change for existing sparql processor
>>>> implementations, but would provide a big improvement in
>>>> functionality/flexibility by allowing a data store's default graph to be
>>>> used/queried/merged in the same way as any of its named graphs.
>>> Note that similar to the example above, you can query the default
>>> graph and named graphs within the default dataset in a data store
>>> side by side by using GRAPH graph patterns, i.e.
>>>    SELECT *
>>>    WHERE
>>>    {
>>>      .....                              <-- (no use of GRAPH) matches the default graph
>>>      GRAPH <http://ex.com#g1> { .... }  <-- matches named graph g1 (assuming g1 is a named graph in the default dataset)
>>>    }
>> Consider an application that needs to execute queries over various
>> subsets of a database's contents, where the subsets are defined using
>> various combinations of named graphs. It would certainly be useful to
>> have standard queries which only required the appropriate
>> "FROM g1 FROM g2 etc" prepended. This is easy to do, unless one of the
>> graphs is the default graph.
>>
>>> Finally, note that it is not possible in SPARQL1.1 to construct a
>>> *new* dataset composed of *parts* of the default dataset of an
>>> endpoint plus possible external graphs; such a feature is currently
>>> not foreseen in the features addressed in this round of SPARQL, but
>>> had been suggested before [1].
>>> The features being worked on in this round of standardization have
>>> been decided in a voting process at the beginning of the WG and are
>>> documented in the following document:
>>> http://www.w3.org/TR/sparql-features/
>>> Additionally, a list of work items and features postponed to a future
>>> working group is being collected by the group in a dedicated wiki
>>> page [2] which also contains the features discussed in the beginning
>>> of the WG which have not been considered for this round [3].
>>
>> Yes, I will be more timely next time and will endeavour to progress
>> this topic in the proper way. My apologies for the 'noise'.
>>
>> Regards,
>> barry
>>
>>> Among this list, the feature "Composite Datasets" [1] might partially
>>> capture what you have in mind and a future WG might possibly work out
>>> the details of such a feature.
>>> We'd kindly ask you to confirm by a reply to this list that this
>>> addresses your comment.
>>> Axel Polleres, on behalf of the SPARQL WG
>>>
>>> 1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
>>> 2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>>> 3. http://www.w3.org/2009/sparql/wiki/Category:Features
>>
Received on Friday, 7 September 2012 11:44:52 UTC