Re: Querying only the default graph from the data store from Andy Seaborne on 2012-09-09 (public-sparql-dev@w3.org from July to September 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sun, 09 Sep 2012 12:08:02 +0100
To: public-sparql-dev@w3.org
Message-ID: <504C7892.1060600@epimorphics.com>
On 07/09/12 15:37, Lee Feigenbaum wrote:
> [moved to public-sparql-dev]
>
> I have a related question -- do all quad stores  / named graph stores
> include a default graph? If the store that you develop or use does have
> a default graph, does that graph also have a name (URI)?

TDB has a real, stored default graph or can operate in union-named-graph 
mode.

The storage default graph can be accessed by a URI name, but:

1/ The name is the same in all stores so it is not a well-formed
2/ It does not show up in GRAPH ?g {}

The union of all named graphs also has a name, so a query accessing the 
usual default graph as normal can also use GRAPH to get the union.
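
A rough sketch of the two modes described above, as a toy in-memory model 
in Python. This is not TDB's API: ToyStore and its methods are made up 
for illustration, and the two URN constants are, as far as I know, the 
names Jena ARQ/TDB uses for the stored default graph and the union graph.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed special graph names (believed to match Jena ARQ/TDB).
DEFAULT_GRAPH_NAME = "urn:x-arq:DefaultGraph"
UNION_GRAPH_NAME = "urn:x-arq:UnionGraph"


class ToyStore:
    """Hypothetical quad store: one stored default graph plus named graphs."""

    def __init__(self, union_mode: bool = False) -> None:
        self.stored_default: Set[Triple] = set()
        self.named: Dict[str, Set[Triple]] = {}
        self.union_mode = union_mode

    def add(self, triple: Triple, graph: Optional[str] = None) -> None:
        # No graph name means the triple goes into the stored default graph.
        if graph is None:
            self.stored_default.add(triple)
        else:
            self.named.setdefault(graph, set()).add(triple)

    def union_of_named(self) -> Set[Triple]:
        merged: Set[Triple] = set()
        for triples in self.named.values():
            merged |= triples
        return merged

    def query_default(self) -> Set[Triple]:
        """Triples matched by SELECT * { ?s ?p ?o } with no FROM clause."""
        if self.union_mode:
            return self.union_of_named()
        return set(self.stored_default)

    def query_graph(self, name: str) -> Set[Triple]:
        """Triples matched by GRAPH <name> { ?s ?p ?o }."""
        if name == DEFAULT_GRAPH_NAME:
            return set(self.stored_default)
        if name == UNION_GRAPH_NAME:
            return self.union_of_named()
        return set(self.named.get(name, set()))

    def graph_names(self) -> Set[str]:
        """Bindings for ?g in GRAPH ?g {}: the special names never show up."""
        return set(self.named)


store = ToyStore()
store.add(("ex:a", "ex:p", "ex:b"))
store.add(("ex:c", "ex:p", "ex:d"), graph="http://example.org/g1")
```

In this model the same pattern-only query sees different triples 
depending on union_mode, while the two special names give the same answer 
in either mode.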

But:

The most common use case is storing and querying a single graph; named 
graphs just confuse the issue a lot of the time :-)

	Andy


>
> Answering for Anzo: Anzo does not have a default graph. All graphs are
> named with URIs.
>
> Lee
>
> On 9/7/2012 7:44 AM, Barry Bishop wrote:
>> Hello Axel,
>>
>> On 05/09/12 21:14, Polleres, Axel wrote:
>>> Thanks Barry,
>>>
>>> Since you confirm that the response addresses your comment, please
>>> consider this reply informal (chair-hat off).
>>>
>>>> I feel this is a shame, as two different implementations can
>>>> produce different output from the simplest of queries, e.g.
>>>> SELECT * { ?s ?p ?o }
>>> I personally find this quite normal... different endpoints
>>> respond differently to such query since they refer to different
>>> default datasets, i.e.
>>> Naturally when I query dbpedia.org I query a different dataset than
>>> data.semanticweb.org, etc.
>>
>> Well, dbpedia.org and data.semanticweb.org sparql endpoints make
>> different data available, so I suppose you would naturally get
>> different results to the same query. However, this is not what I was
>> getting at. In fact, I'm not sure I have managed to get my point
>> across at all. Perhaps another hypothetical example:
>>
>> Suppose you run a development team that builds an application that
>> interacts with some public sparql endpoint, say http://xyz.org/sparql
>> - then one day xyz.org start to have scalability problems and decide
>> to upgrade their RDF database to some expensive new thing. Both old
>> and new RDF databases are fully compliant with W3C, but after they
>> upgrade your application is completely broken only because the two
>> database implementations construct their RDF dataset differently when
>> no FROM clauses are given. I am sure you wouldn't find it so natural
>> in this case.
>>
>> There are some workarounds as you say, but not in all cases. When you
>> are using someone else's database and don't get to decide how they
>> partition their data into separate graphs, then you can be completely
>> stuck. As fabulous as the query language is (and I do think it is a
>> tremendous achievement), this ambiguity over constructing a dataset
>> when there are no FROMs is a bit of a hole.
>>
>>>
>>> Notably, I'd like to also point you to another document within
>>> the SPARQL1.1 specification,
>>> i.e. the service-description document at
>>> http://www.w3.org/TR/sparql11-service-description/
>>> which provides means to describe which graphs compose the default
>>> dataset of a particular service endpoint.
>>> Particularly, the property
>>> http://www.w3.org/TR/sparql11-service-description/#sd-defaultDataset
>>> is intended to provide a description of the default dataset that an
>>> endpoint uses.
>>> Note also that the service description vocabulary is extensible, and
>>> what we specify now is only a core, but other vocabulary can be used
>>> to extend this (e.g. VoID)
>>
>> All well and good, if this feature is actually provided by an
>> endpoint. However, it requires quite a lot of programming for a client
>> to work all this out and re-write queries accordingly. And actually,
>> it still doesn't help - e.g. if the endpoint you want to use
>> constructs the dataset as an RDF merge of all graphs (when no FROM
>> clauses are given [I need to find an abbreviation for this]) and you
>> only want to query the default graph, then you just can't do it. There
>> is no way to tell such an endpoint that you only want the default
>> graph using the query language.
>>
>> The problem is basically that the default graph is special - because
>> it doesn't have an identifier it can not be used in the same way as
>> named graphs....
>>
>> ... in the query language. However, in the update language the
>> appropriate syntax has already been created and would be the perfect
>> complement to the query language, e.g. if I can do this:
>>
>>     CLEAR DEFAULT
>>
>> why can't I do this:
>>
>>     SELECT *
>>     FROM DEFAULT
>>     {...}
>>
>> and specify absolutely unambiguously that I want my query to execute
>> *only* over the default graph in the database. No matter how an
>> implementation constructs its dataset when no FROM clauses are given,
>> this syntax should always work in the expected way.
>>
>> Since I am rambling on, the related keywords from the update language
>> would also be very useful, e.g. one can clear all graphs like this:
>>
>>     CLEAR ALL
>>
>> so why not be able to do this:
>>
>>     SELECT *
>>     FROM ALL
>>     {...}
>>
>> This would help in the opposite case, when an implementation
>> constructs the dataset using only the default graph (when no FROM
>> clauses are given). In this situation, it is not possible to query for
>> the graph names (using select distinct ?g {graph ?g {?s ?p ?o}}), so
>> the above would say: "please merge all graphs for input to my query,
>> even though I don't know what their names are and have no way of
>> finding out (using the query language)".
>>
>> These things might not seem important, but they are life and death to
>> application programmers. Right now, to build an application that needs
>> to interact with a sparql endpoint that is only known at runtime is
>> fraught with difficulties. Not the least of which is that if your
>> application is required to query data only from the default graph,
>> then there is no way to write a query that is guaranteed to do this on
>> all (W3C compliant) sparql endpoints.
>>
>> Which I still feel is a bit of a shame.
>>
>> barry
>>
>>
>>>
>>> As for the rest of your response, we seem to agree that what you're
>>> aiming at is rather a new feature than something this working group
>>> can address within its current charter and resources.
>>>
>>> Best regards,
>>> Axel
>>>
>>>> -----Original Message-----
>>>> From: Barry Bishop [mailto:barry.bishop@ontotext.com]
>>>> Sent: Mittwoch, 05. September 2012 19:49
>>>> To: Polleres, Axel
>>>> Cc: public-rdf-dawg-comments@w3.org
>>>> Subject: Re: Querying only the default graph from the data store
>>>>
>>>> Hello Axel,
>>>>
>>>> Thanks for taking the time to reply. I realise this thread is
>>>> somewhat out of place given the status/progress of the WG.
>>>>
>>>> Your reply does address my initial post. It does not resolve
>>>> it, but this is perhaps not the time. However, for the
>>>> purpose of clarity I will make further comments inline:
>>>>
>>>> On 05/09/12 04:11, Polleres, Axel wrote:
>>>>> Hi Barry,
>>>>>
>>>>> This is in response to
>>>>>
>>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html
>>>>>
>>>>>> The working draft does not specify how the RDF dataset is
>>>>>> constructed when no FROM and FROM NAMED clauses are present in the
>>>>>> SPARQL query. Implementations are therefore able to construct the
>>>>>> dataset differently, e.g.
>>>>>> a. dataset default graph contains only the data store's default
>>>>>> graph
>>>>>> b. dataset default graph contains the RDF merge of all graphs in
>>>>>> the data store
>>>>> It is correct that how the concrete default dataset of a SPARQL
>>>>> endpoint is constructed is left open to implementations. Since
>>>>> different endpoints and implementations support different
>>>>> behaviours in this regard (e.g. in some implementations the default
>>>>> graph of the default dataset is the union of all named graphs
>>>>> whereas in others this is not the case), the working group does not
>>>>> feel that there is a unique standard behavior to be advocated this
>>>>> time around.
>>>>
>>>> I feel this is a shame, as two different implementations can
>>>> produce different output from the simplest of queries, e.g.
>>>> SELECT * { ?s ?p ?o }
>>>>
>>>> However, this is a separate issue.
>>>>
>>>>>> As soon as a single FROM or FROM NAMED clause is used then the
>>>>>> data store's default graph is excluded from the query's dataset.
>>>>>>
>>>>>> Which means that there is no portable way to define a SPARQL query
>>>>>> so that it executes only against the default graph in the data
>>>>>> store - or even against a combination of the default graph and one
>>>>>> or more named graphs.
>>>>> Please note that a) querying the default graph in the datastore is
>>>>> the standard behavior when no explicit FROM or FROM NAMED clauses
>>>>> are given. b) the combination of querying named graphs and the
>>>>> default graph of the endpoint's default dataset is supported via
>>>>> GRAPH graph patterns.
>>>>
>>>> a) This is rather inconsistent. Above you say that the
>>>> construction of the default RDF dataset (when no FROM/FROM
>>>> NAMED clauses are given) is not defined, but here you say
>>>> constructing it using the default graph only is the 'standard
>>>> behaviour'. One of the motivations for this post is that
>>>> there are good reasons not to have only the default graph in
>>>> the 'default dataset', e.g. you wouldn't be able to do this
>>>> to find out the graph names when presented with an unknown endpoint:
>>>>
>>>> SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }
>>>>
>>>> Anyway, the point here is that there is no *portable* way to
>>>> query just the default graph.
>>>>
>>>> b) yes, but you can't query the RDF merge of the default
>>>> graph and a named graph in the same way as with two named
>>>> graphs, e.g. FROM ex:g1 FROM ex:g2. Instead one would need to
>>>> use a triple and graph pattern union, which for complex
>>>> queries becomes cumbersome. Put another way, any combination
>>>> of named graphs can be merged and explored with query triple
>>>> patterns, but this can't be done with any combination of
>>>> named graphs and the default graph.
>>>>
>>>>
>>>>> See also examples below.
>>>>>
>>>>>> This is a problem that often confuses users of RDF data stores and
>>>>>> is likely to lead to implementations that provide their own
>>>>>> specific means to achieve this, e.g.
>>>>>> http://www.openrdf.org/issues/browse/SES-850
>>>>>>
>>>>>> Inspired by the update language's use of the 'DEFAULT' keyword for
>>>>>> graph manipulation, I suggest an extension to the query language
>>>>>> that allows "FROM DEFAULT" to be used, e.g.
>>>>>>
>>>>>> SELECT *
>>>>>> FROM DEFAULT
>>>>>> WHERE { ..... }
>>>>>>
>>>>>> => dataset contains a default graph made up of the data store's
>>>>>> default graph only
>>>>> Please note that this is the standard behaviour when no FROM clause is
>>>>> given, i.e. this corresponds to
>>>>>
>>>>> SELECT *
>>>>> WHERE { ..... }       <--- (no use of GRAPH keyword)
>>>> I don't think this is "standard behaviour", rather it is
>>>> common behaviour. It can not be standard when the
>>>> construction of the dataset is implementation dependent when
>>>> no FROM clause is given.
>>>>
>>>>>> This construct can be used with any number of FROM <uri> or
>>>>>> FROM NAMED <uri> clauses, e.g.
>>>>>>
>>>>>> SELECT *
>>>>>> FROM DEFAULT
>>>>>> FROM <http://example.com#g1>
>>>>>> WHERE { ..... }
>>>>>>
>>>>>> => dataset contains a default graph made up of the data store's
>>>>>> default graph merged with the contents of the data store's g1 graph
>>>>>>
>>>>>> This would be a fairly trivial change for existing sparql processor
>>>>>> implementations, but would provide a big improvement in
>>>>>> functionality/flexibility by allowing a data store's default graph
>>>>>> to be used/queried/merged in the same way as any of its named
>>>>>> graphs.
>>>>> Note that similar to the example above, you can query the default
>>>>> graph and named graphs within the default dataset in a data store
>>>>> side by side by using GRAPH graph patterns, i.e.
>>>>>    SELECT *
>>>>>    WHERE
>>>>>    {
>>>>>      .....                              <-- (no use of GRAPH) matches the default graph
>>>>>      GRAPH <http://ex.com#g1> { .... }  <-- matches named graph g1 (assuming g1 is a named graph in the default dataset)
>>>>>    }
>>>> Consider an application that needs to execute queries over various
>>>> subsets of a database's contents, where the subsets are defined using
>>>> various combinations of named graphs. It would certainly be useful to
>>>> have standard queries which only required the appropriate "FROM g1
>>>> FROM g2 etc" prepended. This is easy to do, unless one of the graphs
>>>> is the default graph.
>>>>
>>>>> Finally, note that it is not possible in SPARQL1.1 to construct a
>>>>> *new* dataset composed of *parts* of the default dataset of an
>>>>> endpoint plus possible external graphs; such a feature is currently
>>>>> not foreseen in the features addressed in this round of SPARQL, but
>>>>> had been suggested before [1].
>>>>> The features being worked on in this round of standardization have
>>>>> been decided in a voting process at the beginning of the WG and are
>>>>> documented in the following document:
>>>>> http://www.w3.org/TR/sparql-features/
>>>>> Additionally, a list of work items and features postponed to a
>>>>> future working group are being collected by the group in a
>>>>> dedicated wiki page [2] which also contains the features discussed
>>>>> in the beginning of the WG which have not been considered for this
>>>>> round [3].
>>>>
>>>> Yes, I will be more timely next time and will endeavour to progress
>>>> this topic in the proper way. My apologies for the 'noise'.
>>>>
>>>> Regards,
>>>> barry
>>>>
>>>>> Among this list, the feature "Composite Datasets" [1] might
>>>>> partially capture what you have in mind and a future WG might
>>>>> possibly work out the details of such feature.
>>>>> We'd kindly ask you to confirm by a reply to this list that this
>>>>> addresses your comment.
>>>>> Axel Polleres, on behalf of the SPARQL WG
>>>>>
>>>>> 1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
>>>>> 2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>>>>> 3. http://www.w3.org/2009/sparql/wiki/Category:Features
>>>>
>>
>>
>>
>
>
Received on Sunday, 9 September 2012 11:08:33 UTC