Re: Querying only the default graph from the data store from Steve Harris on 2012-09-07 (public-sparql-dev@w3.org from July to September 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 7 Sep 2012 16:30:24 +0100
To: Lee Feigenbaum <lee@thefigtrees.net>
Cc: Barry Bishop <barry.bishop@ontotext.com>, "Polleres, Axel" <axel.polleres@siemens.com>, "public-sparql-dev@w3.org" <public-sparql-dev@w3.org>
Message-Id: <82AD8209-2CD4-45BB-83D0-EB07885F401B@garlik.com>
4store and 5store - by default the default graph is the union of the named graphs, but if you do e.g.

INSERT DATA { <s> <p> <o> }

then <s> <p> <o> will get written into a "special" named graph.

There's a flag you can use (in both I think) to get the "special" graph to behave like the unnamed graph.
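
To make the two behaviours concrete, here is a toy Python sketch. It is not 4store or 5store code. The class, the graph URI and the `special_is_unnamed` switch are all hypothetical stand-ins, and the switch reflects only one possible reading of what the flag does (the default graph becomes just the "special" graph instead of the union).

```python
# Toy model only: nothing here is 4store/5store API.
# SPECIAL_GRAPH is a made-up name for the "special" named graph.
SPECIAL_GRAPH = "urn:example:special"


class ToyQuadStore:
    """Quad store whose default graph is, by default, the union of all graphs."""

    def __init__(self, special_is_unnamed=False):
        # Maps graph name -> set of (s, p, o) triples.
        self.graphs = {}
        # Hypothetical stand-in for the flag mentioned above (assumed semantics).
        self.special_is_unnamed = special_is_unnamed

    def insert_data(self, triple, graph=None):
        """Model INSERT DATA; with no graph given, write to the special graph."""
        name = graph if graph is not None else SPECIAL_GRAPH
        self.graphs.setdefault(name, set()).add(triple)

    def default_graph(self):
        """Triples seen by a query with no FROM clause and no GRAPH pattern."""
        if self.special_is_unnamed:
            # Assumed flag behaviour: only the special graph is visible.
            return set(self.graphs.get(SPECIAL_GRAPH, set()))
        # Default behaviour: union of every named graph, special one included.
        return set().union(*self.graphs.values())


def demo(special_is_unnamed):
    """Insert one triple without a graph and one into g1, then read back."""
    store = ToyQuadStore(special_is_unnamed)
    store.insert_data(("s", "p", "o"))
    store.insert_data(("s2", "p2", "o2"), graph="urn:example:g1")
    return store.default_graph()


if __name__ == "__main__":
    print(sorted(demo(False)))
    print(sorted(demo(True)))
```

With the switch off, `demo` returns both triples. With it on, only the triple written without a graph name comes back.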

- Steve

PS for the record, I still think the whole unnamed/default graph thing is silly.

On 2012-09-07, at 15:37, Lee Feigenbaum wrote:

> [moved to public-sparql-dev]
> 
> I have a related question -- do all quad stores / named graph stores include a default graph? If the store that you develop or use does have a default graph, does that graph also have a name (URI)?
> 
> Answering for Anzo: Anzo does not have a default graph. All graphs are named with URIs.
> 
> Lee
> 
> On 9/7/2012 7:44 AM, Barry Bishop wrote:
>> Hello Axel,
>> 
>> On 05/09/12 21:14, Polleres, Axel wrote:
>>> Thanks Barry,
>>> 
>>> Since you confirm that the response addresses your comment, please consider this reply informal (chair-hat off).
>>> 
>>>> I feel this is a shame, as two different implementations can
>>>> produce different output from the simplest of queries, e.g.
>>>> SELECT * { ?s ?p ?o }
>>> I personally find this quite normal... different endpoints
>>> respond differently to such query since they refer to different default datasets, i.e.
>>> Naturally when I query dbpedia.org I query a different dataset than data.semanticweb.org, etc.
>> 
>> Well, dbpedia.org and data.semanticweb.org sparql endpoints make different data available, so I suppose you would naturally get different results to the same query. However, this is not what I was getting at. In fact, I'm not sure I have managed to get my point across at all. Perhaps another hypothetical example:
>> 
>> Suppose you run a development team that builds an application that interacts with some public sparql endpoint, say http://xyz.org/sparql - then one day xyz.org start to have scalability problems and decide to upgrade their RDF database to some expensive new thing. Both old and new RDF databases are fully compliant with W3C, but after they upgrade your application is completely broken only because the two database implementations construct their RDF dataset differently when no FROM clauses are given. I am sure you wouldn't find it so natural in this case.
>> 
>> There are some workarounds as you say, but not in all cases. When you are using someone else's database and don't get to decide how they partition their data into separate graphs, then you can be completely stuck. As fabulous as the query language is (and I do think it is a tremendous achievement), this ambiguity over constructing a dataset when there are no FROMs is a bit of a hole.
>> 
>>> 
>>> Notably, I'd like to also point you to another document within the SPARQL 1.1 specification,
>>> i.e. the service-description document at
>>> http://www.w3.org/TR/sparql11-service-description/
>>> which provides means to describe which graphs compose the default
>>> dataset of a particular service endpoint.
>>> Particularly, the property
>>> http://www.w3.org/TR/sparql11-service-description/#sd-defaultDataset
>>> is intended to provide a description of the default dataset that an endpoint uses.
>>> Note also that the service description vocabulary is extensible, and what we specify now is only a core, but other vocabulary can be used to extend this (e.g. VoID)
>> 
>> All well and good, if this feature is actually provided by an endpoint. However, it requires quite a lot of programming for a client to work all this out and re-write queries accordingly. And actually, it still doesn't help - e.g. if the endpoint you want to use constructs the dataset as an RDF merge of all graphs (when no FROM clauses are given [I need to find an abbreviation for this]) and you only want to query the default graph, then you just can't do it. There is no way to tell such an endpoint that you only want the default graph using the query language.
>> 
>> The problem is basically that the default graph is special - because it doesn't have an identifier it can not be used in the same way as named graphs....
>> 
>> ... in the query language. However, in the update language the appropriate syntax has already been created and would be the perfect complement to the query language, e.g. if I can do this:
>> 
>>    CLEAR DEFAULT
>> 
>> why can't I do this:
>> 
>>    SELECT *
>>    FROM DEFAULT
>>    {...}
>> 
>> and specify absolutely unambiguously that I want my query to execute *only* over the default graph in the database. No matter how an implementation constructs its dataset when no FROM clauses are given, this syntax should always work in the expected way.
>> 
>> Since I am rambling on, the related keywords from the update language would also be very useful, e.g. one can clear all graphs like this:
>> 
>>    CLEAR ALL
>> 
>> so why not be able to do this:
>> 
>>    SELECT *
>>    FROM ALL
>>    {...}
>> 
>> This would help in the opposite case, when an implementation constructs the dataset using only the default graph (when no FROM clauses are given). In this situation, it is not possible to query for the graph names (using select distinct ?g {graph ?g {?s ?p ?o}}), so the above would say: "please merge all graphs for input to my query, even though I don't know what their names are and have no way of finding out (using the query language)".
>> 
>> These things might not seem important, but they are life and death to application programmers. Right now, to build an application that needs to interact with a sparql endpoint that is only known at runtime is fraught with difficulties. Not the least of which is that if your application is required to query data only from the default graph, then there is no way to write a query that is guaranteed to do this on all (W3C compliant) sparql endpoints.
>> 
>> Which I still feel is a bit of a shame.
>> 
>> barry
>> 
>> 
>>> 
>>> As for the rest of your response, we seem to agree that what you're aiming at
>>> is rather a new feature than something this working group can address within its current
>>> charter and resources.
>>> 
>>> Best regards,
>>> Axel
>>> 
>>>> -----Original Message-----
>>>> From: Barry Bishop [mailto:barry.bishop@ontotext.com]
>>>> Sent: Mittwoch, 05. September 2012 19:49
>>>> To: Polleres, Axel
>>>> Cc: public-rdf-dawg-comments@w3.org
>>>> Subject: Re: Querying only the default graph from the data store
>>>> 
>>>> Hello Axel,
>>>> 
>>>> Thanks for taking the time to reply. I realise this thread is
>>>> somewhat out of place given the status/progress of the WG.
>>>> 
>>>> Your reply does address my initial post. It does not resolve
>>>> it, but this is perhaps not the time. However, for the
>>>> purpose of clarity I will make further comments inline:
>>>> 
>>>> On 05/09/12 04:11, Polleres, Axel wrote:
>>>>> Hi Barry,
>>>>> 
>>>>> This is in response to
>>>>> 
>>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html
>>>>> 
>>>>>> The working draft does not specify how the RDF dataset is constructed
>>>>>> when no FROM and FROM NAMED clauses are present in the SPARQL query.
>>>>>> Implementations are therefore able to construct the dataset
>>>>>> differently, e.g.
>>>>>> a. dataset default graph contains only the data store's default graph
>>>>>> b. dataset default graph contains the RDF merge of all graphs in the
>>>>>> data store
>>>>> It is correct that how the concrete default dataset of a SPARQL
>>>>> endpoint is constructed is left open to implementations. Since
>>>>> different endpoints and implementations support different behaviours
>>>>> in this regard (e.g. in some implementations the default graph of the
>>>>> default dataset is the union of all named graphs whereas in others
>>>>> this is not the case), the working group does not feel that there is
>>>>> a unique standard behavior to be advocated this time around.
>>>> 
>>>> I feel this is a shame, as two different implementations can
>>>> produce different output from the simplest of queries, e.g.
>>>> SELECT * { ?s ?p ?o }
>>>> 
>>>> However, this is a separate issue.
>>>> 
>>>>>> As soon as a single FROM or FROM NAMED clause is used then the data
>>>>>> store's default graph is excluded from the query's dataset.
>>>>>> 
>>>>>> Which means that there is no portable way to define a SPARQL query so
>>>>>> that it executes only against the default graph in the data store -
>>>>>> or even against a combination of the default graph and one or more
>>>>>> named graphs.
>>>>> Please note that a) querying the default graph in the datastore is
>>>>> the standard behavior when no explicit FROM or FROM NAMED clauses are
>>>>> given. b) the combination of querying named graphs and the default
>>>>> graph of the endpoint's default dataset is supported via GRAPH graph
>>>>> patterns.
>>>> 
>>>> a) This is rather inconsistent. Above you say that the
>>>> construction of the default RDF dataset (when no FROM/FROM
>>>> NAMED clauses are given) is not defined, but here you say
>>>> constructing it using the default graph only is the 'standard
>>>> behaviour'. One of the motivations for this post is that
>>>> there are good reasons not to have only the default graph in
>>>> the 'default dataset', e.g. you wouldn't be able to do this
>>>> to find out the graph names when presented with an unknown endpoint:
>>>> 
>>>> SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }
>>>> 
>>>> Anyway, the point here is that there is no *portable* way to
>>>> query just the default graph.
>>>> 
>>>> b) yes, but you can't query the RDF merge of the default
>>>> graph and a named graph in the same way with two named
>>>> graphs, e.g. FROM ex:g1 FROM ex:g2. Instead one would need to
>>>> use a triple and graph pattern union, which for complex
>>>> queries becomes cumbersome. Put another way, any combination
>>>> of named graphs can be merged and explored with query triple
>>>> patterns, but this can't be done with any combination of
>>>> named graphs and the default graph.
>>>> 
>>>> 
>>>>> See also examples below.
>>>>> 
>>>>>> This is a problem that often confuses users of RDF data stores and is
>>>>>> likely to lead to implementations that provide their own specific
>>>>>> means to achieve this, e.g.
>>>>>> http://www.openrdf.org/issues/browse/SES-850
>>>>>> 
>>>>>> Inspired by the update language's use of the 'DEFAULT' keyword for
>>>>>> graph manipulation, I suggest an extension to the query language that
>>>>>> allows "FROM DEFAULT" to be used, e.g.
>>>>>> 
>>>>>> SELECT *
>>>>>> FROM DEFAULT
>>>>>> WHERE { ..... }
>>>>>> 
>>>>>> => dataset contains a default graph made up of the data store's
>>>>>> default graph only
>>>>> Please note that this is the standard behaviour when no FROM clause is
>>>>> given, i.e. this corresponds to
>>>>> 
>>>>> SELECT *
>>>>> WHERE { ..... }       <--- (no use of GRAPH keyword)
>>>> I don't think this is "standard behaviour", rather it is
>>>> common behaviour. It can not be standard when the
>>>> construction of the dataset is implementation dependent when
>>>> no FROM clause is given.
>>>> 
>>>>>> This construct can be used with any number of FROM <uri> or FROM NAMED
>>>>>> <uri> clauses, e.g.
>>>>>> 
>>>>>> SELECT *
>>>>>> FROM DEFAULT
>>>>>> FROM <http://example.com#g1>
>>>>>> WHERE { ..... }
>>>>>> 
>>>>>> => dataset contains a default graph made up of the data store's default
>>>>>> graph merged with the contents of the data store's g1 graph
>>>>>> 
>>>>>> This would be a fairly trivial change for existing sparql processor
>>>>>> implementations, but would provide a big improvement in
>>>>>> functionality/flexibility by allowing a data store's default graph to be
>>>>>> used/queried/merged in the same way as any of its named graphs.
>>>>> Note that similar to the example above, you can query the default
>>>>> graph and named graphs within the default dataset in a data store
>>>>> side by side by using GRAPH graph patterns, i.e.
>>>>>   SELECT *
>>>>>   WHERE
>>>>>   {
>>>>>     .....                              <-- (no use of GRAPH) matches the default graph
>>>>>     GRAPH <http://ex.com#g1> { .... }  <-- matches named graph g1 (assuming g1 is a named graph in the default dataset)
>>>>>   }
>>>> Consider an application that needs to execute queries over various
>>>> subsets of a database's contents, where the subsets are defined using
>>>> various combinations of named graphs. It would certainly be useful to
>>>> have standard queries which only required the appropriate
>>>> "FROM g1 FROM g2 etc" prepended. This is easy to do, unless one of the
>>>> graphs is the default graph.
>>>> 
>>>>> Finally, note that it is not possible in SPARQL 1.1 to construct a
>>>>> *new* dataset composed of *parts* of the default dataset of an
>>>>> endpoint plus possible external graphs; such a feature is currently
>>>>> not foreseen in the features addressed in this round of SPARQL, but
>>>>> had been suggested before [1].
>>>>> The features being worked on in this round of standardization have
>>>>> been decided in a voting process at the beginning of the WG and are
>>>>> documented in the following document:
>>>>> http://www.w3.org/TR/sparql-features/
>>>>> Additionally, a list of work items and features postponed to a future
>>>>> working group is being collected by the group in a dedicated wiki
>>>>> page [2] which also contains the features discussed in the beginning
>>>>> of the WG which have not been considered for this round [3].
>>>> 
>>>> Yes, I will be more timely next time and will endeavour to progress
>>>> this topic in the proper way. My apologies for the 'noise'.
>>>> 
>>>> Regards,
>>>> barry
>>>> 
>>>>> Among this list, the feature "Composite Datasets" [1] might partially
>>>>> capture what you have in mind and a future WG might possibly work out
>>>>> the details of such a feature.
>>>>> We'd kindly ask you to confirm by a reply to this list that this
>>>>> addresses your comment.
>>>>> Axel Polleres, on behalf of the SPARQL WG
>>>>> 
>>>>> 1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
>>>>> 2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>>>>> 3. http://www.w3.org/2009/sparql/wiki/Category:Features
>>>> 
>> 
>> 
>> 
> 
> 

-- 
Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL
Received on Friday, 7 September 2012 15:31:00 UTC