- From: Barry Bishop <barry.bishop@ontotext.com>
- Date: Fri, 07 Sep 2012 16:30:06 +0200
- To: "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Dear WG,

Please note that I do not expect a formal reply to my comments, but would
welcome the opportunity to continue discussions in some future incarnation
of the WG.

Regards,
barry

On 07/09/12 13:44, Barry Bishop wrote:
> Hello Axel,
>
> On 05/09/12 21:14, Polleres, Axel wrote:
>> Thanks Barry,
>>
>> Since you confirm that the response addresses your comment, please
>> consider this reply informal (chair-hat off).
>>
>>> I feel this is a shame, as two different implementations can
>>> produce different output from the simplest of queries, e.g.
>>> SELECT * { ?s ?p ?o }
>> I personally find this quite normal... different endpoints
>> respond differently to such a query since they refer to different
>> default datasets, i.e.
>> Naturally when I query dbpedia.org I query a different dataset than
>> data.semanticweb.org, etc.
>
> Well, dbpedia.org and data.semanticweb.org sparql endpoints make
> different data available, so I suppose you would naturally get
> different results to the same query. However, this is not what I was
> getting at. In fact, I'm not sure I have managed to get my point
> across at all. Perhaps another hypothetical example:
>
> Suppose you run a development team that builds an application that
> interacts with some public sparql endpoint, say http://xyz.org/sparql
> - then one day xyz.org start to have scalability problems and decide
> to upgrade their RDF database to some expensive new thing. Both old
> and new RDF databases are fully compliant with W3C, but after they
> upgrade your application is completely broken only because the two
> database implementations construct their RDF dataset differently when
> no FROM clauses are given. I am sure you wouldn't find it so natural
> in this case.
>
> There are some workarounds as you say, but not in all cases. When you
> are using someone else's database and don't get to decide how they
> partition their data into separate graphs, then you can be completely
> stuck.
> As fabulous as the query language is (and I do think it is a
> tremendous achievement), this ambiguity over constructing a dataset
> when there are no FROMs is a bit of a hole.
>
>>
>> Notably, I'd like to also point you to another document within
>> the SPARQL1.1 specification,
>> i.e. the service-description document at
>> http://www.w3.org/TR/sparql11-service-description/
>> which provides means to describe which graphs compose the default
>> dataset of a particular service endpoint.
>> Particularly, the property
>> http://www.w3.org/TR/sparql11-service-description/#sd-defaultDataset
>> is intended to provide a description of the default dataset that an
>> endpoint uses.
>> Note also that the service description vocabulary is extensible, and
>> what we specify now is only a core, but other vocabulary can be used
>> to extend this (e.g. VoID)
>
> All well and good, if this feature is actually provided by an
> endpoint. However, it requires quite a lot of programming for a client
> to work all this out and re-write queries accordingly. And actually,
> it still doesn't help - e.g. if the endpoint you want to use
> constructs the dataset as an RDF merge of all graphs (when no FROM
> clauses are given [I need to find an abbreviation for this]) and you
> only want to query the default graph, then you just can't do it. There
> is no way to tell such an endpoint that you only want the default
> graph using the query language.
>
> The problem is basically that the default graph is special - because
> it doesn't have an identifier it cannot be used in the same way as
> named graphs....
>
> ... in the query language. However, in the update language the
> appropriate syntax has already been created and would be the perfect
> complement to the query language, e.g.
> if I can do this:
>
> CLEAR DEFAULT
>
> why can't I do this:
>
> SELECT *
> FROM DEFAULT
> {...}
>
> and specify absolutely unambiguously that I want my query to execute
> *only* over the default graph in the database. No matter how an
> implementation constructs its dataset when no FROM clauses are given,
> this syntax should always work in the expected way.
>
> Since I am rambling on, the related keywords from the update language
> would also be very useful, e.g. one can clear all graphs like this:
>
> CLEAR ALL
>
> so why not be able to do this:
>
> SELECT *
> FROM ALL
> {...}
>
> This would help in the opposite case, when an implementation
> constructs the dataset using only the default graph (when no FROM
> clauses are given). In this situation, it is not possible to query for
> the graph names (using select distinct ?g {graph ?g {?s ?p ?o}}), so
> the above would say: "please merge all graphs for input to my query,
> even though I don't know what their names are and have no way of
> finding out (using the query language)".
>
> These things might not seem important, but they are life and death to
> application programmers. Right now, to build an application that needs
> to interact with a sparql endpoint that is only known at runtime is
> fraught with difficulties. Not the least of which is that if your
> application is required to query data only from the default graph,
> then there is no way to write a query that is guaranteed to do this on
> all (W3C compliant) sparql endpoints.
>
> Which I still feel is a bit of a shame.
>
> barry
>
>
>>
>> As for the rest of your response, we seem to agree that what you're
>> aiming at is rather a new feature than something this working group
>> can address within its current charter and resources.
>>
>> Best regards,
>> Axel
>>
>>> -----Original Message-----
>>> From: Barry Bishop [mailto:barry.bishop@ontotext.com]
>>> Sent: Wednesday, 05 September 2012 19:49
>>> To: Polleres, Axel
>>> Cc: public-rdf-dawg-comments@w3.org
>>> Subject: Re: Querying only the default graph from the data store
>>>
>>> Hello Axel,
>>>
>>> Thanks for taking the time to reply. I realise this thread is
>>> somewhat out of place given the status/progress of the WG.
>>>
>>> Your reply does address my initial post. It does not resolve
>>> it, but this is perhaps not the time. However, for the
>>> purpose of clarity I will make further comments inline:
>>>
>>> On 05/09/12 04:11, Polleres, Axel wrote:
>>>> Hi Barry,
>>>>
>>>> This is in response to
>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html
>>>>
>>>>> The working draft does not specify how the RDF dataset is constructed
>>>>> when no FROM and FROM NAMED clauses are present in the SPARQL query.
>>>>> Implementations are therefore able to construct the dataset
>>>>> differently, e.g.
>>>>> a. dataset default graph contains only the data store's default graph
>>>>> b. dataset default graph contains the RDF merge of all graphs in the
>>>>> data store
>>>> It is correct that how the concrete default dataset of a SPARQL
>>>> endpoint is constructed is left open to implementations. Since
>>>> different endpoints and implementations support different behaviours
>>>> in this regard (e.g. in some implementations the default graph of the
>>>> default dataset is the union of all named graphs whereas in others
>>>> this is not the case), the working group does not feel that there is
>>>> a unique standard behavior to be advocated this time around.
>>>
>>> I feel this is a shame, as two different implementations can
>>> produce different output from the simplest of queries, e.g.
>>> SELECT * { ?s ?p ?o }
>>>
>>> However, this is a separate issue.
>>>
>>>>> As soon as a single FROM or FROM NAMED clause is used then the data
>>>>> store's default graph is excluded from the query's dataset.
>>>>>
>>>>> Which means that there is no portable way to define a SPARQL query so
>>>>> that it executes only against the default graph in the data store -
>>>>> or even against a combination of the default graph and one or more
>>>>> named graphs.
>>>> Please note that a) querying the default graph in the datastore is
>>>> the standard behavior when no explicit FROM or FROM NAMED clauses are
>>>> given. b) the combination of querying named graphs and the default
>>>> graph of the endpoint's default dataset is supported via GRAPH graph
>>>> patterns.
>>>
>>> a) This is rather inconsistent. Above you say that the
>>> construction of the default RDF dataset (when no FROM/FROM
>>> NAMED clauses are given) is not defined, but here you say
>>> constructing it using the default graph only is the 'standard
>>> behaviour'. One of the motivations for this post is that
>>> there are good reasons not to have only the default graph in
>>> the 'default dataset', e.g. you wouldn't be able to do this
>>> to find out the graph names when presented with an unknown endpoint:
>>>
>>> SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }
>>>
>>> Anyway, the point here is that there is no *portable* way to
>>> query just the default graph.
>>>
>>> b) yes, but you can't query the RDF merge of the default
>>> graph and a named graph in the same way as with two named
>>> graphs, e.g. FROM ex:g1 FROM ex:g2. Instead one would need to
>>> use a triple and graph pattern union, which for complex
>>> queries becomes cumbersome. Put another way, any combination
>>> of named graphs can be merged and explored with query triple
>>> patterns, but this can't be done with any combination of
>>> named graphs and the default graph.
>>>
>>>
>>>> See also examples below.
>>>>
>>>>> This is a problem that often confuses users of RDF data stores and is
>>>>> likely to lead to implementations that provide their own specific
>>>>> means to achieve this, e.g.
>>>>> http://www.openrdf.org/issues/browse/SES-850
>>>>>
>>>>> Inspired by the update language's use of the 'DEFAULT' keyword for
>>>>> graph manipulation, I suggest an extension to the query language that
>>>>> allows "FROM DEFAULT" to be used, e.g.
>>>>>
>>>>> SELECT *
>>>>> FROM DEFAULT
>>>>> WHERE { ..... }
>>>>>
>>>>> => dataset contains a default graph made up of the data store's
>>>>> default graph only
>>>> Please note that this is the standard behaviour when no FROM clause is
>>>> given, i.e. this corresponds to
>>>>
>>>> SELECT *
>>>> WHERE { ..... } <--- (no use of GRAPH keyword)
>>> I don't think this is "standard behaviour", rather it is
>>> common behaviour. It cannot be standard when the
>>> construction of the dataset is implementation dependent when
>>> no FROM clause is given.
>>>
>>>>> This construct can be used with any number of FROM <uri> or FROM NAMED
>>>>> <uri> clauses, e.g.
>>>>>
>>>>> SELECT *
>>>>> FROM DEFAULT
>>>>> FROM <http://example.com#g1>
>>>>> WHERE { ..... }
>>>>>
>>>>> => dataset contains a default graph made up of the data store's default
>>>>> graph merged with the contents of the data store's g1 graph
>>>>> This would be a fairly trivial change for existing sparql processor
>>>>> implementations, but would provide a big improvement in
>>>>> functionality/flexibility by allowing a data store's default graph to be
>>>>> used/queried/merged in the same way as any of its named graphs.
>>>> Note that similar to the example above, you can query the default
>>>> graph and named graphs within the default dataset in a data store
>>>> side by side by using GRAPH graph patterns, i.e.
>>>> SELECT *
>>>> WHERE
>>>> {
>>>> ..... <-- (no use of GRAPH) matches the default graph
>>>> GRAPH <http://ex.com#g1> { .... } <-- matches named graph g1 (assuming g1 is a named graph in the default dataset)
>>>> }
>>> Consider an application that needs to execute queries over various
>>> subsets of a database's contents, where the subsets are defined using
>>> various combinations of named graphs. It would certainly be useful to
>>> have standard queries which only required the appropriate
>>> "FROM g1 FROM g2 etc" prepended. This is easy to do, unless one of the
>>> graphs is the default graph.
>>>
>>>> Finally, note that it is not possible in SPARQL1.1 to construct a
>>>> *new* dataset composed of *parts* of the default dataset of an
>>>> endpoint plus possible external graphs; such a feature is currently
>>>> not foreseen in the features addressed in this round of SPARQL, but
>>>> had been suggested before [1].
>>>> The features being worked on in this round of standardization have
>>>> been decided in a voting process at the beginning of the WG and are
>>>> documented in the following document:
>>>> http://www.w3.org/TR/sparql-features/
>>>> Additionally, a list of work items and features postponed to a future
>>>> working group is being collected by the group in a dedicated wiki
>>>> page [2] which also contains the features discussed in the beginning
>>>> of the WG which have not been considered for this round [3].
>>>
>>> Yes, I will be more timely next time and will endeavour to
>>> progress this topic in the proper way. My apologies for the 'noise'.
>>>
>>> Regards,
>>> barry
>>>
>>>> Among this list, the feature "Composite Datasets" [1] might partially
>>>> capture what you have in mind and a future WG might possibly work out
>>>> the details of such a feature.
>>>> We'd kindly ask you to confirm by a reply to this list that this
>>>> addresses your comment.
>>>> Axel Polleres, on behalf of the SPARQL WG
>>>>
>>>> 1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
>>>> 2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>>>> 3. http://www.w3.org/2009/sparql/wiki/Category:Features
Received on Friday, 7 September 2012 14:30:36 UTC