- From: David Booth <david@dbooth.org>
- Date: Wed, 01 Aug 2012 10:03:43 -0400
- To: "Polleres, Axel" <axel.polleres@siemens.com>
- Cc: "gearon@ieee.org" <gearon@ieee.org>, "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Yes, that is the same idea under a different name. I am satisfied with this resolution. Thanks!

David

On Wed, 2012-08-01 at 08:25 +0200, Polleres, Axel wrote:
> Dear David,
>
> > Thank you. I am satisfied with this resolution, providing that "virtual graphs" is added to the wish list for consideration in the next version of SPARQL:
> > http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>
> In order not to add too many new items to this list, please note that this page already links explicitly to all the features discussed at the beginning of this WG that were not adopted for work by the SPARQL 1.1 Working Group:
> http://www.w3.org/2009/sparql/wiki/Category:Features
>
> It seems to me that your proposal of "virtual graphs" is covered, in the sense that it is a variant of what we had already noted under the feature name "Composite Datasets":
> http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
>
> Please let us know whether this addresses your concern.
>
> Thanks,
> Axel
>
> --
> Dr. Axel Polleres
> Siemens AG Österreich
> Corporate Technology Central Eastern Europe Research & Technologies
> CT T CEE
>
> Tel.: +43 (0) 51707-36983
> Mobile: +43 (0) 664 88550859
> Fax: +43 (0) 51707-56682 mailto:axel.polleres@siemens.com
>
> > -----Original Message-----
> > From: David Booth [mailto:david@dbooth.org]
> > Sent: Tuesday, 31 July 2012 5:54 PM
> > To: Paul Gearon
> > Cc: public-rdf-dawg-comments
> > Subject: Re: SPARQL 1.1 Update - Comments
> >
> > On Tue, 2012-07-31 at 10:25 -0400, Paul Gearon wrote:
> > > Hello David,
> > >
> > > Thank you for your comments. I apologise that this response has been so long delayed. Please be assured that your comments were addressed in the SPARQL Update document some time ago, though this formal response was stuck in the queue until now.
> > >
> > > We have addressed your concerns below.
> > >
> > > On Fri, Jul 29, 2011 at 2:54 PM, David Booth <david@dbooth.org> wrote:
> > > > Regarding
> > > > http://www.w3.org/TR/2011/WD-sparql11-update-20110512/
> > > > It's great to see these documents in Last Call!
> > > >
> > > > Comments:
> > > >
> > > > 1. Please either add capability for virtual graphs or keep the COPY, ADD and MOVE shortcuts, to enable standard SPARQL to be used more efficiently as a rules language and in data production pipelines. COPY, ADD and MOVE operations cost almost nothing to implement, and they help with efficiency. By "virtual graph" I mean a graph that consists of the merge of a particular set of named graphs -- a very important capability for efficient data production pipelines.
> > >
> > > The features of COPY, ADD and MOVE were considered "At Risk" until the working group was confident that they could be implemented without undue difficulty. Now that we have some reports of successful implementation, the "At Risk" designation has been removed.
> > >
> > > The group feels that adding a feature like "virtual graphs" at this late stage of publication is not possible.
> >
> > Thank you. I am satisfied with this resolution, providing that "virtual graphs" is added to the wish list for consideration in the next version of SPARQL:
> > http://www.w3.org/2009/sparql/wiki/Future_Work_Items
> >
> > > >
> > > > 2. This paragraph in sec 3.1.3 is a bit confusing:
> > > > [[
> > > > That is, the GroupGraphPattern in the WHERE clause will be matched against the dataset described by explicit USING or USING NAMED clauses, if specified, and against the graph store otherwise. Any graph name specified in a WITH clause will - for evaluating the WHERE clause - refer to the default graph to be used in the absence of USING or USING NAMED clauses.
> > > > In the presence of one or more graphs referred to in USING clauses, the default graph will be the merge of these graphs, meaning that the graph in a WITH clause will be ignored while evaluating the WHERE clause. If there is no USING clause, but there is one or more USING NAMED clauses, then the dataset will include an empty graph for the default graph.
> > > > ]]
> > > > In particular, the sentence "Any graph name specified in a WITH clause will - for evaluating the WHERE clause - refer to the default graph to be used in the absence of USING or USING NAMED clauses." seems odd. The graph specified in the WITH clause will refer to the *default* graph? I would think it would be used *instead* of the default graph. Isn't that the point of WITH? Perhaps the term "default graph" is being used in an unusual way in this paragraph, to mean "the graph that will be used in the absence of USING or USING NAMED"? I think it would be misleading to call that a "default graph". Normally the term "default graph" refers to the unnamed slot in a Graph Store, per the first paragraph in section 2. I think it would be best to use the term only in that way.
> > >
> > > Unfortunately, the term "default graph" has two accepted meanings. The first is the graph that may be referred to without a name in a graph store (not necessarily an unnamed graph), while the second refers to the graph that is referenced in a SPARQL WHERE clause when no GRAPH block has been specified. By default, these two are equivalent, but the latter is modified to be the merge of all graphs listed in FROM clauses in a query (USING in updates) or by specifying a default-graph-uri parameter in the SPARQL protocol.
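Both readings Paul describes, and the "virtual graph" of comment 1, rest on the same operation: forming one graph as the merge of several named graphs. A minimal Python sketch, assuming graphs are modelled as sets of (subject, predicate, object) string tuples with no blank nodes, so that a plain set union can stand in for an RDF merge (a real merge must first rename blank nodes apart). The `merge` function, the dictionary layout and the example IRIs are hypothetical illustrations, not part of any SPARQL API:

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

def merge(named_graphs: Dict[str, Graph], names: Iterable[str]) -> Graph:
    """Merge the listed named graphs, e.g. the graphs named in USING or FROM clauses.

    Assumes no blank nodes, so set union is a valid merge. A name with no
    graph in `named_graphs` contributes nothing (an assumption of this sketch).
    """
    merged: Set[Triple] = set()
    for name in names:
        merged |= named_graphs.get(name, frozenset())
    return frozenset(merged)

# Hypothetical example: two patient graphs viewed together as one "virtual graph".
store = {
    "http://example.org/patient/1": frozenset({("ex:p1", "ex:status", "active")}),
    "http://example.org/patient/2": frozenset({("ex:p2", "ex:status", "discharged")}),
}
view = merge(store, store.keys())
print(len(view))  # 2
```

This recomputes the merge on every call; a service offering virtual graphs would presumably avoid materialising the merged graph at all.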
> > >
> > > We have changed the text to the following to clarify the use of WITH:
> > >
> > > "That is, the GroupGraphPattern in the WHERE clause will be matched against the dataset described by explicit USING or USING NAMED clauses, if specified, and against the default graph provided by the Graph Store otherwise.
> > >
> > > The WITH clause provides a convenience for when an operation primarily refers to a single graph. If a graph name is specified in a WITH clause, then - for the purposes of evaluating the WHERE clause - this will define a dataset containing a default graph with the specified name, but only in the absence of USING or USING NAMED clauses. In the presence of one or more graphs referred to in USING clauses and/or USING NAMED clauses, the WITH clause will be ignored while evaluating the WHERE clause."
> > >
> > > > Part of the confusion may be related to the ambiguous use of the term "dataset". For example, consider the sentence: "That is, the GroupGraphPattern in the WHERE clause will be matched against the dataset described by . . . ". When I read this, I took the term "dataset" to mean:
> > > > http://en.wikipedia.org/wiki/Data_set
> > > > However, I am wondering if you actually meant "RDF Dataset" as defined here:
> > > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > > If you meant the former, I suggest using the term "set of data", to avoid ambiguity. If you meant the latter, I suggest using the term "RDF Dataset", and perhaps linking it to its definition.
> > > >
> > > > Also, I notice that:
> > > >
> > > > - There are many occurrences of the unqualified word "dataset". I suggest checking them all, to see if they should be "RDF Dataset".
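The revised WITH text above can be written out as a decision tree: if USING or USING NAMED is present, the default graph for the WHERE clause is the merge of the USING graphs (empty if there are none) and WITH is ignored; otherwise, if WITH names a graph, that graph is used; otherwise the Graph Store's own default graph is used. A minimal Python sketch of that rule, assuming the same set-of-tuples model with no blank nodes; the function and parameter names are hypothetical, and treating a missing graph as empty is an assumption of the sketch rather than something the text specifies:

```python
from typing import Dict, FrozenSet, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
EMPTY: Graph = frozenset()

def where_default_graph(
    store_default: Graph,
    store_named: Dict[str, Graph],
    with_graph: Optional[str] = None,
    using: Sequence[str] = (),
    using_named: Sequence[str] = (),
) -> Graph:
    """Return the graph the WHERE clause sees as its default graph."""
    if using or using_named:
        # USING and/or USING NAMED present: WITH is ignored. The default graph is the
        # merge of the USING graphs, which is empty when only USING NAMED is given.
        merged: Set[Triple] = set()
        for name in using:
            merged |= store_named.get(name, EMPTY)  # missing graph treated as empty
        return frozenset(merged)
    if with_graph is not None:
        # WITH only: the named graph plays the role of the default graph.
        return store_named.get(with_graph, EMPTY)
    # No WITH, USING or USING NAMED: the Graph Store's own default graph.
    return store_default

# Hypothetical data for illustration.
default = frozenset({("ex:s", "ex:p", "in-default")})
named = {"ex:g1": frozenset({("ex:s", "ex:p", "in-g1")}),
         "ex:g2": frozenset({("ex:s", "ex:p", "in-g2")})}
print(where_default_graph(default, named, with_graph="ex:g1", using=["ex:g2"]))
# frozenset({('ex:s', 'ex:p', 'in-g2')})
```

The sketch deliberately says nothing about which named graphs a GRAPH pattern may reach when only WITH is given, since the quoted text defines only the default graph for that case.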
> > >
> > > Existing documentation from SPARQL 1.0 already uses the term "dataset" as an abbreviation for "RDF dataset", so we do not feel that it is necessary to use the complete term on every occasion. However, we have expanded the term each time that a paragraph first uses it. Despite a link to "Querying the Dataset" already being present in the preceding paragraph, we have added the requested link.
> > >
> > > > - Capitalization of the terms "RDF Dataset" and "Graph Store" is inconsistent -- sometimes written "RDF dataset" or "graph store". It would help if it were consistently capitalized, as it helps the reader know that you are intending a specially defined term.
> > >
> > > "RDF dataset" was consistently capitalized in the prose; however, it has been updated to include a capitalized "D" to help the reader realize that it is a formal term. The abbreviated term "dataset" has remained unchanged. "Graph Store" has been updated.
> > >
> > > > If I have understood the intent, it sounds like there are two sets of data involved in a DELETE/INSERT operation: one set is used in evaluating the WHERE clause, and the other is the target graph of the DELETE/INSERT, i.e., the graph that will be modified by the operation. If so, I think it would be helpful to state this up front, and make up a term for each of these sets, such as: "the set of data for the WHERE clause" and "the target graph". Hmm, maybe the SPARQL 1.1 Query spec uses the term "active graph" for the former?
> > > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > > In any case, it would be helpful to define specific terms for these, and use them consistently.
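The two roles described above can be separated in a small model: the WHERE clause is matched against the dataset to produce solutions, and those solutions then fill in the DELETE and INSERT templates that are applied to the graph being modified. In the formal semantics of SPARQL 1.1 Update, the instantiated deletions are removed as a set difference before the instantiated insertions are added, and template triples left with an unbound variable are dropped. A minimal Python sketch, assuming the WHERE clause has already been evaluated into a list of solution mappings, a single modified graph, and no blank nodes; `instantiate` and `delete_insert` are hypothetical names:

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]  # variable name (without "?") -> RDF term

def instantiate(template: Sequence[Triple], solutions: Sequence[Solution]) -> FrozenSet[Triple]:
    """Substitute each solution into the template; drop triples left with an unbound variable."""
    out = set()
    for sol in solutions:
        for triple in template:
            terms: List[Optional[str]] = [
                sol.get(t[1:]) if t.startswith("?") else t for t in triple
            ]
            if None not in terms:
                out.add(tuple(terms))
    return frozenset(out)

def delete_insert(target: Iterable[Triple], delete_tmpl: Sequence[Triple],
                  insert_tmpl: Sequence[Triple], solutions: Sequence[Solution]) -> FrozenSet[Triple]:
    """(target minus deleted triples) union inserted triples."""
    deleted = instantiate(delete_tmpl, solutions)
    inserted = instantiate(insert_tmpl, solutions)
    return (frozenset(target) - deleted) | inserted

# Hypothetical example: DELETE { ?p ex:age ?old } INSERT { ?p ex:age "31" } WHERE { ... }
target = {("ex:alice", "ex:age", "30")}
solutions = [{"p": "ex:alice", "old": "30"}]  # assumed result of the WHERE clause
result = delete_insert(target, [("?p", "ex:age", "?old")], [("?p", "ex:age", "31")], solutions)
print(sorted(result))  # [('ex:alice', 'ex:age', '31')]
```

Because the difference is taken before the union, a triple that appears in both the instantiated DELETE and INSERT templates ends up present in the modified graph.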
> > >
> > > The terms "RDF dataset" and "dataset" are now used in this text entirely in the context of the data that the WHERE clause will be matched against. DELETE and INSERT may each refer to multiple graphs, making a term like "target graph" difficult to manage. The changes made to this section may now address some of the confusion being posed here.
> > >
> > > > Also, it may be clearer to reword this paragraph as a decision tree, since the logic that is being described is a bit complex for unstructured English prose:
> > > >
> > > > If ___ then ___ . Otherwise, if ___ then ___ . Otherwise ___ .
> > >
> > > The purpose of this section of text is to provide a description in prose. We hope that the changes have made the text clearer.
> >
> > Thank you. I am satisfied with this resolution.
> >
> > > >
> > > > 3. In searching for the definition of the backslash "\" symbol in section 4.2, it looks like it is supposed to be set difference, but I do not see it listed in either of these tables of standard mathematical or logic symbols:
> > > > http://en.wikipedia.org/wiki/List_of_mathematical_symbols
> > > > http://en.wikipedia.org/wiki/Table_of_logic_symbols
> > > > However, I now see that that is because it is using a different Unicode character, so a browser search did not find it:
> > > > http://en.wikipedia.org/wiki/List_of_mathematical_symbols
> > > > I suggest adding a brief note of clarification to section 4.2 stating that the backslash symbol ("\") indicates set difference. Personally, I prefer the minus sign ("-") for set difference, though my tastes may be biased toward certain programming languages.
> > >
> > > The character "\" has been replaced with the word "minus", and text has been provided to explain that this refers to "set difference".
> > >
> > > > 4. The difference between "USING" and "USING NAMED" is not explained, except in passing: "This describes a dataset in a manner similar to FROM and FROM NAMED clauses in the SPARQL1.1 Query Language."
> > >
> > > We have replaced the phrase: "in a manner similar to FROM and FROM NAMED" with: "in the same way as FROM and FROM NAMED" and have provided a direct link to
> > > http://www.w3.org/TR/sparql11-query/#specifyingDataset
> >
> > Thank you. I am satisfied with this resolution.
> >
> > > >
> > > > 5. As written, this in sec 3.1:
> > > > http://www.w3.org/TR/sparql11-update/#graphUpdate
> > > > [[
> > > > Graph update operations change existing graphs in the Graph Store but do not explicitly delete nor create them. Non-empty inserts into non-existing graphs will, however, implicitly create those graphs, i.e., an implementation *should* create graphs that do not exist before triples were inserted into them (there may be implementations providing an update service over a fixed set of graphs which in such case *must* return with failure for update requests that would create an unallowed graph), and *may* remove graphs that are left empty after triples are removed from them.
> > > > ]]
> > > > seems to say that an implementation that operates over a *variable* (non-fixed) set of graphs still has the option of not automatically creating graphs that do not exist.
> > > >
> > > > I suggest rewording the above portion as:
> > > > [[
> > > > Graph update operations change existing graphs in the Graph Store but do not explicitly delete nor create them.
> > > > Non-empty inserts into non-existing graphs will normally implicitly create those graphs, i.e., an implementation fulfilling an update request *should* silently and automatically create graphs that do not exist before triples are inserted into them, and *must* return with failure if it fails to do so for any reason. (For example, the implementation may have insufficient resources, or an implementation may only provide an update service over a fixed set of graphs.) An implementation *may* remove graphs that are left empty after triples are removed from them.
> > > > ]]
> > >
> > > Done, with minor changes:
> > >
> > > "Graph update operations change existing graphs in the Graph Store but do not explicitly delete nor create them. Non-empty inserts into non-existing graphs will, however, implicitly create those graphs, i.e., an implementation fulfilling an update request should silently
> > > an automatically create graphs that do not exist before triples are
> >
> > s/an /and /
> >
> > > inserted into them, and must return with failure if it fails to do so for any reason. (For example, the implementation may have insufficient resources, or an implementation may only provide an update service over a fixed set of graphs and the implicitly created graph is not within this fixed set). An implementation may remove graphs that are left empty after triples are removed from them."
> >
> > Thank you. Subject to correcting the tiny typo above, I am satisfied with this resolution.
> >
> > > >
> > > > 6. Similarly, I suggest rewording the following in section 3.1.1:
> > > > http://www.w3.org/TR/sparql11-update/#insertData
> > > > [[
> > > > If no graph is described in the QuadData, then the default graph is presumed.
> > > > If data is inserted into a graph that does not exist in the graph store, it *should* be created (there may be implementations providing an update service over a fixed set of graphs which in such case *must* return with failure for update requests that insert data into an unallowed graph).
> > > > ]]
> > > > to:
> > > > [[
> > > > If no graph is described in the QuadData, then the default graph is presumed. If data is inserted into a graph that does not exist in the graph store, the update service SHOULD create that graph. The service MUST return with failure if it fails to do so for any reason.
> > > > ]]
> > >
> > > Done, with minor modification. The text now reads as:
> > >
> > > "The information how a graph store is accessed is defined in the protocol and graph store protocol specs. A graph store is accessible by either an update service (cf. protocol) or via the graph store protocol (cf. graph store protocol). In either case the graph store is hidden behind the service, making it accessible via the URI of a SPARQL update service or via a URI that responds to the graph store protocol."
> >
> > Thank you. I am satisfied with this resolution.
> >
> > > >
> > > > 7. And similarly in section 3.1.3 I suggest changing:
> > > > http://www.w3.org/TR/sparql11-update/#deleteInsert
> > > > [[
> > > > If an operation tries to insert into a graph that does not exist, then the update service *should* create that graph. The service MUST return with failure if it fails to do so for any reason. If no data is to be inserted, then no graph will be created, even if applying the operation to a different dataset would result in data being inserted.
> > > > ]]
> > > > to:
> > > > [[
> > > > If an operation tries to insert into a graph that does not exist, then that graph should be created; again, there may be implementations providing an update service over a fixed set of graphs which in such case must return with failure for update requests that would create an unallowed graph. If no data is to be inserted, then no graph will be created, even if applying the operation to a different dataset would result in data being inserted.
> > > > ]]
> > >
> > > Done.
> >
> > Thank you. I am satisfied with this resolution.
> >
> > > >
> > > > 8. How is the URI of a Graph Store indicated? The concept of a Graph Store is central to the SPARQL 1.1 Update spec, and hence one should be able to use a URI to refer to a particular Graph Store, but the spec doesn't say how this is done.
> > > >
> > > > The SPARQL 1.1 Service Description spec contains no sd:GraphStore class.
> > > >
> > > > The SPARQL 1.1 Graph Store HTTP Protocol spec sometimes mentions a Graph Store, but does not make clear how the intended Graph Store is identified. It does say: "A compliant implementation of this specification SHOULD accept HTTP requests directed at its Graph Store". But what if a service hosts multiple Graph Stores?
> > > >
> > > > According to
> > > > http://www.w3.org/TR/sparql11-update/#graphStore
> > > > a Graph Store "is a mutable container of RDF graphs managed by a single service" which "contains one (unnamed) slot holding a default graph and zero or more named slots holding named graphs".
> > > >
> > > > Language in section 2.1
> > > > http://www.w3.org/TR/sparql11-update/#graphStoreQueryServices
> > > > "There is no presumption that the graph store managed by an update service . . ." suggests that an update service can only have *one* Graph Store, but: (a) I do not see this stated explicitly anywhere; (b) it would be useful for an update service to be able to have more than one Graph Store; and (c) what is the point of defining the notion of an "update service" if it is one-to-one with a Graph Store? AFAICT, doing so just adds an unnecessary layer and confusion.
> > > >
> > > > The SPARQL 1.1 Service Description spec does define the notion of an sd:DataSet, which is close to the notion of a Graph Store, but (if I understand the definition of Graph Store in http://www.w3.org/TR/sparql11-update/#graphStore ) a Graph Store is mutable, whereas an sd:DataSet is not.
> > >
> > > Graph stores are referred to by URI, but beyond this the implementation is free to choose. This has been left unspecified intentionally to allow each implementation to specify the details individually.
> > >
> > > > The reason one would want to have an update service that contains more than one Graph Store is that it would allow operations on collections of graphs to be performed efficiently. For example, an RDF data pipeline may need to generate one collection of graphs from another, all within the same update service. In other words, the content of one Graph Store is generated from the content of another Graph Store. This is important because for efficiency, it is helpful to be able to subdivide large graphs into collections of smaller graphs. An example might be a collection of 200,000 patient graphs. There may be *multiple* collections of these patient graphs, A, B and C, where collection C is derived from collection B which is derived from collection A in a pipeline.
> > > > Since each patient graph within each of these collections is relatively independent, it is far more efficient when one in A is updated to only update the corresponding graphs in B and C, rather than regenerating the entire B and C collections. It would be very convenient if each of these collections could be stored in an sd:GraphStore (presuming such a class is defined) within the same update service so that appropriate update operations could be selectively performed on them, with the assurance (for efficiency) that they are within the same update service.
> > > >
> > > > Oddly, there is a distinction between a Graph Store (which is mutable) and an RDF Dataset (which is not), but there is no corresponding distinction made with graphs. They are treated as mutable in the SPARQL 1.1 Update spec: they can be the subject of an INSERT or DELETE operation.
> > > >
> > > > Actually, in reading the definition of RDF Dataset
> > > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > > I do not see anything that would prevent it from changing over time. Certainly an RDF Dataset contains a particular set of graphs at the moment when it is queried, but I see no prohibition against that same RDF Dataset containing a different set of graphs at a different time. Hence, it looks to me like the notion of Graph Store could be dropped in favor of using the term "RDF Datastore" universally throughout both the Query and Update documents. I think this would make more sense than using two different terms: both queries and updates would operate on RDF Datasets.
> > >
> > > While queries operate on a dataset that is defined as a merge of multiple graphs, any updates must necessarily modify a single graph at a time.
> > > So it is not possible to state that updates operate on RDF Datasets.
> >
> > I apologize, I appear to have made a typo in my suggestion -- I wrote "RDF Datastore" instead of "RDF Dataset" -- and I think this may have caused my suggestion to be misunderstood. I *meant* to suggest that the term "RDF Dataset" be used uniformly instead of using the term "Graph Store" in the Update spec and "RDF Dataset" in the Query spec.
> >
> > An update would operate on a specific graph *within* an RDF Dataset, just as it operates on a specific graph *within* a Graph Store.
> >
> > This is purely an editorial suggestion, to use a single term instead of two terms.
> >
> > >
> > > While a single INSERT or DELETE template may refer to multiple graphs, the triples being specified are always for individual graphs. So to remove the same triples from graphs <foo> and <bar> there is no way to do it with a single pattern in a template, but rather both graphs must be mentioned explicitly with that template, i.e.:
> > >
> > > DELETE { GRAPH <foo> { ... } GRAPH <bar> { ... }} ...
> > >
> > > > 9. Typo: s/needs not be authoritative/need not be authoritative/
> > >
> > > Done.
> >
> > Thank you. I am satisfied with this resolution.
> >
> > David
> >
> > >
> > > We would be grateful if you would acknowledge that your comment has been answered by sending a reply to this mailing list.
> > >
> > > Paul Gearon,
> > > on behalf of the SPARQL WG
> > >
> >
> > --
> > David Booth, Ph.D.
> > http://dbooth.org/
> >
> > Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
>

--
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
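Comments 1 and 5 to 7 in the thread describe concrete Graph Store behaviour: a non-empty insert into a missing graph implicitly creates it, a service limited to a fixed set of graphs must fail instead, no graph is created when there is no data to insert, and COPY, ADD and MOVE act on whole graphs. A toy Python model of that behaviour, assuming graphs are sets of triples; the `GraphStore` class, its methods and `UpdateError` are hypothetical names for illustration, not the API of any SPARQL implementation. The ADD, COPY and MOVE methods follow the SPARQL 1.1 Update descriptions of those operations as this sketch understands them, while the default graph, SILENT and detailed error reporting are left out:

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

class UpdateError(Exception):
    """Hypothetical error signalling that an update request failed."""

class GraphStore:
    """Toy model of a Graph Store's named graphs (the default graph is omitted)."""

    def __init__(self, allowed: Optional[Iterable[str]] = None) -> None:
        self.named: Dict[str, Set[Triple]] = {}
        # None: graphs may be created freely. Otherwise: a fixed set of graph names.
        self.allowed: Optional[Set[str]] = None if allowed is None else set(allowed)

    def insert_data(self, graph: str, triples: Iterable[Triple]) -> None:
        """Insert triples, implicitly creating the graph only if there is data to insert."""
        new = set(triples)
        if not new:
            return  # nothing to insert, so no graph is created
        if graph not in self.named:
            if self.allowed is not None and graph not in self.allowed:
                raise UpdateError(f"graph {graph} cannot be created by this service")
            self.named[graph] = set()  # implicit creation
        self.named[graph] |= new

    def add(self, src: str, dst: str) -> None:
        """ADD: insert every triple of src into dst, keeping dst's existing triples."""
        if src != dst:  # same source and destination: no effect
            # A missing source is treated as empty here; the sketch does not raise for it.
            self.insert_data(dst, self.named.get(src, set()))

    def copy(self, src: str, dst: str) -> None:
        """COPY: replace the contents of dst with the triples of src."""
        if src != dst:
            self.named.pop(dst, None)
            self.insert_data(dst, self.named.get(src, set()))

    def move(self, src: str, dst: str) -> None:
        """MOVE: COPY src to dst, then remove src."""
        if src != dst:
            self.copy(src, dst)
            self.named.pop(src, None)

# Hypothetical usage: derive collection B's graph from collection A's.
store = GraphStore()
store.insert_data("ex:A", [("ex:p1", "ex:status", "active")])
store.copy("ex:A", "ex:B")
print(store.named["ex:B"])  # {('ex:p1', 'ex:status', 'active')}
```

Modelled this way, each of the three shortcuts is a handful of set operations, which is consistent with David's remark that they cost little to implement.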
Received on Wednesday, 1 August 2012 14:04:16 UTC