RE: SPARQL 1.1 Update - Comments from David Booth on 2012-08-01 (public-rdf-dawg-comments@w3.org from August 2012)

From: David Booth <david@dbooth.org>
Date: Wed, 01 Aug 2012 10:03:43 -0400
To: "Polleres, Axel" <axel.polleres@siemens.com>
Cc: "gearon@ieee.org" <gearon@ieee.org>, "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Message-ID: <1343829823.2725.82201.camel@dbooth-laptop>
Yes, that is the same idea under a different name.  I am satisfied with
this resolution.

Thanks!
David


On Wed, 2012-08-01 at 08:25 +0200, Polleres, Axel wrote:
> Dear David,
> 
> > Thank you.  I am satisfied with this resolution, providing
> > that "virtual graphs" is added to the wish list for
> > consideration in the next version of SPARQL:
> > http://www.w3.org/2009/sparql/wiki/Future_Work_Items
> 
> In order not to add too many new items to this list, please
> note that this page already has an explicit link to all the features
> that were discussed at the beginning of this WG and that were not
> adopted for work by the SPARQL 1.1 Working Group:
>  http://www.w3.org/2009/sparql/wiki/Category:Features
> 
> It seems to me that your proposal of "virtual graphs" is covered,
> in the sense that it is a variant of what we had already noted under
> the feature name "Composite Datasets":
> http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
> 
> Please let us know whether this addresses your concern.
> 
> Thanks,
> Axel
> 
> 
> --
> Dr. Axel Polleres
> Siemens AG Österreich
> Corporate Technology Central Eastern Europe Research & Technologies
> CT T CEE
> 
> Tel.: +43 (0) 51707-36983
> Mobile: +43 (0) 664 88550859
> Fax: +43 (0) 51707-56682
> mailto:axel.polleres@siemens.com
> 
> 
> > -----Original Message-----
> > From: David Booth [mailto:david@dbooth.org]
> > Sent: Tuesday, 31 July 2012 5:54 PM
> > To: Paul Gearon
> > Cc: public-rdf-dawg-comments
> > Subject: Re: SPARQL 1.1 Update - Comments
> >
> > On Tue, 2012-07-31 at 10:25 -0400, Paul Gearon wrote:
> > > Hello David,
> > >
> > > > Thank you for your comments. I apologise that this response has
> > > > been so long delayed. Please be assured that your comments were
> > > > addressed in the SPARQL Update document some time ago, though this
> > > > formal response was stuck in the queue until now.
> > >
> > > We have addressed your concerns below.
> > >
> > > > On Fri, Jul 29, 2011 at 2:54 PM, David Booth <david@dbooth.org> wrote:
> > > > Regarding
> > > > http://www.w3.org/TR/2011/WD-sparql11-update-20110512/
> > > > It's great to see these documents in Last Call!
> > > >
> > > > Comments:
> > > >
> > > > > 1. Please either add capability for virtual graphs or keep the
> > > > > COPY, ADD and MOVE shortcuts, to enable standard SPARQL to be used
> > > > > more efficiently as a rules language and in data production
> > > > > pipelines.  COPY, ADD and MOVE operations cost almost nothing to
> > > > > implement, and they help with efficiency.  By "virtual graph" I
> > > > > mean a graph that consists of the merge of a particular set of
> > > > > named graphs -- a very important capability for efficient data
> > > > > production pipelines.
> > >
> > > > The features of COPY, ADD and MOVE were considered "At Risk" until
> > > > the working group was confident that they could be implemented
> > > > without undue difficulty. Now that we have some reports of
> > > > successful implementation, the "At Risk" designation has been
> > > > removed.
> > >
> > > The group feels that adding a feature like "virtual graphs" at this
> > > late stage of publication is not possible.
> >
> > Thank you.  I am satisfied with this resolution, providing
> > that "virtual graphs" is added to the wish list for
> > consideration in the next version of SPARQL:
> > http://www.w3.org/2009/sparql/wiki/Future_Work_Items
> >
> >
> > >
> > >
> > > > 2. This paragraph in sec 3.1.3 is a bit confusing:
> > > > [[
> > > > That is, the GroupGraphPattern in the WHERE clause will
> > be matched
> > > > against the dataset described by explicit USING or USING NAMED
> > > > clauses, if specified, and against the graph store otherwise. Any
> > > > graph name specified in a WITH clause will - for evaluating the
> > > > WHERE clause - refer to the default graph to be used in
> > the absence
> > > > of USING or USING NAMED clauses. In the presence of one or more
> > > > graphs referred to in USING clauses, the default graph
> > will be the
> > > > merge of these graphs, meaning that the graph in a WITH
> > clause will
> > > > be ignored while evaluating the WHERE clause. If there is
> > no USING
> > > > clause, but there is one or more USING NAMED clauses, then the
> > > > dataset will include an empty graph for the default graph.
> > > > ]]
> > > > > In particular, the sentence "Any graph name specified in a WITH
> > > > > clause will - for evaluating the WHERE clause - refer to the
> > > > > default graph to be used in the absence of USING or USING NAMED
> > > > > clauses." seems odd.  The graph specified in the WITH clause will
> > > > > refer to the *default* graph?  I would think it would be used
> > > > > *instead* of the default graph.  Isn't that the point of WITH?
> > > > > Perhaps the term "default graph" is being used in an unusual way
> > > > > in this paragraph, to mean "the graph that will be used in the
> > > > > absence of USING or USING NAMED"?  I think it would be misleading
> > > > > to call that a "default graph".  Normally the term "default graph"
> > > > > refers to the unnamed slot in a Graph Store, per the first
> > > > > paragraph in section 2.  I think it would be best to use the term
> > > > > only in that way.
> > >
> > > > Unfortunately, the term "default graph" has two accepted meanings.
> > > > The first is the graph that may be referred to without a name in a
> > > > graph store (not necessarily an unnamed graph), while the second
> > > > refers to the graph that is referenced in a SPARQL WHERE clause
> > > > when no GRAPH block has been specified. By default, these two are
> > > > equivalent, but the latter is modified to be the merge of all
> > > > graphs listed in FROM clauses in a query (USING in updates) or by
> > > > specifying a default-graph-uri parameter in the SPARQL protocol.
> > >
> > > We have changed the text to the following to clarify the
> > use of WITH:
> > >
> > > "That is, the GroupGraphPattern in the WHERE clause will be matched
> > > against the dataset described by explicit USING or USING NAMED
> > > clauses, if specified, and against the default graph
> > provided by the
> > > Graph Store otherwise.
> > >
> > > The WITH clause provides a convenience for when an
> > operation primarily
> > > refers to a single graph. If a graph name is specified in a WITH
> > > clause, then - for the purposes of evaluating the WHERE
> > clause - this
> > > will define a dataset containing a default graph with the specified
> > > name, but only in the absence of USING or USING NAMED
> > clauses. In the
> > > presence of one or more graphs referred to in USING clauses and/or
> > > USING NAMED clauses, the WITH clause will be ignored while
> > evaluating
> > > the WHERE clause."
> > >
> > >
> > > > > Part of the confusion may be related to the ambiguous use of the
> > > > > term "dataset".  For example, consider the sentence: "That is, the
> > > > > GroupGraphPattern in the WHERE clause will be matched against the
> > > > > dataset described by . . . ".  When I read this, I took the term
> > > > > "dataset" to mean:
> > > > > http://en.wikipedia.org/wiki/Data_set
> > > > > However, I am wondering if you actually meant "RDF Dataset" as
> > > > > defined here:
> > > > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > > > If you meant the former, I suggest using the term "set of data",
> > > > > to avoid ambiguity.  If you meant the latter, I suggest using the
> > > > > term "RDF Dataset", and perhaps linking it to its definition.
> > > >
> > > > Also, I notice that:
> > > >
> > > > - There are many occurrences of the unqualified word
> > "dataset".  I
> > > > suggest checking them all, to see if they should be "RDF Dataset".
> > >
> > > > Existing documentation from SPARQL 1.0 already uses the term
> > > > "dataset" as an abbreviation for "RDF dataset", so we do not feel
> > > > that it is necessary to use the complete term on every occasion.
> > > > However, we have expanded the term each time that a paragraph
> > > > first uses it. Despite a link to "Querying the Dataset" already
> > > > being present in the preceding paragraph, we have added the
> > > > requested link.
> > >
> > >
> > > > - Capitalization of the terms "RDF Dataset" and "Graph Store" is
> > > > inconsistent -- sometimes written "RDF dataset" or "graph
> > store".
> > > > It would help if it were consistently capitalized, as it
> > helps the
> > > > reader know that you are intending a specially defined term.
> > >
> > > "RDF dataset" was consistently capitalized in the prose, however it
> > > has been updated to include a capitalized "D" to help the reader
> > > realize that it is a formal term. The abbreviated term
> > "dataset" has
> > > remained unchanged. "Graph Store" has been updated.
> > >
> > >
> > > > If I have understood the intent, it sounds like there are
> > two sets
> > > > of data involved in a DELETE/INSERT operation: one set is used in
> > > > evaluating the WHERE clause, and the other is the target graph of
> > > > the DELETE/INSERT, i.e., the graph that will be modified
> > by the operation.
> > > > If so, I think it would be helpful to state this up
> > front, and make
> > > > up a term for each of these sets, such as: "the set of
> > data for the
> > > > WHERE clause" and "the target graph".  Hmm, maybe the SPARQL 1.1
> > > > Query spec uses the term "active graph" for the former?
> > > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > > In any case, it would be helpful to define specific terms
> > for these,
> > > > and use them consistently.
> > >
> > > > The terms "RDF dataset" and "dataset" are now used in this text
> > > > entirely in the context of the data that the WHERE clause will be
> > > > matched against. DELETE and INSERT may each refer to multiple
> > > > graphs, making a term like "target graph" difficult to manage. The
> > > > changes made to this section may now address some of the confusion
> > > > being posed here.
> > >
> > >
> > > > Also, it may be clearer to reword this paragraph as a
> > decision tree,
> > > > since the logic that is being described is a bit complex for
> > > > unstructured English prose:
> > > >
> > > >   If ___ then ___ . Otherwise, if ___ then ___ . Otherwise ___ .
> > >
> > > The purpose of this section of text is to provide a description in
> > > prose. We hope that the changes have made the text clearer.
> >
> > Thank you.  I am satisfied with this resolution.
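
The WITH / USING rules in the revised wording for item 2 can be read as a decision tree. The following is a minimal Python sketch of that reading, not text from the Update specification. The function name `where_default_graph`, its parameters, and the dict-of-sets model of a Graph Store are assumptions made for illustration, and the RDF merge is approximated by plain set union (blank-node renaming is ignored).

```python
from typing import Optional

# A triple is modelled as a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def where_default_graph(
    store_default: set[Triple],
    store_named: dict[str, set[Triple]],
    with_graph: Optional[str] = None,
    using: tuple[str, ...] = (),
    using_named: tuple[str, ...] = (),
) -> set[Triple]:
    """Return the default graph that the WHERE clause is matched against.

    Hypothetical helper: it follows the wording quoted in this thread,
    expressed as "if / otherwise if / otherwise".
    """
    if using:
        # USING present: default graph is the merge of the listed graphs,
        # and any WITH clause is ignored for the WHERE clause.
        merged: set[Triple] = set()
        for name in using:
            merged |= store_named.get(name, set())
        return merged
    if using_named:
        # Only USING NAMED present: the default graph is empty,
        # and WITH is again ignored.
        return set()
    if with_graph is not None:
        # No USING / USING NAMED: the graph named by WITH acts as default.
        return set(store_named.get(with_graph, set()))
    # Otherwise: the default graph provided by the Graph Store.
    return set(store_default)


if __name__ == "__main__":
    named = {"g1": {("s", "p", "1")}, "g2": {("s", "p", "2")}}
    print(where_default_graph({("s", "p", "d")}, named, with_graph="g1"))
```

The ordering of the branches is the whole point: USING and USING NAMED are checked before WITH, which is what makes WITH a convenience rather than an override.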
> >
> > >
> > >
> > > > 3. In searching for the definition of the backslash "\" symbol in
> > > > section 4.2, it looks like it is supposed to be set
> > difference, but
> > > > I do not see it listed in either of these tables of standard
> > > > mathematical or logic symbols:
> > > > http://en.wikipedia.org/wiki/List_of_mathematical_symbols
> > > > http://en.wikipedia.org/wiki/Table_of_logic_symbols
> > > > However, I now see that that is because it is using a different
> > > > unicode character, so a browser search did not find it:
> > > > http://en.wikipedia.org/wiki/List_of_mathematical_symbols
> > > > I suggest adding a brief note of clarification to section 4.2
> > > > stating that the backslash symbol ("\") indicates set
> > difference.
> > > > Personally, I prefer the minus sign ("-") for set
> > difference, though
> > > > my tastes may be biased toward certain programming languages.
> > >
> > > The character "\" has been replaced with the word "minus", and text
> > > has been provided to explain that this refers to "set difference".
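
To make the "set difference" reading of item 3 concrete, here is a small Python sketch. It assumes a graph can be modelled as a set of triples; the name `delete_triples` is hypothetical and does not come from the specification.

```python
# A triple is modelled as a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def delete_triples(graph: frozenset[Triple], to_delete: frozenset[Triple]) -> frozenset[Triple]:
    """Return the graph "minus" the given triples, i.e. their set difference."""
    # Python spells set difference with "-", the notation David prefers above.
    return graph - to_delete


if __name__ == "__main__":
    g = frozenset({("a", "p", "1"), ("a", "p", "2")})
    print(delete_triples(g, frozenset({("a", "p", "2"), ("b", "p", "3")})))
```

Triples listed for deletion that are not in the graph are simply ignored, which is the usual behaviour of set difference.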
> > >
> > >
> > > > 4. The difference between "USING" and "USING NAMED" is not
> > > > explained, except in passing: "This describes a dataset
> > in a manner
> > > > similar to FROM and FROM NAMED clauses in the SPARQL1.1
> > Query Language."
> > >
> > > We have replaced the phrase: "in a manner similar to FROM and FROM
> > > NAMED" with: "in the same way as FROM and FROM NAMED" and have
> > > provided a direct link to
> > > http://www.w3.org/TR/sparql11-query/#specifyingDataset
> >
> > Thank you.  I am satisfied with this resolution.
> >
> > >
> > >
> > > > 5. As written, this in sec 3.1:
> > > > http://www.w3.org/TR/sparql11-update/#graphUpdate
> > > > [[
> > > > Graph update operations change existing graphs in the Graph Store
> > > > but do not explicitly delete nor create them. Non-empty
> > inserts into
> > > > non-existing graphs will, however, implicitly create
> > those graphs,
> > > > i.e., an implementation *should* create graphs that do not exist
> > > > before triples were inserted into them (there may be
> > implementations
> > > > providing an update service over a fixed set of graphs
> > which in such
> > > > case *must* return with failure for update requests that would
> > > > create an unallowed graph), and *may* remove graphs that are left
> > > > empty after triples are removed from them.
> > > > ]]
> > > > seems to say that an implementation that operates over a
> > *variable*
> > > > (non-fixed) set of graphs still has the option of not
> > automatically
> > > > creating graphs that do not exist.
> > > >
> > > > I suggest rewording the above portion as:
> > > > [[
> > > > Graph update operations change existing graphs in the Graph Store
> > > > but do not explicitly delete nor create them. Non-empty
> > inserts into
> > > > non-existing graphs will normally implicitly create those graphs,
> > > > i.e., an implementation fulfilling an update request *should*
> > > > silently and automatically create graphs that do not exist before
> > > > triples are inserted into them, and *must* return with
> > failure if it
> > > > fails to do so for any reason.  (For example, the
> > implementation may
> > > > have insufficient resources, or an implementation may
> > only provide
> > > > an update service over a fixed set of graphs.)  An implementation
> > > > *may* remove graphs that are left empty after triples are
> > removed from them.
> > > > ]]
> > >
> > > Done, with minor changes:
> > >
> > > "Graph update operations change existing graphs in the
> > Graph Store but
> > > do not explicitly delete nor create them. Non-empty inserts into
> > > non-existing graphs will, however, implicitly create those graphs,
> > > i.e., an implementation fulfilling an update request should
> > silently
> > > an automatically create graphs that do not exist before triples are
> >
> > s/an /and /
> >
> > > inserted into them, and must return with failure if it
> > fails to do so
> > > for any reason. (For example, the implementation may have
> > insufficient
> > > resources, or an implementation may only provide an update service
> > > over a fixed set of graphs and the implicitly created graph is not
> > > within this fixed set). An implementation may remove graphs
> > that are
> > > left empty after triples are removed from them."
> >
> > Thank you.  Subject to correcting the tiny typo above, I am
> > satisfied with this resolution.
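
The graph-creation rule agreed for item 5 can be sketched as a small in-memory model. This is an illustration under stated assumptions, not an implementation of any real store: the class `GraphStore`, the exception `GraphStoreError`, and the `allowed_graphs` and `drop_empty` parameters are all hypothetical names.

```python
class GraphStoreError(Exception):
    """Raised when an update request cannot be fulfilled."""


class GraphStore:
    """Toy Graph Store: one default graph plus named graphs (sets of triples)."""

    def __init__(self, allowed_graphs=None, drop_empty=True):
        self.default = set()
        self.named = {}
        # None means graphs may be created freely; otherwise only these
        # names are permitted (a service over a fixed set of graphs).
        self.allowed = None if allowed_graphs is None else frozenset(allowed_graphs)
        self.drop_empty = drop_empty

    def insert(self, triples, graph=None):
        triples = set(triples)
        if graph is None:
            self.default |= triples
            return
        if not triples:
            return  # an empty insert never creates a graph
        if graph not in self.named:
            if self.allowed is not None and graph not in self.allowed:
                # Creation is not possible, so the request must fail.
                raise GraphStoreError(f"cannot create graph {graph!r}")
            self.named[graph] = set()  # silent, implicit creation
        self.named[graph] |= triples

    def delete(self, triples, graph=None):
        target = self.default if graph is None else self.named.get(graph)
        if target is None:
            return
        target -= set(triples)
        # An implementation *may* remove graphs left empty; modelled as a flag.
        if graph is not None and not target and self.drop_empty:
            del self.named[graph]
```

A store built with `GraphStore(allowed_graphs={...})` raises `GraphStoreError` on an insert into any other graph, while `GraphStore()` creates graphs on demand.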
> >
> > >
> > >
> > > > 6. Similarly, I suggest rewording the following in section 3.1.1:
> > > > http://www.w3.org/TR/sparql11-update/#insertData
> > > > [[
> > > > If no graph is described in the QuadData, then the
> > default graph is
> > > > presumed. If data is inserted into a graph that does not exist in
> > > > the graph store, it *should* be created (there may be
> > > > implementations providing an update service over a fixed set of
> > > > graphs which in such case *must* return with failure for update
> > > > requests that insert data into an unallowed graph).
> > > > ]]
> > > > to:
> > > > [[
> > > > If no graph is described in the QuadData, then the
> > default graph is
> > > > presumed.  If data is inserted into a graph that does not
> > exist in
> > > > the graph store, the update service SHOULD create that
> > graph.  The
> > > > service MUST return with failure if it fails to do so for
> > any reason.
> > > > ]]
> > >
> > > Done, with minor modification. The text now reads as:
> > >
> > > "The information how a graph store is accessed is defined in the
> > > protocol and graph store protocol specs. A graph store is
> > accessible
> > > by either an update service (cf. protocol) or via the graph store
> > > protocol (cf. graph store protocol). In either case the
> > graph store is
> > > hidden behind the service, making it accessible via the URI of a
> > > SPARQL update service or via a URI that responds to the graph store
> > > protocol."
> >
> > Thank you.  I am satisfied with this resolution.
> >
> > >
> > >
> > > > 7. And similarly in section 3.1.3 I suggest changing:
> > > > http://www.w3.org/TR/sparql11-update/#deleteInsert
> > > > [[
> > > > If an operation tries to insert into a graph that does not exist,
> > > > then the update service *should* create that graph.  The service
> > > > MUST return with failure if it fails to do so for any
> > reason.  If no
> > > > data is to be inserted, then no graph will be created, even if
> > > > applying the operation to a different dataset would
> > result in data being inserted.
> > > > ]]
> > > > to:
> > > > [[
> > > > If an operation tries to insert into a graph that does not exist,
> > > > then that graph should be created; again, there may be
> > > > implementations providing an update service over a fixed set of
> > > > graphs which in such case must return with failure for update
> > > > requests that would create an unallowed graph. If no data
> > is to be
> > > > inserted, then no graph will be created, even if applying the
> > > > operation to a different dataset would result in data
> > being inserted.
> > > > ]]
> > >
> > > Done.
> >
> > Thank you.  I am satisfied with this resolution.
> >
> > >
> > >
> > > > 8. How is the URI of a Graph Store indicated?  The concept of a
> > > > Graph Store is central to the SPARQL 1.1 Update spec, and
> > hence one
> > > > should be able to use a URI to refer to a particular Graph Store,
> > > > but the spec doesn't say how this is done.
> > > >
> > > > The SPARQL 1.1 Service Description spec contains no sd:GraphStore
> > > > class.
> > > >
> > > > The SPARQL 1.1 Graph Store HTTP Protocol spec sometimes
> > mentions a
> > > > Graph Store, but does not make clear how the intended
> > Graph Store is
> > > > identified.  It does say: "A compliant implementation of this
> > > > specification SHOULD accept HTTP requests directed at its
> > Graph Store".
> > > > But what if a service hosts multiple Graph Stores?
> > > >
> > > > According to
> > > > http://www.w3.org/TR/sparql11-update/#graphStore
> > > > a Graph Store "is a mutable container of RDF graphs managed by a
> > > > single service" which "contains one (unnamed) slot
> > holding a default
> > > > graph and zero or more named slots holding named graphs".
> > > >
> > > > Language in section 2.1
> > > > http://www.w3.org/TR/sparql11-update/#graphStoreQueryServices
> > > > "There is no presumption that the graph store managed by
> > an update
> > > > service . . . " suggests that an update service can only
> > have *one*
> > > > Graph Store, but: (a) I do not see this stated explicitly
> > anywhere;
> > > > (b) it would be useful for an update service to be able
> > to have more
> > > > than one Graph Store; and (b) what is the point of defining the
> > > > notion of an "update service" if it is one-to-one with a Graph
> > > > Store?  AFAICT, doing so just adds an unnecessarily layer
> > and confusion.
> > > >
> > > > The SPARQL 1.1 Service Description spec does define the
> > notion of an
> > > > sd:DataSet, which is close to the notion of a Graph
> > Store, but (if I
> > > > understand the definition of Graph Store in
> > > > http://www.w3.org/TR/sparql11-update/#graphStore ) a
> > Graph Store is
> > > > mutable, whereas an sd:DataSet is not.
> > >
> > > Graph stores are referred to by URI, but beyond this the
> > > implementation is free to choose. This has been left unspecified
> > > intentionally to allow each implementation to specify the details
> > > individually.
> > >
> > >
> > > > The reason one would want to have an update service that contains
> > > > more than one Graph Store is that it would allow operations on
> > > > collections of graphs to be performed efficiently.  For
> > example, an
> > > > RDF data pipeline may need to generate one collection of
> > graphs from
> > > > another, all within the same update service.  In other words, the
> > > > content of one Graph Store is generated from the content
> > of another
> > > > Graph Store.  This is important because for efficiency, it is
> > > > helpful to be able to subdivide large graphs into collections of
> > > > smaller graphs.  An example might be a collection of
> > 200,000 patient
> > > > graphs.  There may be *multiple* collections of these patient
> > > > graphs, A, B and C, where collection C is derived from
> > collection B
> > > > which is derived from collection A in a pipeline.  Since each
> > > > patient graph within each of these collections is relatively
> > > > independent, it is far more efficient when one in A is updated to
> > > > only update the corresponding graphs in B and C, rather than
> > > > regenerating the entire B and C collections.  It would be very
> > > > convenient if each of these collections could be stored in a
> > > > sd:GraphStore (presuming such a class is defined) within the same
> > > > update service so that appropriate update operations could be
> > > > selectively performed on them, with the assurance (for
> > efficiency) that they are within the same update service.
> > > >
> > > > Oddly, there is a distinction between a Graph Store (which is
> > > > mutable) and an RDF Dataset (which is not), but there is no
> > > > corresponding distinction made with graphs.  They are treated as
> > > > mutable in the SPARQL
> > > > 1.1 Update spec: they can be the subject of an INSERT or DELETE
> > > > operation.
> > > >
> > > > Actually, in reading the definition of RDF Dataset
> > > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > > I do not see anything that would prevent it from changing
> > over time.
> > > > Certainly an RDF Dataset contains a particular set of
> > graphs at the
> > > > moment when it is queried, but I see no prohibition against that
> > > > same RDF Dataset containing a different set of graphs at
> > a different time.
> > > > Hence, it looks to me like the notion of Graph Store could be
> > > > dropped in favor of using the term "RDF Datastore" universally
> > > > throughout both the Query and Update documents.  I think
> > this would
> > > > make more sense than using two different terms: both queries and
> > > > updates would operate on RDF Datasets.
> > >
> > > > While queries operate on a dataset that is defined as a merge of
> > > > multiple graphs, any updates must necessarily modify a single graph
> > > > at a time. So it is not possible to state that updates operate on
> > > > RDF Datasets.
> >
> > I apologize, I appear to have made a typo in my suggestion --
> > I wrote "RDF Datastore" instead of "RDF Dataset" -- and I
> > think this may have caused my suggestion to be misunderstood.
> > I *meant* to suggest that the term "RDF Dataset" be used
> > uniformly instead of using the term "Graph Store" in the
> > Update spec and "RDF Dataset" in the Query spec.
> >
> > An update would operate on a specific graph *within* an RDF
> > Dataset, just as it operates on a specific graph *within* a
> > Graph Store.
> >
> > This is purely an editorial suggestion, to use a single term
> > instead of two terms.
> >
> > >
> > > > While a single INSERT or DELETE template may refer to multiple
> > > > graphs, the triples being specified are always for individual
> > > > graphs. So to remove the same triples from graphs <foo> and <bar>
> > > > there is no way to do it with a single pattern in a template, but
> > > > rather both graphs must be mentioned explicitly with that template,
> > > > i.e.:
> > >
> > > DELETE { GRAPH <foo> { ... } GRAPH <bar> { ... }} ...
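
Because each graph has to be named explicitly, a client that wants to remove the same triples from several graphs ends up repeating the GRAPH block. The Python sketch below builds such a request string. It is only an illustration: the helper name `delete_data_request` and the example IRIs are assumptions, it emits the DELETE DATA form rather than the templated DELETE shown above, and it does no escaping or validation of its inputs.

```python
def delete_data_request(graph_iris, triples_text):
    """Build a DELETE DATA request with one GRAPH block per graph IRI.

    triples_text is inserted verbatim, so it must already be valid
    triple syntax (hypothetical helper, no escaping is performed).
    """
    blocks = " ".join(
        f"GRAPH <{iri}> {{ {triples_text} }}" for iri in graph_iris
    )
    return f"DELETE DATA {{ {blocks} }}"


if __name__ == "__main__":
    request = delete_data_request(
        ["http://example.org/foo", "http://example.org/bar"],
        "<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
    )
    print(request)
```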
> > >
> > >
> > > > 9. Typo: s/needs not be authoritative/need not be authoritative/
> > >
> > > Done.
> >
> > Thank you.  I am satisfied with this resolution.
> >
> > David
> >
> > >
> > >
> > > We would be grateful if you would acknowledge that your comment has
> > > been answered by sending a reply to this mailing list.
> > >
> > > Paul Gearon,
> > > on behalf of the SPARQL WG
> > >
> > >
> >
> > --
> > David Booth, Ph.D.
> > http://dbooth.org/
> >
> > Opinions expressed herein are those of the author and do not
> > necessarily reflect those of his employer.
> >
> >
> >
> 
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Wednesday, 1 August 2012 14:04:16 UTC