RE: SPARQL 1.1 Update - Comments from Polleres, Axel on 2012-08-01 (public-rdf-dawg-comments@w3.org from August 2012)

From: Polleres, Axel <axel.polleres@siemens.com>
Date: Wed, 1 Aug 2012 08:25:27 +0200
To: "david@dbooth.org" <david@dbooth.org>, "gearon@ieee.org" <gearon@ieee.org>
CC: "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Message-ID: <9DA51FFE5E84464082D7A089342DEEE80141DC3C15CD@ATVIES9917WMSX.ww300.siemens.net>
Dear David,

> Thank you.  I am satisfied with this resolution, providing
> that "virtual graphs" is added to the wish list for
> consideration in the next version of SPARQL:
> http://www.w3.org/2009/sparql/wiki/Future_Work_Items

To avoid adding too many new items to this list, please note that
the page already links explicitly to all the features that were
discussed at the beginning of this WG but were not adopted as work
items by the SPARQL 1.1 Working Group:
 http://www.w3.org/2009/sparql/wiki/Category:Features

It seems to me that your proposal of "virtual graphs" is covered, in the sense that
it is a variant of what we had already noted under the feature name "Composite Datasets": http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets

Please let us know whether this addresses your concern.

Thanks,
Axel


--
Dr. Axel Polleres
Siemens AG Österreich
Corporate Technology Central Eastern Europe Research & Technologies
CT T CEE

Tel.: +43 (0) 51707-36983
Mobile: +43 (0) 664 88550859
Fax: +43 (0) 51707-56682
mailto:axel.polleres@siemens.com


> -----Original Message-----
> From: David Booth [mailto:david@dbooth.org]
> Sent: Tuesday, 31 July 2012 5:54 PM
> To: Paul Gearon
> Cc: public-rdf-dawg-comments
> Subject: Re: SPARQL 1.1 Update - Comments
>
> On Tue, 2012-07-31 at 10:25 -0400, Paul Gearon wrote:
> > Hello David,
> >
> > Thank you for your comments. I apologise that this response has been
> > so long delayed. Please be assured that your comments were addressed
> > in the SPARQL Update document some time ago, though this formal
> > response was stuck in the queue until now.
> >
> > We have addressed your concerns below.
> >
> > On Fri, Jul 29, 2011 at 2:54 PM, David Booth <david@dbooth.org> wrote:
> > > Regarding
> > > http://www.w3.org/TR/2011/WD-sparql11-update-20110512/
> > > It's great to see these documents in Last Call!
> > >
> > > Comments:
> > >
> > > 1. Please either add capability for virtual graphs or keep the COPY,
> > > ADD and MOVE shortcuts, to enable standard SPARQL to be used more
> > > efficiently as a rules language and in data production pipelines.
> > > COPY, ADD and MOVE operations cost almost nothing to implement, and
> > > they help with efficiency.  By "virtual graph" I mean a graph that
> > > consists of the merge of a particular set of named graphs -- a very
> > > important capability for efficient data production pipelines.
> >
> > The features of COPY, ADD and MOVE were considered "At Risk" until the
> > working group was confident that they could be implemented without
> > undue difficulty. Now that we have some reports of successful
> > implementation, the "At Risk" designation has been removed.
> >
> > The group feels that adding a feature like "virtual graphs" at this
> > late stage of publication is not possible.
>
> Thank you.  I am satisfied with this resolution, providing
> that "virtual graphs" is added to the wish list for
> consideration in the next version of SPARQL:
> http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>
>
> >
> >
> > > 2. This paragraph in sec 3.1.3 is a bit confusing:
> > > [[
> > > That is, the GroupGraphPattern in the WHERE clause will be matched
> > > against the dataset described by explicit USING or USING NAMED
> > > clauses, if specified, and against the graph store otherwise. Any
> > > graph name specified in a WITH clause will - for evaluating the
> > > WHERE clause - refer to the default graph to be used in the absence
> > > of USING or USING NAMED clauses. In the presence of one or more
> > > graphs referred to in USING clauses, the default graph will be the
> > > merge of these graphs, meaning that the graph in a WITH clause will
> > > be ignored while evaluating the WHERE clause. If there is no USING
> > > clause, but there is one or more USING NAMED clauses, then the
> > > dataset will include an empty graph for the default graph.
> > > ]]
> > > In particular, the sentence "Any graph name specified in a WITH
> > > clause will - for evaluating the WHERE clause - refer to the default
> > > graph to be used in the absence of USING or USING NAMED clauses."
> > > seems odd.  The graph specified in the WITH clause will refer to the
> > > *default* graph?  I would think it would be used *instead* of the
> > > default graph.  Isn't that the point of WITH?  Perhaps the term
> > > "default graph" is being used in an unusual way in this paragraph,
> > > to mean "the graph that will be used in the absence of USING or
> > > USING NAMED"?  I think it would be misleading to call that a
> > > "default graph".  Normally the term "default graph" refers to the
> > > unnamed slot in a Graph Store, per the first paragraph in section 2.
> > > I think it would be best to use the term only in that way.
> >
> > Unfortunately, the term "default graph" has two accepted meanings. The
> > first is the graph that may be referred to without a name in a graph
> > store (not necessarily an unnamed graph), while the second refers to
> > the graph that is referenced in a SPARQL WHERE clause when no
> > GRAPH block has been specified. By default, these two are equivalent,
> > but the latter is modified to be the merge of all graphs listed in
> > FROM clauses in a query (USING in updates) or by specifying a
> > default-graph-uri parameter in the SPARQL protocol.
> >
> > We have changed the text to the following to clarify the use of WITH:
> >
> > "That is, the GroupGraphPattern in the WHERE clause will be matched
> > against the dataset described by explicit USING or USING NAMED
> > clauses, if specified, and against the default graph provided by the
> > Graph Store otherwise.
> >
> > The WITH clause provides a convenience for when an operation primarily
> > refers to a single graph. If a graph name is specified in a WITH
> > clause, then - for the purposes of evaluating the WHERE clause - this
> > will define a dataset containing a default graph with the specified
> > name, but only in the absence of USING or USING NAMED clauses. In the
> > presence of one or more graphs referred to in USING clauses and/or
> > USING NAMED clauses, the WITH clause will be ignored while evaluating
> > the WHERE clause."
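
The revised wording above can be read as the kind of decision tree that is requested further down in this thread. The following Python sketch is an illustration of that reading only, not text from the specification. The names `GraphStore` and `where_default_graph` are hypothetical, graphs are modelled as plain sets of triples, and set union stands in for an RDF merge (blank-node handling is ignored). The empty default graph in the USING NAMED-only case follows the earlier draft wording quoted in comment 2.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class GraphStore:
    """Hypothetical in-memory store: one default graph plus named graphs."""

    def __init__(self, default: Iterable[Triple] = (),
                 named: Optional[Dict[str, Iterable[Triple]]] = None) -> None:
        self.default: Graph = frozenset(default)
        self.named: Dict[str, Graph] = {
            name: frozenset(triples) for name, triples in (named or {}).items()
        }


def where_default_graph(store: GraphStore,
                        with_graph: Optional[str] = None,
                        using: Iterable[str] = (),
                        using_named: Iterable[str] = ()) -> Graph:
    """Pick the default graph that the WHERE clause is matched against."""
    using = list(using)
    using_named = list(using_named)
    if using or using_named:
        # Any USING / USING NAMED clause means WITH is ignored for WHERE.
        # The default graph is the merge of the USING graphs; it is empty
        # when only USING NAMED clauses are present.
        merged = set()
        for name in using:
            merged |= store.named.get(name, frozenset())
        return frozenset(merged)
    if with_graph is not None:
        # WITH alone: the named graph plays the role of the default graph.
        return store.named.get(with_graph, frozenset())
    # No clause at all: fall back to the store's own default graph.
    return store.default
```

The branches are ordered so that USING and USING NAMED are checked first, which is what makes WITH a fallback rather than an override.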
> >
> >
> > > Part of the confusion may be related to the ambiguous use of the
> > > term "dataset".  For example, consider the sentence: "That is, the
> > > GroupGraphPattern in the WHERE clause will be matched against the
> > > dataset described by . . . ".  When I read this, I took the term
> > > "dataset" to mean:
> > > http://en.wikipedia.org/wiki/Data_set
> > > However, I am wondering if you actually meant "RDF Dataset" as
> > > defined here:
> > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > If you meant the former, I suggest using the term "set of data", to
> > > avoid ambiguity.  If you meant the latter, I suggest using the term
> > > "RDF Dataset", and perhaps linking it to its definition.
> > >
> > > Also, I notice that:
> > >
> > > - There are many occurrences of the unqualified word "dataset".  I
> > > suggest checking them all, to see if they should be "RDF Dataset".
> >
> > Existing documentation from SPARQL 1.0 already uses the term
> > "dataset" as an abbreviation for "RDF dataset", so we do not feel that
> > it is necessary to use the complete term on every occasion. However,
> > we have expanded the term each time that a paragraph first uses it.
> > Despite a link to "Querying the Dataset" already being present in the
> > preceding paragraph we have added the requested link.
> >
> >
> > > - Capitalization of the terms "RDF Dataset" and "Graph Store" is
> > > inconsistent -- sometimes written "RDF dataset" or "graph store".
> > > It would help if it were consistently capitalized, as it helps the
> > > reader know that you are intending a specially defined term.
> >
> > "RDF dataset" was consistently capitalized in the prose, however it
> > has been updated to include a capitalized "D" to help the reader
> > realize that it is a formal term. The abbreviated term "dataset" has
> > remained unchanged. "Graph Store" has been updated.
> >
> >
> > > If I have understood the intent, it sounds like there are two sets
> > > of data involved in a DELETE/INSERT operation: one set is used in
> > > evaluating the WHERE clause, and the other is the target graph of
> > > the DELETE/INSERT, i.e., the graph that will be modified by the
> > > operation.  If so, I think it would be helpful to state this up
> > > front, and make up a term for each of these sets, such as: "the set
> > > of data for the WHERE clause" and "the target graph".  Hmm, maybe
> > > the SPARQL 1.1 Query spec uses the term "active graph" for the
> > > former?
> > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > In any case, it would be helpful to define specific terms for these,
> > > and use them consistently.
> >
> > The terms "RDF dataset" and dataset are now used in this text entirely
> > in the context of the data that the WHERE clause will be matched
> > against. DELETE and INSERT may each refer to multiple graphs, making a
> > term like "target graph" difficult to manage. The changes made to this
> > section may now address some of the confusion being posed here.
> >
> >
> > > Also, it may be clearer to reword this paragraph as a decision tree,
> > > since the logic that is being described is a bit complex for
> > > unstructured English prose:
> > >
> > >   If ___ then ___ . Otherwise, if ___ then ___ . Otherwise ___ .
> >
> > The purpose of this section of text is to provide a description in
> > prose. We hope that the changes have made the text clearer.
>
> Thank you.  I am satisfied with this resolution.
>
> >
> >
> > > 3. In searching for the definition of the backslash "\" symbol in
> > > section 4.2, it looks like it is supposed to be set difference, but
> > > I do not see it listed in either of these tables of standard
> > > mathematical or logic symbols:
> > > http://en.wikipedia.org/wiki/List_of_mathematical_symbols
> > > http://en.wikipedia.org/wiki/Table_of_logic_symbols
> > > However, I now see that that is because it is using a different
> > > unicode character, so a browser search did not find it:
> > > http://en.wikipedia.org/wiki/List_of_mathematical_symbols
> > > I suggest adding a brief note of clarification to section 4.2
> > > stating that the backslash symbol ("\") indicates set difference.
> > > Personally, I prefer the minus sign ("-") for set difference, though
> > > my tastes may be biased toward certain programming languages.
> >
> > The character "\" has been replaced with the word "minus", and text
> > has been provided to explain that this refers to "set difference".
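
As a small illustration of the "minus" (set difference) reading, here is a Python sketch of how a single graph changes under a DELETE/INSERT. It is a sketch under stated assumptions, not the specification's formal definition: the function name `apply_delete_insert` is hypothetical, a graph is modelled as a set of triples, and deletions are assumed to be applied before insertions.

```python
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]


def apply_delete_insert(graph: Iterable[Triple],
                        to_delete: Iterable[Triple],
                        to_insert: Iterable[Triple]) -> FrozenSet[Triple]:
    """Return (graph minus to_delete) union to_insert."""
    # "minus" is plain set difference: triples that are not in the graph
    # are simply ignored when deleting.
    remaining = frozenset(graph) - frozenset(to_delete)
    # Insertion is set union, applied after the deletion (an assumption
    # of this sketch).
    return remaining | frozenset(to_insert)
```

Because the union comes last, a triple that appears in both the delete set and the insert set ends up present in the result.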
> >
> >
> > > 4. The difference between "USING" and "USING NAMED" is not
> > > explained, except in passing: "This describes a dataset in a manner
> > > similar to FROM and FROM NAMED clauses in the SPARQL1.1 Query
> > > Language."
> >
> > We have replaced the phrase: "in a manner similar to FROM and FROM
> > NAMED" with: "in the same way as FROM and FROM NAMED" and have
> > provided a direct link to
> > http://www.w3.org/TR/sparql11-query/#specifyingDataset
>
> Thank you.  I am satisfied with this resolution.
>
> >
> >
> > > 5. As written, this in sec 3.1:
> > > http://www.w3.org/TR/sparql11-update/#graphUpdate
> > > [[
> > > Graph update operations change existing graphs in the Graph Store
> > > but do not explicitly delete nor create them. Non-empty inserts into
> > > non-existing graphs will, however, implicitly create those graphs,
> > > i.e., an implementation *should* create graphs that do not exist
> > > before triples were inserted into them (there may be implementations
> > > providing an update service over a fixed set of graphs which in such
> > > case *must* return with failure for update requests that would
> > > create an unallowed graph), and *may* remove graphs that are left
> > > empty after triples are removed from them.
> > > ]]
> > > seems to say that an implementation that operates over a *variable*
> > > (non-fixed) set of graphs still has the option of not automatically
> > > creating graphs that do not exist.
> > >
> > > I suggest rewording the above portion as:
> > > [[
> > > Graph update operations change existing graphs in the Graph Store
> > > but do not explicitly delete nor create them. Non-empty inserts into
> > > non-existing graphs will normally implicitly create those graphs,
> > > i.e., an implementation fulfilling an update request *should*
> > > silently and automatically create graphs that do not exist before
> > > triples are inserted into them, and *must* return with failure if it
> > > fails to do so for any reason.  (For example, the implementation may
> > > have insufficient resources, or an implementation may only provide
> > > an update service over a fixed set of graphs.)  An implementation
> > > *may* remove graphs that are left empty after triples are removed
> > > from them.
> > > ]]
> >
> > Done, with minor changes:
> >
> > "Graph update operations change existing graphs in the Graph Store but
> > do not explicitly delete nor create them. Non-empty inserts into
> > non-existing graphs will, however, implicitly create those graphs,
> > i.e., an implementation fulfilling an update request should silently
> > an automatically create graphs that do not exist before triples are
>
> s/an /and /
>
> > inserted into them, and must return with failure if it fails to do so
> > for any reason. (For example, the implementation may have insufficient
> > resources, or an implementation may only provide an update service
> > over a fixed set of graphs and the implicitly created graph is not
> > within this fixed set). An implementation may remove graphs that are
> > left empty after triples are removed from them."
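
The agreed wording above can be illustrated with a short Python sketch. It is a minimal model under stated assumptions, not an implementation of any particular store: the names `insert_triples`, `delete_triples`, `UpdateFailure`, `allowed_graphs` and `drop_empty` are all hypothetical, and the named graphs are held in a plain dictionary of sets.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class UpdateFailure(Exception):
    """Raised when an update request cannot be fulfilled."""


def insert_triples(named_graphs: Dict[str, Set[Triple]],
                   graph_name: str,
                   triples: Iterable[Triple],
                   allowed_graphs: Optional[Set[str]] = None) -> None:
    """Insert triples, implicitly creating the graph on a non-empty insert."""
    new_triples = set(triples)
    if not new_triples:
        # An empty insert never creates a graph.
        return
    if graph_name not in named_graphs:
        # A service over a fixed set of graphs must fail rather than
        # create a graph outside that set.
        if allowed_graphs is not None and graph_name not in allowed_graphs:
            raise UpdateFailure(f"cannot create graph {graph_name!r}")
        named_graphs[graph_name] = set()
    named_graphs[graph_name] |= new_triples


def delete_triples(named_graphs: Dict[str, Set[Triple]],
                   graph_name: str,
                   triples: Iterable[Triple],
                   drop_empty: bool = True) -> None:
    """Delete triples; optionally remove a graph that is left empty."""
    graph = named_graphs.get(graph_name)
    if graph is None:
        return
    graph -= set(triples)
    if drop_empty and not graph:
        # Removing empty graphs is optional ("may") in the wording above.
        del named_graphs[graph_name]
```

Passing `allowed_graphs=None` models a store over a variable set of graphs, where creation is only expected to fail for reasons outside this sketch, such as insufficient resources.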
>
> Thank you.  Subject to correcting the tiny typo above, I am
> satisfied with this resolution.
>
> >
> >
> > > 6. Similarly, I suggest rewording the following in section 3.1.1:
> > > http://www.w3.org/TR/sparql11-update/#insertData
> > > [[
> > > If no graph is described in the QuadData, then the default graph is
> > > presumed. If data is inserted into a graph that does not exist in
> > > the graph store, it *should* be created (there may be
> > > implementations providing an update service over a fixed set of
> > > graphs which in such case *must* return with failure for update
> > > requests that insert data into an unallowed graph).
> > > ]]
> > > to:
> > > [[
> > > If no graph is described in the QuadData, then the default graph is
> > > presumed.  If data is inserted into a graph that does not exist in
> > > the graph store, the update service SHOULD create that graph.  The
> > > service MUST return with failure if it fails to do so for any
> > > reason.
> > > ]]
> >
> > Done, with minor modification. The text now reads as:
> >
> > "The information how a graph store is accessed is defined in the
> > protocol and graph store protocol specs. A graph store is accessible
> > by either an update service (cf. protocol) or via the graph store
> > protocol (cf. graph store protocol). In either case the graph store is
> > hidden behind the service, making it accessible via the URI of a
> > SPARQL update service or via a URI that responds to the graph store
> > protocol."
>
> Thank you.  I am satisfied with this resolution.
>
> >
> >
> > > 7. And similarly in section 3.1.3 I suggest changing:
> > > http://www.w3.org/TR/sparql11-update/#deleteInsert
> > > [[
> > > If an operation tries to insert into a graph that does not exist,
> > > then the update service *should* create that graph.  The service
> > > MUST return with failure if it fails to do so for any reason.  If no
> > > data is to be inserted, then no graph will be created, even if
> > > applying the operation to a different dataset would result in data
> > > being inserted.
> > > ]]
> > > to:
> > > [[
> > > If an operation tries to insert into a graph that does not exist,
> > > then that graph should be created; again, there may be
> > > implementations providing an update service over a fixed set of
> > > graphs which in such case must return with failure for update
> > > requests that would create an unallowed graph. If no data is to be
> > > inserted, then no graph will be created, even if applying the
> > > operation to a different dataset would result in data being
> > > inserted.
> > > ]]
> >
> > Done.
>
> Thank you.  I am satisfied with this resolution.
>
> >
> >
> > > 8. How is the URI of a Graph Store indicated?  The concept of a
> > > Graph Store is central to the SPARQL 1.1 Update spec, and hence one
> > > should be able to use a URI to refer to a particular Graph Store,
> > > but the spec doesn't say how this is done.
> > >
> > > The SPARQL 1.1 Service Description spec contains no sd:GraphStore
> > > class.
> > >
> > > The SPARQL 1.1 Graph Store HTTP Protocol spec sometimes mentions a
> > > Graph Store, but does not make clear how the intended Graph Store is
> > > identified.  It does say: "A compliant implementation of this
> > > specification SHOULD accept HTTP requests directed at its Graph
> > > Store".
> > > But what if a service hosts multiple Graph Stores?
> > >
> > > According to
> > > http://www.w3.org/TR/sparql11-update/#graphStore
> > > a Graph Store "is a mutable container of RDF graphs managed by a
> > > single service" which "contains one (unnamed) slot holding a default
> > > graph and zero or more named slots holding named graphs".
> > >
> > > Language in section 2.1
> > > http://www.w3.org/TR/sparql11-update/#graphStoreQueryServices
> > > "There is no presumption that the graph store managed by an update
> > > service . . . " suggests that an update service can only have *one*
> > > Graph Store, but: (a) I do not see this stated explicitly anywhere;
> > > (b) it would be useful for an update service to be able to have more
> > > than one Graph Store; and (c) what is the point of defining the
> > > notion of an "update service" if it is one-to-one with a Graph
> > > Store?  AFAICT, doing so just adds an unnecessary layer and
> > > confusion.
> > >
> > > The SPARQL 1.1 Service Description spec does define the notion of an
> > > sd:DataSet, which is close to the notion of a Graph Store, but (if I
> > > understand the definition of Graph Store in
> > > http://www.w3.org/TR/sparql11-update/#graphStore ) a Graph Store is
> > > mutable, whereas an sd:DataSet is not.
> >
> > Graph stores are referred to by URI, but beyond this the
> > implementation is free to choose. This has been left unspecified
> > intentionally to allow each implementation to specify the details
> > individually.
> >
> >
> > > The reason one would want to have an update service that contains
> > > more than one Graph Store is that it would allow operations on
> > > collections of graphs to be performed efficiently.  For example, an
> > > RDF data pipeline may need to generate one collection of graphs from
> > > another, all within the same update service.  In other words, the
> > > content of one Graph Store is generated from the content of another
> > > Graph Store.  This is important because for efficiency, it is
> > > helpful to be able to subdivide large graphs into collections of
> > > smaller graphs.  An example might be a collection of 200,000 patient
> > > graphs.  There may be *multiple* collections of these patient
> > > graphs, A, B and C, where collection C is derived from collection B
> > > which is derived from collection A in a pipeline.  Since each
> > > patient graph within each of these collections is relatively
> > > independent, it is far more efficient when one in A is updated to
> > > only update the corresponding graphs in B and C, rather than
> > > regenerating the entire B and C collections.  It would be very
> > > convenient if each of these collections could be stored in a
> > > sd:GraphStore (presuming such a class is defined) within the same
> > > update service so that appropriate update operations could be
> > > selectively performed on them, with the assurance (for efficiency)
> > > that they are within the same update service.
> > >
> > > Oddly, there is a distinction between a Graph Store (which is
> > > mutable) and an RDF Dataset (which is not), but there is no
> > > corresponding distinction made with graphs.  They are treated as
> > > mutable in the SPARQL
> > > 1.1 Update spec: they can be the subject of an INSERT or DELETE
> > > operation.
> > >
> > > Actually, in reading the definition of RDF Dataset
> > > http://www.w3.org/TR/sparql11-query/#rdfDataset
> > > I do not see anything that would prevent it from changing over time.
> > > Certainly an RDF Dataset contains a particular set of graphs at the
> > > moment when it is queried, but I see no prohibition against that
> > > same RDF Dataset containing a different set of graphs at a different
> > > time.  Hence, it looks to me like the notion of Graph Store could be
> > > dropped in favor of using the term "RDF Datastore" universally
> > > throughout both the Query and Update documents.  I think this would
> > > make more sense than using two different terms: both queries and
> > > updates would operate on RDF Datasets.
> >
> > While queries operate on a dataset that is defined as a merge of
> > multiple graphs, any updates must necessarily modify a single graph at
> > a time. So it is not possible to state that updates operate on RDF
> > Datasets.
>
> I apologize, I appear to have made a typo in my suggestion --
> I wrote "RDF Datastore" instead of "RDF Dataset" -- and I
> think this may have caused my suggestion to be misunderstood.
> I *meant* to suggest that the term "RDF Dataset" be used
> uniformly instead of using the term "Graph Store" in the
> Update spec and "RDF Dataset" in the Query spec.
>
> An update would operate on a specific graph *within* an RDF
> Dataset, just as it operates on a specific graph *within* a
> Graph Store.
>
> This is purely an editorial suggestion, to use a single term
> instead of two terms.
>
> >
> > While a single INSERT or DELETE template may refer to multiple graphs,
> > the triples being specified are always for individual graphs. So to
> > remove the same triples from graphs <foo> and <bar> there is no way to
> > do it with a single pattern in a template, but rather both graphs must
> > be mentioned explicitly with that template, i.e.:
> >
> > DELETE { GRAPH <foo> { ... } GRAPH <bar> { ... }} ...
> >
> >
> > > 9. Typo: s/needs not be authoritative/need not be authoritative/
> >
> > Done.
>
> Thank you.  I am satisfied with this resolution.
>
> David
>
> >
> >
> > We would be grateful if you would acknowledge that your comment has
> > been answered by sending a reply to this mailing list.
> >
> > Paul Gearon,
> > on behalf of the SPARQL WG
> >
> >
>
> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not
> necessarily reflect those of his employer.
>
>
>
Received on Wednesday, 1 August 2012 06:27:22 UTC