Review of SPARQL 1.1 Update from Gregory Williams on 2011-04-17 (public-rdf-dawg@w3.org from April to June 2011)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Sun, 17 Apr 2011 15:15:12 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <930C80D5-7442-4748-89EE-DB7DEA271359@evilfunhouse.com>

Below is my review of the Update document. I think there are some major issues in the formal model that need to be addressed before publication. Most of the rest of the issues are editorial in nature.

thanks,
.greg

Abstract
========

"Operations are provided to change existing RDF graphs". Is this consistent with the WG's current position on whether graphs are immutable?

Status
======

Links to Federation and JSON documents currently 404 (because they haven't been published as WDs yet). Make sure the links/shortnames are correct before publication.

1. Introduction
===============

"Update provides the following facilities". Should this list also include the shortcut operations for copying and renaming graphs?

"these form an alternative to the SPARQL 1.1 Uniform HTTP Protocol for Managing RDF graphs". This should use the new name "Graph Store HTTP Protocol"

1.1.1 Language Form
-------------------

"Any discrepencies between the language forms in this document and the grammar in SPARQL 1.1 Query will defer to the formal grammar in SPARQL 1.1 Query." Are there any such discrepancies? Or is this just protection against people finding errors in the spec in the future?

s/discrepencies/discrepancies/

"Unlike other forms of EBNF". Is this meant to be EBNF? No mention of this is made, and EBNF is not defined (I'm not sure if it needs to be).

"we occasionally refer to productions by links." Why only some of the productions?

"Examples are shown as follows". As this is only one of the types of example used in the document (cf. example data), perhaps this should be "Example updates" or something similar.

"Data is shown as follows". Is the data always in Turtle? This should be mentioned.

1.1.2 Terminology
-----------------

"must, must not, should, should not, may and recommended". I notice that the styling of the RFC2119 terms is different between Update, Query, Service Description (and perhaps others). The linking style to RFC2119 (and other links throughout the document) is different from that used in at least the Service Description document. We should consider aligning the styling if it's not too difficult (it may be that the linking style is a job for me in the SD doc and not an issue for Update).

1.1.2 Terminology
-----------------

"expressible as a single command". Does "command" need to be defined?

"as defined in the SPARQL 1.1 Query Language". Needs a colon appended.

2 The Graph Store
=================

"Depending on implementation, the unnamed graph may refer to a separate graph, or it could be a representation of a union of other graphs." The discussion of only these two examples makes me uncomfortable. While I think it would be strange, I don't think there would be a problem with the unnamed graph being something other than these two options (e.g. the graph merge of other named graphs), and so discussing only these two strikes me as overly confined. Is there a way to reword this to give the graph union case as merely one example among a set of possible implementation options?

2.1 Graph Store and SPARQL Query Services
-----------------------------------------

"endpoint" is referred to several times in the document, but never defined. The protocol document includes a definition of "SPARQL endpoint." Consider linking to that definition.

"If an update service is managing some graph store". Can an update service *not* manage some graph store? Is this only the degenerate case of an update service that rejects all requests?

2.2 SPARQL 1.1 Update Services
------------------------------

"Each request should be treated atomically by a SPARQL 1.1 Update service." I'm not sure about the use of "treated" here. The intention is suggesting that the *execution* of a request should be performed atomically, correct?

"However, using SERVICE in the WHERE clause of an Update request does not guarantee atomicity." I'm not sure what this means. Can this be clarified?

s/syncronisation/synchronization/

2.3 Entailment and Consistency
------------------------------

"If the store is capable of calculating entailed statements, c.f. SPARQL 1.1 Entailment Regimes". Should be "see" instead of "c.f.", and then probably within parentheses.

"not being affected of deletions". s/of/by/.

"If inconsistency is detected, the store should raise an exception." I read this as suggesting that the effect of the operation that caused the inconsistency is leaving the store in a permanently inconsistent state. If possible, wouldn't it be preferable for the *update service* to raise an exception, and leave the underlying store in a consistent state without the effects of the bad request?

3 SPARQL 1.1 Update Language
============================

"graph uptate". s/uptate/update/. (In general, the document could benefit from spell checking.)

"A request is a sequence of operations and is terminated by EOF (End of File). Multiple operations are separated by a ';' (semicolon) character." Based on the definition of "request" in section 1.1.2, I was under the impression that it was a conceptual entity consisting of operations, but this text seems to discuss it in syntax terms. If it's a syntactic entity, then the definition in 1.1.2 should be changed, and discuss requests as byte strings that are in the language defined by the SPARQL Update grammar.

"The operations of a request are executed in lexical order." Must they be? So long as the semantics remain unchanged, surely an implementation could order the execution of operations in any way it wanted? Maybe this should use "SHOULD be executed" (as per rfc2119) instead of "are executed"?

"Operations all result either in success or failure. A failure result may be accompanied by extra information, indicating that some portion of the operation was successful." These two sentences seem contradictory to me. I understand the intention as saying the *request* has only one return value, either success or failure, which may be accompanied by extra information indicating that some portion of the operations were successful.

3.1 Graph Update
----------------

"an implementation MUST create graphs that do not exist before triples were inserted into them". The use of "MUST" here seems too strong to me, as it would prevent conformant implementations from providing an update service over a fixed set of graphs. Is this "MUST" overridden by a service's ability to simply reject a request if it might trigger the illegal creation of a graph?

The bullet list of update operations should be introduced in some way (e.g. "SPARQL 1.1 Update provides these graph update operations:"). Also, consider adding hyperlinks from these summaries of the update operations to their definitions later in the document.

"triple templates containing variables within DELETE DATA and INSERT DATA operations don't have effects". Why is this not simply a syntax error?

"Having specific operations means that a request can be streamed". Suggest "specific operations for concrete data".

3.1.1 INSERT DATA
-----------------

"Insert triples into graphs:". This seems like a very casual way to introduce the operation. Suggest the content should be more similar to the summary in 3.1.

"occasionally wrapped into a GRAPH block". "occasionally" seems very weak. Perhaps "optionally" would be better?

"Variables in QuadDatas are ignored in INSERT DATA requests, i.e. TripleTemplates with Variables will not insert anything." Why is this more desirable than having this case result in a syntax error? Silently ignoring this case seems bad to me.

"ground triples". Section 3 has also used the term "concrete data", but neither is explicitly defined. Can these be defined (and possibly consolidated)?

"Blank node labels in QuadDatas are assumed to be disjoint from the blank nodes in the Graph Store and will be inserted as new blank nodes." I understand what this means, but I think it's misleading. The "blank node labels in QuadDatas" might refer to the syntactic-level labels, in which case they might *not* be disjoint. What's really being discussed here is the blank node labels used in the underlying insert operation. Is there a way to make this clearer?

"If a graph is described, but it does not exist, then it will be created." This language ("it will be") feels like it's describing an implementation, not a spec. Can this be cast in RFC2119 language?

"Adding some triples to a graph." This feels like the title of the example, but appears in the text as just a sentence fragment. Can it be changed either to "Example: Adding some triples to a graph", or be re-worded to be a full sentence?

"This snippet describes two RDF triples to be inserted into the default graph of the graph store." Up to this point, I believe that "default graph" has been used as a request-level concept, while the corresponding graph-store-level concept is "unnamed graph" (the two often being the same unless otherwise specified). If this is correct, then I think this should talk about inserting into the unnamed graph of the graph store.

I believe the use of a namespace prefix in this example is the first use of prefixes in the document. While it will make perfect sense if the reader is familiar with sparql query, perhaps some mention should be made of prefixes and their use in update requests, with any appropriate links to the query document.

"Data before". Again, I think this should refer to the unnamed graph, not the "Default graph". Similarly for the immediately following "Data after" section.

"Example 2". The previous example wasn't numbered, but this one is. Moreover, example numbering seems to reset in each section of the document. Could these be consistently numbered? Also, could the examples have link anchors added to them? I see that there are some link anchors already, but the anchor names aren't particularly meaningful, and at least one of them (#example_c) is used twice in the document.

In Example 2, is there a reason to use xsd:int instead of xsd:integer on the typed literal (or, even better, just using the turtle shorthand for integers)?

3.1.2 DELETE DATA
-----------------

"Delete triples from graphs:". Same comment as in 3.1.1 about casualness of the language.

"The QuadData denotes existing triples to be removed." Suggest s/The //. Also, can't QuadData contain triples that don't already exist? That is, they needn't all be "existing triples" for the operation to succeed.

"QuadDatas that contain variables will not match/delete any triples in DELETE DATA requests." Same comment about whether this is preferable to being a syntax error.

"Since blank node labels are only unique within each specific context". What is a "specific context"?

"blank nodes in the QuadData will not match existing data either in DELETE DATA requests." I'm not sure what "either" is meant to convey here. Also, I'd suggest using "will not *delete* existing data" (instead of 'matching' data), as the operation here is DELETE.

"Blank nodes are not permitted in the QuadData, as these cannot match any existing data. It should be noted that this restriction is not in the grammar for DELETE." This contradicts the text in the previous paragraph that blank nodes used in QuadData will simply not match any data. One of these must be changed.

"Removing undesired triples from a graph." Again, this seems like the title of the example.

Very minor point: why the style change from "book3" to "bookx"?

"Example 2". Some mention should be made of the use of multiple Prologues in this multi-operation request. Is the use of the second PREFIX declaration redundant? Required?

3.1.3 DELETE/INSERT
-------------------

This operation doesn't even have an introductory sentence, jumping instead directly into the EBNF of the operation. Suggest adding an introductory sentence.

"DELETE template", "INSERT template". The EBNF uses the QuadPattern rule, but there are no references to "templates". Perhaps "DELETE template" could be "DELETE QuadPattern" (similarly for "INSERT template")?

"The WITH iri". Is there a reason to use "iri" in a lowercase form?

"The USING <iri> and USING NAMED <iri> clauses affect the graphs and named graphs used in the WHERE clause." Suggest this talk about affecting the dataset used while evaluating the WHERE clause.

"The use of USING in this instance is to avoid possible ambiguity of where statements being DELETEd from." I find this sentence very confusing. Is the "where" meant to be "WHERE"? Or should it be "where statements *are* being DELETEd from"?

"If a USING clause appears, then this will override any effect that WITH may have on the WHERE clause." This seems to conflict with the following paragraph which says: "Any remaining portions of the GroupGraphPattern which are not in the scope of a GRAPH clause will be matched against the graph specified in the WITH clause, if present, or the default graph of the graph store otherwise." Also, this again talks about the "default graph" of the graph store, but I think that should be "unnamed graph".

"occasionally wrapped into a GRAPH block". Again, I don't think "occasionally" is helpful here.

"Using a new blank node in a delete template would lead to nothing being deleted, as the new blank node cannot match anything that already exists." Awkward tense on "would". Continuing reading, I see that it's meant as a hypothetical situation that cannot occur because blank nodes are prohibited in a delete template, but the wording feels awkward to me. Can the explanation of why it's prohibited be moved to after asserting that it is prohibited?

"If an operation tries to insert into a graph that does not exist, then that graph must be created." Again, my concern about the "MUST".

"Blank nodes that appear in an INSERT clause operate in the same way as blank nodes in the template of a CONSTRUCT query." I assume this is referring to being unique per solution that is applied to the template? I'm not sure this is explicit enough, because my first thought was that it was talking about blank nodes needing to be distinct from any blank nodes that are already in the underlying graph.
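For what it's worth, here is how I read the CONSTRUCT analogy, as a minimal Python sketch. The encoding (variables as "?x" strings, blank nodes as "_:x" strings), the function name, and the fresh-label scheme are my own assumptions, not anything from the draft. The point is only that each solution gets its own fresh blank nodes, shared by all template triples within that one solution:

```python
import itertools
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: labels of the form "_:new<n>" never occur in the store,
# so freshly minted labels are disjoint from existing blank nodes.
_fresh_ids = itertools.count()

def instantiate_insert_template(template: List[Triple],
                                solutions: Iterable[Dict[str, str]]) -> Set[Triple]:
    """Apply each solution to the INSERT template, CONSTRUCT-style."""
    out: Set[Triple] = set()
    for mu in solutions:
        fresh: Dict[str, str] = {}  # template bnode label -> fresh label, per solution

        def term(t: str) -> Optional[str]:
            if t.startswith("_:"):
                if t not in fresh:
                    fresh[t] = f"_:new{next(_fresh_ids)}"
                return fresh[t]
            if t.startswith("?"):
                return mu.get(t)  # None if the variable is unbound
            return t

        for triple in template:
            s, p, o = (term(t) for t in triple)
            if s is None or p is None or o is None:
                continue  # as in CONSTRUCT, skip triples with unbound variables
            out.add((s, p, o))
    return out

template = [("?person", "ex:address", "_:addr"),
            ("_:addr", "ex:city", "?city")]
solutions = [{"?person": "ex:alice", "?city": '"Paris"'},
             {"?person": "ex:bob", "?city": '"Rome"'}]
result = instantiate_insert_template(template, solutions)
# Four triples: alice and bob each get their own fresh address node.
```

Under this reading the "_:addr" node is distinct per solution, which is a different guarantee from being distinct from blank nodes already in the store (that part rests entirely on the labelling assumption above).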

"Example". This example isn't numbered (presumably because it's the only one in 3.1.3?). Same concern about the introductory sentence fragment being more of a title for the example.

The example query uses foaf:firstName, but the example data uses foaf:givenName.

The example turtle data (both before and after) are missing a trailing dot on the @prefixes. This also occurs in example data later in the document.

3.1.4 DELETE (Informative)
--------------------------

"Example 1". Noteworthy that this is the first example section in which the first sentence is actually a sentence and not what seems like an example title. I think this is the sort of thing that all the example sections should begin with. The sentence should end with a period, though (or a colon if it's reworded to describe "this example" instead of "the example below").

The example foaf:mbox triples in this example data (and again in later examples throughout the document) should use mailto IRIs instead of literals.

The xsd:dateTime used in this example, as well as many others throughout the document use a timezone with an invalid lexical form with a one-digit timezone hour. Change "-2:00" to "-02:00".
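A quick way to check the offsets mechanically: a minimal Python sketch, with the regular expression transcribed from my reading of the timezoneFrag production in XML Schema 1.1 Part 2 (worth double-checking against that spec; the function name is mine):

```python
import re

# The timezone part of an xsd:dateTime lexical form, after the
# timezoneFrag production of XML Schema 1.1 Part 2: either "Z" or a
# sign followed by a two-digit hour and two-digit minute, at most 14:00.
TIMEZONE_FRAG = re.compile(r"Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00)")

def is_valid_timezone(tz: str) -> bool:
    return TIMEZONE_FRAG.fullmatch(tz) is not None

print(is_valid_timezone("-2:00"))   # False: one-digit hour
print(is_valid_timezone("-02:00"))  # True
```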

"from the store's default graph." Should be either "from the default graph" or "from the store's unnamed graph".

"The pattern in WHERE is matched against the graph store analogously to SPARQL 1.1 Query." Is it "analogous", or actually identical to the matching as per Query?

"If the pattern matching fails, no changes occur." Would this really be considered 'failing'? Or would it simply be pattern matching yielding a solution sequence of length zero?

"Data before". Again, I'm curious why there are 3 different styles for the book IRIs ('book3', 'book', and 'bookx').

"Example 2". Uses the sentence fragment example title style. Whichever predicate is used in the example in 3.1.3 (foaf:firstName, foaf:givenName) should be consistent with this example.

"A USING clause is present, meaning that the template also serves as the pattern to be matched in the http://example/addresses graph." I don't understand this sentence, particularly the use of "also". I would think that the presence of the USING clause would mean that the template/pattern *is to be matched* in the http://example/addresses graph.

3.1.5 INSERT (Informative)
--------------------------

"If no WHERE clause is present". I believe this should be "If no USING clause is present".

"This example copies records from one named graph to another named graph based on a pattern." Should this talk about triples instead of "records"? Also, suggest the sentence end with a colon.

"Example 2". Uses the sentence fragment form. The description talks about both "records" and "objects". Can the language be tightened up to talk about triples (that are copied) that describe objects (as the query deals with things of type "dcmitype:PhysicalObject")?

3.1.6 DELETE WHERE
------------------

"Example 1" and "Example 2". Uses the sentence fragment example title style.

"Find and remove statements naming something 'Fred' in the graph http://example.com/names, and also remove all statements about that resource from the graph http://example/addresses." I don't think this accurately describes what the example operation does. The QuadPattern, when used as a pattern for generating a solution sequence, will only match things named 'Fred' in the names graph that *also* appear as the subject of at least one triple in the 'addresses' graph. The description makes it sound as if the data in the 'addresses' graph is an optional second step of the operation. If the <http://example.com/names> graph also contained this data:

<http://example/fred2> a foaf:Person .
<http://example/fred2> foaf:firstName "Fred" .

... but no data about <http://example/fred2> was present in the <http://example.com/addresses> graph, then those two triples would still be in the <http://example.com/names> graph after the operation completed. Also, I'm not sure why this triple:

<http://example/fred> foaf:firstName "Fred" .

... remains in the 'names' graph after the operation. Shouldn't the DELETE operation have deleted it from the names graph along with the foaf:mbox triple from the 'addresses' graph?
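To check my reading, here is a small Python sketch of DELETE WHERE as "match the QuadPattern, then delete every instantiated quad". The data and the two-quad pattern are my assumptions about the shape of the example (graph names abbreviated to "names" and "addresses", IRIs to "ex:" forms), with the extra fred2 triples from above added; the helper functions are hypothetical:

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")

def match_quad(pattern: Quad, quad: Quad, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches quad, or return None."""
    extended = dict(binding)
    for p, q in zip(pattern, quad):
        if is_var(p):
            if extended.setdefault(p, q) != q:
                return None  # variable already bound to something else
        elif p != q:
            return None
    return extended

def evaluate(patterns: List[Quad], store: FrozenSet[Quad]) -> List[Binding]:
    """Join the quad patterns, like a basic graph pattern spread over graphs."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for quad in store:
                extended = match_quad(pattern, quad, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

def delete_where(patterns: List[Quad], store: FrozenSet[Quad]) -> FrozenSet[Quad]:
    """Delete every quad obtained by instantiating the patterns with a solution."""
    doomed = {tuple(b.get(t, t) for t in p)
              for b in evaluate(patterns, store) for p in patterns}
    return store - doomed

store = frozenset({
    ("names", "ex:fred", "rdf:type", "foaf:Person"),
    ("names", "ex:fred", "foaf:firstName", '"Fred"'),
    ("names", "ex:fred2", "rdf:type", "foaf:Person"),
    ("names", "ex:fred2", "foaf:firstName", '"Fred"'),
    ("addresses", "ex:fred", "foaf:mbox", "mailto:fred@example.org"),
})
pattern = [("names", "?person", "foaf:firstName", '"Fred"'),
           ("addresses", "?person", "?property", "?value")]
after = delete_where(pattern, store)
```

Under these assumptions both fred2 triples survive (fred2 has nothing in the addresses graph, so there is no solution for it), while fred's foaf:firstName triple is deleted along with the foaf:mbox triple, which is the opposite of what the example's "Data after" shows.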

3.1.7 LOAD
----------

"The LOAD operation copies all the triples from a remote graph into the specified graph." Is it really a remote graph that's being specified? Or rather a serialized graph (RDF document)? Perhaps it's the same thing, but I'd think of this operation as taking a document URI (which the EBNF actually calls it) and loading the triples you get when you parse the data you get when you dereference the URI.

"In case no RDF data can be retrieved (as opposed to the empty graph being retrieved) from documentIRI, the SPARQL 1.1 Update service is expected to return failure. In any other case, it will always return success." This says nothing about whether the retrieval was successful. What happens if you try to dereference the URI, and get back an HTTP error code along with an RDF payload? Shouldn't that raise an error?

3.1.8 CLEAR
-----------

This section is lacking an introductory sentence.

"The CLEAR operation removes all the triples in the specified graph." Consider s/graph/graph(s)/, since two of the options for CLEAR may actually end up clearing multiple graphs.

"has the same effect as". Is this normative? Can a graph store choose to remove the empty graph resulting from a CLEAR GRAPH operation, but keep it after a DELETE ... WHERE ... operation? If so, then these don't necessarily have the same effect.

3.2 Graph Management
--------------------

"Graph management operations allow to create, destroy, move and copy named graphs". Suggest "allow creating, destroying, moving, and copying named graphs".

The bullet list of graph management operations should be introduced in some way.

3.2.1 CREATE
------------

This section is lacking an introductory sentence.

"If the graph already exists, then a failure may be returned". Is this meant as a RFC2119-style "may"?

3.2.2 DROP
----------

This section is lacking an introductory sentence.

ISSUE-59 has been resolved. I'm not sure if that means the comment seeking implementor feedback should be removed.

This section should probably have a similar note as in CLEAR about the potentially far reaching implications of DROP DEFAULT on stores for which the default graph is the union of other graphs.

3.2.3 COPY
----------

"Data after". This example turtle is missing the @prefix declaration for foaf.

3.2.4 MOVE
----------

"Data after". This example turtle is missing the @prefix declaration for foaf.

3.2.5 ADD
---------

"Example". Uses the sentence fragment example title style.

4 SPARQL Update Formal Model
============================

It seems a bit odd that section 4 doesn't have any introductory text.

4.1.1 Graph Store
-----------------

s/i.e. i.e./i.e./

This section uses both "default" and "unnamed" to talk about the non-named graph in a graph store. It hasn't been clear to me throughout the document if these are meant to be interchangeable terms.

Both "<=" and "≠" are used in this section (and in later sections). Suggest both be either ascii or unicode characters, but not a mix.

4.1.2 Abstract Update Operation
-------------------------------

This is obviously subjective, but I'd prefer the definition of the transformation in the other direction "GS' = Op(GS, Args)" (GS' on the left hand side).

I'm concerned that the text "can also alter the state of each graph individually" may be an issue for some people as I think it can be read as suggesting that the graphs are mutable (instead of the operation just replacing the graph with another one).
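Written in the GS' = Op(GS, Args) direction, the replacement reading is easy to make explicit. A minimal Python sketch, where the names and the store encoding (a read-only mapping from slot to a frozenset of triples) are my own assumptions rather than anything in the draft:

```python
from types import MappingProxyType
from typing import FrozenSet, Iterable, Mapping, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]                   # graphs are immutable values
GraphStore = Mapping[Optional[str], Graph]  # slot -> graph; None is the unnamed slot

def op_insert_data(gs: GraphStore, triples: Iterable[Triple],
                   iri: Optional[str] = None) -> GraphStore:
    """GS' = OpInsertData(GS, triples, iri); GS itself is never modified."""
    old_graph = gs.get(iri, frozenset())        # assumption: a missing slot acts as empty
    new_graph = old_graph | frozenset(triples)  # a new graph value, not an edit of old_graph
    slots = dict(gs)
    slots[iri] = new_graph                      # the new graph replaces the old one in the slot
    return MappingProxyType(slots)

gs = MappingProxyType({None: frozenset()})
book = ("ex:book1", "dc:title", '"A new book"')
gs2 = op_insert_data(gs, [book], "ex:books")
```

Nothing in the sketch mutates a graph; the only "change" is which graph value occupies a slot in the new store, which is the wording I'd hope the text could reflect.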

4.2.1 Dataset-UNION
-------------------

Is the definition used here for graph union different from the traditional one used in RDF Semantics? Does there need to be any discussion of handling of blank nodes? (This also applies to graph minus in the next section.)

4.2.2 Dataset-DIFF
------------------

There's a missing space between the first two sentences.

4.2.3 Dataset(QuadPattern, μ )
------------------------------

The previous two definitions didn't include the operation arguments in the section title. There's asymmetric whitespace in the operation argument list in this section title.

Since this section talks about syntax-level QuadPatterns, does there need to be discussion of what to do with blank nodes in the quad patterns (regarding disjointness of blank nodes across invocations of Dataset())?

"combining these triples into a single named RDF graph by set union": I think "named" should be dropped here.

"Dataset(QuadPattern, μ ) = Dataset-UNION ( Dataset(QuadPattern1, μ ) , Dataset(QuadPattern2, μ ) )": The definition of Dataset-UNION in section 4.2.1 is defined as taking one graph store and one dataset as arguments, but here is used with two datasets.
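One way to remove the mismatch is to give Dataset-UNION a single signature over two datasets (a graph store being just a dataset-shaped value). A minimal Python sketch under that assumption, ignoring blank node renaming, and assuming μ has already been applied to the quads; the function names are hypothetical:

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[Optional[str], FrozenSet[Triple]]  # None is the default slot
Quad = Tuple[Optional[str], str, str, str]        # (graph or None, s, p, o)

def dataset_union(d1: Dataset, d2: Dataset) -> Dataset:
    """Slot-wise set union of two datasets (blank nodes not standardized apart)."""
    return {name: d1.get(name, frozenset()) | d2.get(name, frozenset())
            for name in d1.keys() | d2.keys()}

def dataset_of_quads(quads: Iterable[Quad]) -> Dataset:
    """Dataset(QuadPattern, mu) for a QuadPattern already instantiated by mu."""
    singletons = ({g: frozenset({(s, p, o)})} for g, s, p, o in quads)
    return reduce(dataset_union, singletons, {})
```

With this signature the recursive definition type-checks as written, and the same function can serve Dataset-UNION(GS, Dataset(...)) in the operations of section 4.3.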

4.2.4 Dataset(QuadPattern, P, GS )
----------------------------------

"For instance, the following update...": No connection has been discussed yet between the Dataset() operation and the SPARQL Update syntax, so this example feels a bit strange.

4.3.1 Insert Data Operation
---------------------------

This section (and similarly in later sections) talks about "Insert Data Operations," but links to the section in the document that talks about the syntax and effects of "INSERT DATA" (without using the "Operation" terminology).

"new triples are added in the Graph Store, either in the default slot or in a named slot": I'm a bit concerned about the idea of triples being added "in a slot". The definition gets it right, but this description is hiding that the underlying mechanism is that the triples are combined with those in an existing graph, and the new graph replaces the old one in the appropriate slot.

"where {} is the empty substitution": is it a "substitution," or a "solution mapping" (as the definition of Dataset(QuadPattern, µ) says)?

4.3.2 Delete Data Operation
---------------------------

Similar concern about talking about removing triples "from a slot".
Similar concern about the "empty substitution".
(Both of these apply similarly to the other operations in section 4.)

4.3.3 Delete Insert Operation
-----------------------------

"Triples are identified as they match a particular Group Graph Pattern P against GS." I'm not sure what this means. Also, there doesn't seem to be any connection between the syntax and the variables used in this section. I can infer what QuadPattern_INS, QuadPattern_DEL, and P are, but it would be much better if the connection with the syntax were explicit. This applies to the rest of the operations following this one, as well.

No mention is made of how USING clauses affect this operation. The operation is defined as matching against the graph store, not the dataset constructed with the USING clauses (or via the protocol, etc.). This is a major omission, and needs to be addressed.
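To make the omission concrete, here is a minimal Python sketch of the extra input the Delete Insert operation seems to need: the dataset against which P is evaluated. It assumes USING/USING NAMED follow the FROM/FROM NAMED rules of SPARQL 1.1 Query (with a plain set union standing in for RDF merge), and the no-USING fallback quoted from section 3.1.3; the function and its parameters are hypothetical:

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
GraphStore = Dict[Optional[str], Graph]  # None is the unnamed graph

def where_dataset(gs: GraphStore,
                  using: Iterable[str] = (),
                  using_named: Iterable[str] = (),
                  with_iri: Optional[str] = None) -> Tuple[Graph, Dict[str, Graph]]:
    """Return (default graph, named graphs) for evaluating the WHERE clause."""
    using, using_named = list(using), list(using_named)
    if using or using_named:
        # USING/USING NAMED act like FROM/FROM NAMED and override WITH;
        # with only USING NAMED, the default graph is empty.
        # Set union stands in for RDF merge (no blank node renaming).
        default = frozenset().union(*(gs.get(i, frozenset()) for i in using))
        named = {i: gs.get(i, frozenset()) for i in using_named}
    else:
        # No USING: the WITH graph if present, else the unnamed graph,
        # with every named graph of the store available to GRAPH.
        default = gs.get(with_iri, frozenset())
        named = {k: v for k, v in gs.items() if k is not None}
    return default, named
```

Under this reading the formal definition would evaluate P over where_dataset(GS, ...) rather than over GS, while the deletions and insertions themselves still apply to GS.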

4.3.4 Delete Operation
----------------------

If the definition of DELETE is merely informative (covered entirely by DELETE/INSERT), why does it need a definition in the formal model? If it doesn't (which I think is best), then the previous section needs to mention how DELETE/INSERT operations are handled when either the DELETE or INSERT parts are empty.

4.3.5 Insert Operation
----------------------

Same concern as for DELETE.

4.3.7 Load Operation
--------------------

This section should use the "dereference" terminology used in AWWW (the service description document uses this terminology as well).

"serializing it into (GRAPH iri) triples": I'm not sure what the parenthetical means.

This operation lacks a connection between syntax and operation arguments, but also any explicit discussion on the difference between the two operation forms (using an IRI or defaulting to the unnamed graph).

4.3.8 Clear Operation
---------------------

"OpClear(GS, iri) = GS if iri not in graphnames(GS)": covering the case where iri isn't in the graph names seems odd, since in general the error cases discussed in section 3 aren't dealt with in the formal model.

4.4.1 Create Operation
----------------------

In the note about non graph-aware stores, why should create operations be understood as followed by a drop operation instead of simply a no-op?

4.4.2 Drop Operation
--------------------

"which is equivalent to OpClear_def, since the default graph cannot be removed": This seems to contradict the language in section 3.2.2 which talks about restoring the default graph after it is dropped. It also seems to conflict with text in the same sentence: "OpDrop_all for dropping all graphs including the default graph".

Received on Sunday, 17 April 2011 19:15:41 UTC