Addressing Greg's update review - part 2... (Re: Review of SPARQL 1.1 Update) from Axel Polleres on 2011-04-26 (public-rdf-dawg@w3.org from April to June 2011)

From: Axel Polleres <axel.polleres@deri.org>
Date: Tue, 26 Apr 2011 04:13:13 +0100
To: Gregory Williams <greg@evilfunhouse.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <858D743E-EF1E-4379-A9FD-D186DB0D3637@deri.org>
This is part 2 of addressing Greg's comments. 
Continuing on the open points from Greg's comments numbering them from 12) onwards which is where I stopped in my last mail...
Not all of the open issues are major, but there are still some major open issues for Update, which I hope to 
at least summarise before tomorrow's call.


12)
> "If a USING clause appears, then this will override any effect that WITH may have on the WHERE clause."

This seemed wrong to me... I dropped that sentence. I don't see why USING couldn't be even used in combination with WITH, i.e. indicating the graph in the newly defined dataset...

> This seems to conflict with the following paragraph which says: "Any remaining portions of the GroupGraphPattern which are not in the scope of a GRAPH clause will be matched against the graph specified in the WITH clause, if present, or the default graph of the graph store otherwise." Also, this again talk about the "default graph" of the graph store, but I think that should be "unnamed graph".

... I rewrote those paragraphs significantly, please check if better now.


13)
> > "has the same effect as". Is this normative? Can a graph store choose to remove the empty graph resulting from a CLEAR GRAPH operation, but keep it after a DELETE ... WHERE ... operation? If so, then these don't necessarily have the same effect.

Hmmm, good remark. For the moment, I just weakened "has the same effect as" to "in principle has the same effect as:"
Alternative would be to just remove this "syntactic sugar explanation", but I find it potentially useful for illustration.

14)
> > 4.2.1 Dataset-UNION
> > -------------------
> >
> > Is the definition used here for graph union different from the traditional one used in RDF Semantics?

which one? this is set union, as opposed to rdf merge. I added (temporarily) a definition of Dataset-MERGE to make the difference clear. 
Can be removed, if not needed.

15)
> > Since this section talks about syntx-level QuadPatterns, does there need to be discussion of what to do with blank nodes in the quad patterns (regarding disjointness of blank nodes across invocations of Dataset())?

*Open: yes assuming you talk about Section Dataset(QuadPattern, μ ) here.


16)
> > "Dataset(QuadPattern, μ ) = Dataset-UNION ( Dataset(QuadPattern1, μ ) , Dataset(QuadPattern2, μ ) )": The definition of Dataset-UNION in section 4.2.1 is defined as taking one graph store and one dataset as arguments, but here is used with two datasets.

*Open: Hmmm, we also say that "i.e., the graph store can be viewed as a mutable RDF dataset" i.e. for the sake of these definitions, I don't really want to make a difference, to be honest, as I am afraid it would further complicate things. I was hoping this is clear enough. Do you consider this remark critical?

17)
> > 4.2.4 Dataset(QuadPattern, P, GS )
> > ----------------------------------
> >
> > "For instance, the following update...": No connection has been discussed yet between the Dataset() operation and the SPARQL Update syntax, so this example feels a bit strange.

*Open: I could add some more explanation on how this example results in the call of the Dataset() function, need to think about how to best do that.

18)
> > This section (and similarly in later sections) talks about "Insert Data Operations," but links to the section in the document that talks about the syntax and effects of "INSERT DATA" (without using the "Operation" terminology).

*Open: Yes, that connection from syntaxt to the operations still needs to be made, also remarked by Andy

19)
> > No mention is made of how USING clauses affect this operation. The operation is defined as matching against the graph store, not the dataset constructed with the USING clauses (or via the protocol, etc.). This is a major omission, and needs to be addressed.
> >

*Open: yes this is indeed a (known) major issue still open.


20)
> > If the definition of DELETE is merely informative (covered entirely by DELETE/INSERT), why does it need a definition in the formal model? If it doesn't (which I think is best), then the previous section needs to mention how DELETE/INSERT operations are handled when either the DELETE or INSERT parts are empty.

*Open: The behavior where the DELETE or INSERT parts are empty is implicit (by the definitions of Dataset()), but I think I finally agree that probably all the "Definitions" for Delete Operation/InsetrOperation and DeleteWehereOperation should be dropped and just what needs to be done is to define the mapping from syntax do those operations


21)
> > This section should use the "dereference" terminology used in AWWW (the service description document uses this terminology as well).
> >
> > "serializing it into (GRAPH iri) triples": I'm not sure what the parenthetical means.

*Open: tried to reword that, but not finished, would be grateful for concrete rewording suggestions regarding the terminology...


22)
> > "OpClear(GS, iri) = GS if iri not in graphnames(GS)": covering the case where iri isn't in the graph names seems odd, since in general the error cases discussed in section 3 aren't dealt with in the formal model.

*Open I kind of disagree: It is not clear to me from what I read in Section 3 so far that clearing a non-existing graph should yield an error, that's why I covered that case.
Should we mention it explicitly in Section 3?

-----


More details below...


> ---- this is how far I've got, need to continue here ----
> 
> >
> > "The WITH iri". Is there a reason to use "iri" in a lowercase form?
> 
Changed that to "WITH clause".

> > "The USING <iri> and USING NAMED <iri> clauses affect the graphs and named graphs used in the WHERE clause." Suggest this talk about affecting the dataset used while evaluating the WHERE clause.
> 

done. 

> > "The use of USING in this instance is to avoid possible ambiguity of where statements being DELETEd from." I find this sentence very confusing. Is the "where" meant to be "WHERE"? Or should it be "where statements *are* being DELETEd from"?

done. rewritten: 
 The keyword <code>USING</code> instead of <code>FROM</code>of in update requests is to avoid possible ambiguities which could arise from writing "<code>DELETE FROM</code>".

> > "If a USING clause appears, then this will override any effect that WITH may have on the WHERE clause."

This seemed wrong to me... I dropped that sentence. I don't see why USING couldn't be even used in combination with WITH, i.e. indicating the graph in the newly defined dataset...

Note that I also changed later on:

"
<p>Note that <code>WITH</code> will be ignored for any section that stipulates a <code>GRAPH</code> or for the entire <code>WHERE</code> clause if a <code>USING</code> clause is present. <code>WITH</code> provides a fallback to specify a graph (different from the default graph) to use when one is not explicitly stipulated via <code>GRAPH</code> or <code>USING</code>.</p>
"

to

"
<p>Note that explicit <code>GRAPH</code> clasues override a <code>WITH</code> clause. <code>WITH</code> provides a fallback to specify a graph (different from the default graph) to use when one is not explicitly stipulated via <code>GRAPH</code>.</p>
"

> This seems to conflict with the following paragraph which says: "Any remaining portions of the GroupGraphPattern which are not in the scope of a GRAPH clause will be matched against the graph specified in the WITH clause, if present, or the default graph of the graph store otherwise." Also, this again talk about the "default graph" of the graph store, but I think that should be "unnamed graph".

... I rewrote those paragraphs significantly, please check if better now.

> >
> > "occasionally wrapped into a GRAPH block". Again, I don't think "occasionally" is helpful here.
> 

changed to "optionally" 

> > "Using a new blank node in a delete template would lead to nothing being deleted, as the new blank node cannot match anything that already exists." Awkward tense on "would". Continuing reading, I see that it's meant as a hypothetical situation that cannot occur because blank nodes are prohibited in a delete template, but the wording feels awkward to me. Can the explanation of why it's prohibited be moved to after asserting that it is prohibited?

done.

> > "If an operation tries to insert into a graph that does not exist, then that graph must be created." Again, my concern about the "MUST".

done. reworded, see my earlier remark 7) before.

> > "Blank nodes that appear in an INSERT clause operate in the same way as blank nodes in the template of a CONSTRUCT query." I assume this is referring to being unique per solution that is applied to the template? I'm not sure this is explicit enough, because my first thought was that it was talking about blank nodes needing to be distinct from any blank nodes that are alread in the underlying graph.

done. reworded.
> >
> > "Example". This example isn't numbered (presumably because it's the only one in 3.1.3?). Same concern about the introductory sentence fragment being more of a title for the example.

done. all examples numbered consecutively throughout doc now.

> >
> > The example query uses foaf:firstName, but the example data uses foaf:givenName.
> >

done. fixed.

> > The example turtle data (both before and after) are missing a trailing dot on the @prefixes. This also occurrs in example data later in the document.

done. fixed. 

> > 3.1.4 DELETE (Informative)
> > --------------------------
> >
> > "Example 1". Noteworthy that this is the first example section in which the first sentence is actually a sentence and not what seems like an example title. I think this is the sort of thing that all the example sections should begin with. The sentence should end with a period, though (or a colon if it's reworded to describe "this example" instead of "the example below").

done. slightly reworded the sentence.

> >
> > The example foaf:mbox triples in this example data (and again in later examples throughout the document) should use mailto IRIs instead of literals.
> 

done.


> > The xsd:dateTime used in this example, as well as many others throughout the document use a timezone with an invalid lexical form with a one-digit timezone hour. Change "-2:00" to "-02:00".

done.

> >
> > "from the store's default graph." Should be be either "from the default graph" or "form the store's unnamed graph".

left as is, see comments before.

> > "The pattern in WHERE is matched against the graph store analogously to SPARQL 1.1 Query." Is it "analogous", or actually identical to the matching as per Query?

I simply removed "analogously to SPARQL 1.1 Query" it doesn't add, this is an informal descpription of the example only, anyways. 

> > "If the pattern matching fails, no changes occur." Would this really be considered 'failing'? Or would it simply be pattern matching yielding a solution sequence of length zero?
> 

removed that sentence.

> > "Data before". Again, I'm curious why there are 3 different styles for the book IRIs ('book3', 'book', and 'bookx').

done. I changed those to book1/book2/book3 now, hope all consistent.

> > "Example 2". Uses the sentence fragment example title style. Whichever predicate is used in the example in 3.1.3 (foaf:firstName, foaf:givenName) should be consistent with this example.

done.

> >
> > "A USING clause is present, meaning that the template also serves as the pattern to be matched in the http://example/addresses graph." I don't understand this sentence, particularly the use of "also". I would think that the presence of the USING clause would mean that the template/pattern *is to be matched* in the http://example/addresses graph.

done. (actually there was no USING clause there...) Changed to:
"
A <code>WITH</code> clause is present, meaning that graph <code>http://example/addresses</code> is both the one from which triples are being removed and the one which the <code>WHERE</code> clause is matched against.</p>
"

> >
> > 3.1.5 INSERT (Informative)
> > --------------------------
> >
> > "If no WHERE clause is present". I believe this should be "If no USING clause is present".
> 
done. rewritten.

> > "This example copies records from one named graph to another named graph based on a pattern." Should this talk about triples instead of "records"? Also, suggest the sentence end with a colon.

done.

> >
> > "Example 2". Uses the sentence fragment form. The description talks about both "records" and "objects". Can the language be tightened up to talk about triples (that are copied) that describe objects (as the query deals with things of type "dcmitype:PhysicalObject")?

done.

> > 3.1.6 DELETE WHERE
> > ------------------
> >
> > "Example 1" and "Example 2". Uses the sentence fragment example title style.
> >
> > "Find and remove statements naming something 'Fred' in the graph http://example.com/names, and also remove all statements about that resource from the graph http://example/addresses." I don't think this accurately describes what the example operation does. The QuadPattern, when used as a pattern for generating a solution sequence, will only match things named 'Fred' in the names graph that *also* appear as the subject of at least one triple in the 'addresses' graph. The description makes it sounds as if the data in the 'addresses' graph is an optional second step of the operation. If the <http://example.com/names> graph also contained this data:
> 
changed to (probably not ideal, but fixes the problem, I think):
"
<p><a name="example_j"/>This example request removes both statements naming some resource "Fred" in the graph <code>http://example.com/names</code>, and also statements about that resource from the graph <code>http://example/addresses</code> (assuming that any subject in the graph <code>http://example.com/names</code> has corresponding triples in the graph <code>http://example/addresses</code>).</p>
"

> > <http://example/fred2> a foaf:Person .
> > <http://example/fred2> foaf:firstName "Fred" .
> >
> > ... but no data about <http://example/fred2> was present in the <http://example.com/addresses> graph, then those two triples would still be in the <http://example.com/names> graph after the operation completed. Also, I'm not sure why the this triple:
> >
> > <http://example/fred> foaf:firstName "Fred" .
> >
> > ... remains in the 'names' graph after the operation. Shouldn't the DELETE operation have deleted it from the names graph along with the foaf:mbox triple from the 'addresses' graph?
> >
> > 3.1.7 LOAD
> > ----------
> >
> > "The LOAD operation copies all the triples from a remote graph into the specified graph." Is it really a remote graph that's being specified? Or rather a serialized graph (RDF document)? Perhaps it's the same thing, but I'd think of this operation as taking a document URI (which the EBNF actually calls it) and loading the triples you get when you parse the data you get when you dereference the URI.

done. reworded to
"The <code>LOAD</code> operation reads an RDF document from a IRI and inserts its triples into the specified graph in the graph store"

> >
> > "In case no RDF data can be retrieved (as opposed to the empty graph being retrieved) from documentIRI, the SPARQL 1.1 Update service is expected to return failure. In any other case, it will always return success." This says nothing about whether the retrieval was successful. What happens if you try to dereference the URI, and get back an HTTP error code along with an RDF payload? Shouldn't that raise an error?

We do not restrict to http here, as far as I understand. I tried to reword this
"In case no RDF data can be retrieved (as opposed to the empty graph being retrieved) from <code>documentIRI</code>, or in case the retrieval method returns an error (such as, for instance an HTTP error code), the SPARQL 1.1 Update service is expected to return failure."

I also removed the sentence 

"In any other case, it should always return success." 

as I didn't have the impression it added anything.

> >
> > 3.1.8 CLEAR
> > -----------
> >
> > This section is lacking an introductory sentence.
> >

done.
> > "The CLEAR operation removes all the triples in the specified graph." Consider s/graph/graph(s)/,

done.

> since two of the options for CLEAR may actually end up clearing multiple graphs.
> >
> > "has the same effect as". Is this normative? Can a graph store choose to remove the empty graph resulting from a CLEAR GRAPH operation, but keep it after a DELETE ... WHERE ... operation? If so, then these don't necessarily have the same effect.

Hmmm, good remark. For the moment, I just weakened "has the same effect as" to "in principle has the same effect as:"
Alternative would be to just remove this "syntactic sugar explanation", but I find it potentially useful for illustration.


> >
> > 3.2 Graph Management
> > --------------------
> >
> > "Graph management operations allow to create, destroy, move and copy named graphs". Suggest "allow creating, destroying, moving, and copying named graphs".

done.

> > The bullet list of graph management operations should be introduced in some way.
> 

done.

> > 3.2.1 CREATE
> > ------------
> >
> > This section is lacking an introductory sentence.
> >
> > "If the graph already exists, then a failure may be returned". Is this meant as a RFC2119-style "may"?
> 
I changed that to an RFC2119 style <strong>should</strong>... since the may seemed awkwardly weak... since you may always return an error on any request, or no?

> > 3.2.2 DROP
> > ------------
> >
> > This section is lacking an introductory sentence.
> >
> > ISSUE-59 has been resolved. I'm not sure if that means the comment seeking implementor feedback should be removed.

The resolution says that we should mark those features "AT RISK" asking for feedback. I think it's ok to leave the link to the ISSUE in. 

> >
> > This section should probably have a similar note as in CLEAR about the potentially far reaching implications of DROP DEFAULT on stores for which the default graph is
> the union of other graphs.

I left that out, there is an explicit reference to CLEAR DEFAULT made, where this is noted already. 

> >
> > 3.2.3 COPY
> > ----------
> >
> > "Data after". This example turtle is missing the @prefix declaration for foaf.

done.

> >
> > 3.2.4 MOVE
> > ----------
> >
> > "Data after". This example turtle is missing the @prefix declaration for foaf.

done.

> > 3.2.5 ADD
> > ---------
> >
> > "Example". Uses the sentence fragment example title style.

done.

> > 4 SPARQL Update Formal Model
> > ============================
> >
> > It seems a bit odd that section 4 doesn't have any introductory text.

Added a sentence.

> >
> > 4.1.1 Graph Store
> > -----------------
> >
> > s/i.e. i.e./i.e./
> >
> > This section uses both "default" and "unnamed" to talk about the non-named graph in a graph store. It hasn't been clear to me throughout the document if these are meant to be interchangeable terms.

meant to be interchangeable, IMO. At least, my understanding.

> >
> > Both "<=" and "≠" are used in this section (and in later sections). Suggest both be either ascii or unicode characters, but not a mix.

switched to unicode.

> > 4.1.2 Abstract Update Operation
> > -------------------------------
> >
> > This is obviously subjective, but I'd prefer the definition of the transformation in the other direction "GS' = Op(GSt, Args)" (GS' on the left hand side).
> 

I left that as is, since the following definitions have the operation on the LHS and then the definition of how GS' looks like.

> > I'm concerned that the text "can also alter the state of each graph individually" may be an issue for some people as I think it can be read as suggesting that the graphs are mutable (instead of the operation just replacing the graph with another one).

Changed to "It can also alter the state of each slot individually."

> >
> > 4.2.1 Dataset-UNION
> > -------------------
> >
> > Is the definition used here for graph union different from the traditional one used in RDF Semantics?

which one? this is set union, as opposed to rdf merge. I added (temporarily) a definition of Dataset-MERGE to make the difference clear. 
Can be removed, if not needed.

> Does there need to be any discussion of handling of blank nodes? (This also applies to graph minus in the next section.)

I think "where union between graphs is defined as set-union of triples in those graphs" is clear. If not, what 
is unclear here and where do blank nodes play a special role here?

(there are other places where we need to think about bnodes, more on that later (probably in a separate mail)

> > 4.2.2 Dataset-DIFF
> > ------------------
> >
> > There's a missing space between the first two sentences.

done. 

> >
> > 4.2.3 Dataset(QuadPattern, μ )
> > ------------------------------
> >
> > The previous two definitions didn't include the operation arguments in the section title.

This is because of the overloading. don't consider that critical.

> There's asymmetric whitespace in the argument list in the operation argument list in this section title.

fixed.


> > Since this section talks about syntx-level QuadPatterns, does there need to be discussion of what to do with blank nodes in the quad patterns (regarding disjointness of blank nodes across invocations of Dataset())?

*Open: yes assuming you talk about Section Dataset(QuadPattern, μ ) here.

> >
> > "combining these triples into a single named RDF graph by set union": I think "named" should be dropped here.

done. also reformulated that line slightly.

> >
> > "Dataset(QuadPattern, μ ) = Dataset-UNION ( Dataset(QuadPattern1, μ ) , Dataset(QuadPattern2, μ ) )": The definition of Dataset-UNION in section 4.2.1 is defined as taking one graph store and one dataset as arguments, but here is used with two datasets.

*Open: Hmmm, we also say that "i.e., the graph store can be viewed as a mutable RDF dataset" i.e. for the sake of these definitions, I don't really want to make a difference, to be honest, as I am afraid it would further complicate things. I was hoping this is clear enough. Do you consider this remark critical?


> >
> > 4.2.4 Dataset(QuadPattern, P, GS )
> > ----------------------------------
> >
> > "For instance, the following update...": No connection has been discussed yet between the Dataset() operation and the SPARQL Update syntax, so this example feels a bit strange.

*Open: I could add some more explanation on how this example results in the call of the Dataset() function, need to think about how to best do that.  

> > 4.3.1 Insert Data Operation
> > ---------------------------
> >
> > This section (and similarly in later sections) talks about "Insert Data Operations," but links to the section in the document that talks about the syntax and effects of "INSERT DATA" (without using the "Operation" terminology).

*Open: Yes, that connection from syntax to the operations still needs to be made, also remarked by Andy

> >
> > "new triples are added in the Graph Store, either in the default slot or in a named slot": I'm a bit concerned about the idea of triples being added "in a slot". The definition gets it right, but this description is hiding that the underlying mechanism is that the triples are combined with those in an existing graph, and the new graph replaces the old one in the appropriate slot.

Left that as is, don't see a problem or a way to improve that without making the formulation more complicated and not necessarily more readable.
 
> >
> > "where {} is the empty substitution": is it a "substitution," or a "solution mapping" (as the defintition of Dataset(QuadPattern, µ) says)?

done. replaced by "the empty solution mapping"

> > 4.3.2 Delete Data Operation
> > ---------------------------
> >
> > Similar concern about talking about removing triples "from a slot".

left as is, see above.

> > Similar concern about the "empty substitution".

done.

> > (Both of these apply similarly to the other operations in setion 4.)
> >
> >
> > 4.3.3 Delete Insert Operation
> > -----------------------------
> >
> > "Triples are identified as they match a particular Group Graph Pattern P against GS." I'm not sure what this means. Also, there doesn't seem to be any connection between the syntax and the variables used in this section. I can infer what QuadPattern_INS, QuadPattern_DEL, and P are, but it would be much better if the connection with the syntax were explicit.

reworded.

> This applies to the rest of the operations following this one, as well.
> >
> > No mention is made of how USING clauses affect this operation. The operation is defined as matching against the graph store, not the dataset constructed with the USING clauses (or via the protocol, etc.). This is a major omission, and needs to be addressed.
> >

*Open: yes this is indeed a (known) major issue still open.


> > 4.3.4 Delete Operation
> > ----------------------
> >
> > If the definition of DELETE is merely informative (covered entirely by DELETE/INSERT), why does it need a definition in the formal model? If it doesn't (which I think is best), then the previous section needs to mention how DELETE/INSERT operations are handled when either the DELETE or INSERT parts are empty.

*Open: The behavior where the DELETE or INSERT parts are empty is implicit (by the definitions of Dataset()), but I think I finally agree that probably all the "Definitions" for Delete Operation/InsetrOperation and DeleteWehereOperation should be dropped and just what needs to be done is to define the mapping from syntax do those operations

> > 4.3.5 Insert Operation
> > ----------------------
> >
> > Same concern as for DELETE.
> >
> > 4.3.7 Load Operation
> > --------------------
> >
> > This section should use the "dereference" terminology used in AWWW (the service description document uses this terminology as well).
> >
> > "serializing it into (GRAPH iri) triples": I'm not sure what the parenthetical means.

*Open: tried to reword that, but not finished, would be grateful for concrete rewording suggestions regarding the terminology...
 
> >
> > This operation lacks a connection between syntax and operation arguments, but also any explicit discussion on the difference between the two operation forms (using an IRI or defaulting to the unnamed graph).

The latter should be ok now with the new definition.

> >
> > 4.3.8 Clear Operation
> > ---------------------
> >
> > "OpClear(GS, iri) = GS if iri not in graphnames(GS)": covering the case where iri isn't in the graph names seems odd, since in general the error cases discussed in section 3 aren't dealt with in the formal model.

*Open I kind of disagree: It is not clear to me from what I read in Section 3 so far that clearing a non-existing graph should yield an error, that's why I covered that case.
Should we mention it explicitly in Section 3?


> >
> > 4.4.1 Create Operation
> > ----------------------
> >
> > In the note about non graph-aware stores, why should create operations be understood as followed by a drop operation instead of simply a no-op?

reworded:
<note> Since (non graph-aware) graph stores may remove graphs that are left empty, for such graph stores any CreateOperation may be viewed as implicitly immediately followed by a <a href="#def_dropoperation">DropOperation</a> (see next subsection), or simply as an operation with no effect.</note>


> > 4.4.2 DropOperation
> > -------------------
> >
> > "which is equivalent to OpClear_def, since the default graph cannot be removed": This seems to contradict the language in section 3.2.2 which talks about restoring the default graph after it is dropped.

Section 3.2.2 says 
 "However, in case the DEFAULT graph of the Graph Store is dropped, implementations must restore it after it was removed, i.e. DROP DEFAULT is equivalent to CLEAR DEFAULT."
 and that is what should be reflected here.

I hope to clear this and also the next comment by rewording:

"(which is equivalent to OpClear<sub>def</sub>, since the default graph cannot be removed, but dropping it means only to clear it)"


> It also seems to conflict with text in the same sentence: "OpDrop_all for dropping all graphs including the default graph".
> >
Received on Tuesday, 26 April 2011 03:13:45 UTC