
SPARQL 1.1 Update

From: Richard Newman <rnewman@twinql.com>
Date: Fri, 8 Jan 2010 16:32:52 -0800
Message-Id: <4DE4ACE6-BA40-4180-95F2-CFE6EBAB7175@twinql.com>
To: public-rdf-dawg-comments@w3.org
Hi folks,

A few questions/comments on the Update portion of the 1.1 draft:

* DELETE/MODIFY/INSERT are described in terms of CONSTRUCT templates.  
CONSTRUCT templates allow blank nodes, which are generated as fresh  
blank nodes for each input row. This makes sense for INSERT, but it  
doesn't make sense for DELETE: the fresh blank node will never match a  
triple in the store, and thus

   DELETE { ?s ?p [] } WHERE { ?s ?p ?o }

is a no-op by definition. It would be good for this issue to be  
addressed in the spec, with one of the following possible resolutions:

   1. Forbid blank nodes in a DELETE template.

   2. Define those blank nodes as being null placeholders, such that

       DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }

      would delete every triple whose subject is an rdfs:Class.

   3. Document that DELETE triple patterns containing blank nodes will  
never match.
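
   (Note that the deletion the blank-node form presumably intends is
already expressible with a plain variable, which binds per solution
and therefore does match:

       DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }

   so forbidding or documenting the blank-node case would not lose any
expressive power.)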

* INSERT et al permit multiple "INTO" URIs:

   INSERT [ INTO <uri> ]* { template } [ WHERE { pattern } ]

but the text discusses the graph in the singular ("The graph URI, if  
present, must be a valid named graph..."). Is it intended that '*'  
actually be '?'?

If not, the text should be changed, and text added to describe how an  
implementation should process multiple graphs: e.g., should it run  
the DELETE then the INSERT on each graph in turn, or should all  
DELETEs be batched together prior to the INSERTs?
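
For concreteness, under the '*' reading a request such as

   INSERT INTO <http://example.org/g1>
          INTO <http://example.org/g2>
     { ?s ?p ?o } WHERE { GRAPH <http://example.org/src> { ?s ?p ?o } }

(graph URIs invented for illustration) targets two graphs, and the  
current text says nothing about the order or batching of the  
resulting operations.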

* Re atomicity: it would seem that, for systems which will allow  
multiple SPARQL/Update requests within a single transaction, the  
requirement that "Each request should be treated atomically by a  
SPARQL-Update service" is onerous. I don't know of many systems that  
support sub-transactions, so implementations will be forced to take  
one of two routes:

   1. Violating the spec: "sorry pal, that doesn't apply: our  
transactions have multi-request scope"
   2. Annoying users: "sorry pal, we aborted your transaction because  
SPARQL 1.1 says we have to, even though you wanted to handle it  
yourself".

Neither choice is beneficial to users (the former because it reduces  
their ability to rely on the spec). I'd suggest changing the language  
to require that implementations provide "some method of atomically  
executing the entire contents of a SPARQL/Update request", which  
allows for the execution of a request within an existing transaction,  
as well as for approaches that execute requests within their own new  
transaction.

* There doesn't seem to be any mention at all of responses in the  
draft. Is that intentional?

* Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which  
doesn't fail (and abort your transaction!) if the LOAD fails?
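
Something like the following (syntax invented here by analogy with  
CREATE SILENT) would let a request continue past an unreachable  
document:

   LOAD SILENT <http://example.org/data.rdf> INTO <http://example.org/g>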

* I'd like to throw my 2¢ in for Issue 20.

It strikes me as a little short-sighted to assume that every store  
operates with first-class graph objects, such that they can be created  
and deleted in a closed-world fashion: not only does this conflict  
with some implementations (e.g., those which use quad stores to  
efficiently implement named graphs, and those which dynamically load  
data from a graph on an ad hoc basis), but it is also dissonant with  
the "triple stores are caches of the semantic web" open-world view.

I see in emails text like "We have agreed on the need to support a  
graph that exists and is empty"[1]. I would like to see strong  
supporting evidence for this in the spec (or some other persistent and  
accessible place) before resolving this issue. I personally don't see  
any need to distinguish an empty graph (after all, it's easy to add an  
all-bnodes triple to it to make it non-empty but without excess  
meaning).

I note that there is no proposal for CREATE SUBJECT (or PREDICATE or  
OBJECT), nor CREATE LANGTAG. I see little point in unnecessarily  
special-casing one value space to reduce its dynamism.

From interactions with users, I expect that "oh, you mean I have to  
CREATE a graph before I can use it in an INSERT query?" will be a  
common question, and "always preface your query with CREATE SILENT..."  
the pervasive response. Seems like a waste of time to me.
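
That is, I expect most requests to end up prefixed like this (URIs  
invented for illustration, using the draft's INSERT DATA form):

   CREATE SILENT GRAPH <http://example.org/g>
   INSERT DATA INTO <http://example.org/g>
     { <http://example.org/a> <http://example.org/b> <http://example.org/c> }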

(Regardless of the official outcome of the issue, my implementation is  
unlikely to strictly follow the CREATE/DROP behavior, because it would  
be inefficient to track graphs for the sole purpose of throwing errors  
in edge cases. CREATE will be a no-op, and DROP will be identical to  
CLEAR.)

Thanks for your time.

-Richard Newman

[1] <http://lists.w3.org/Archives/Public/public-rdf-dawg/2010JanMar/0070.html>
Received on Saturday, 9 January 2010 00:33:22 GMT
