Re: SPARQL 1.1 Update

From: Paul Gearon <gearon@ieee.org>
Date: Tue, 12 Jan 2010 11:54:01 -0500
Message-ID: <a25ac1f1001120854nfbc3f41h680581f8f56ee4ad@mail.gmail.com>
To: public-rdf-dawg-comments@w3.org

Richard Newman raised some points that I'd like to see addressed, so
I thought I'd ask about them directly. I think I also need some
feedback from others before I can adequately form a response.

Starting with the first issue....

On Fri, Jan 8, 2010 at 7:32 PM, Richard Newman <rnewman@twinql.com> wrote:
> Hi folks,
> A few questions/comments on the Update portion of the 1.1 draft:
> * DELETE/MODIFY/INSERT are described in terms of CONSTRUCT templates.
> CONSTRUCT templates allow blank nodes, which are generated as fresh blank
> nodes for each input row. This makes sense for INSERT, but it doesn't make
> sense for DELETE: the fresh blank node will never match a triple in the
> store, and thus
> DELETE { ?s ?p [] } WHERE { ?s ?p ?o }
> is a no-op by definition. It would be good for this issue to be addressed in
> the spec, with one of the following possible resolutions:
> 1. Forbid blank nodes in a DELETE template.
> 2. Define those blank nodes as being null placeholders, such that
>   DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
>   would delete every triple whose subject is an rdfs:Class.
> 3. Document that DELETE triple patterns containing blank nodes will never
> match.
> * INSERT et al permit multiple "INTO" URIs:
> INSERT [ INTO <uri> ]* { template } [ WHERE { pattern } ]
> but the text discusses the graph in the singular ("The graph URI, if
> present, must be a valid named graph..."). Is it intended that '*' actually
> be '?'?
> If not, the text should be changed, and text added to describe how an
> implementation should process multiple graphs: e.g., should they run DELETE
> then INSERT on each graph in turn, or should all DELETEs be batched together
> prior to the INSERTs?

From memory, we are not allowing blank nodes. Is that right?

I'm fine with this, if that's what's happening, but from a theoretical
viewpoint I believe that his second option is better (blank nodes can
match anything). I don't like the third option at all.

Either way, I agree that it should be mentioned in the document.
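To make the difference between the readings concrete, here is a minimal Python sketch. It assumes a hypothetical representation (triples as string tuples, `?`-prefixed variables, `_:`-prefixed blank nodes) and takes the WHERE solutions as given rather than evaluating the pattern. `delete_construct_style` follows the CONSTRUCT-template reading (option 3), where each blank node is replaced by a fresh one and so never matches; `delete_wildcard_style` follows option 2, treating blank nodes as placeholders that match any term.

```python
from itertools import count

# Source of fresh blank-node labels; assumed never to occur in the store.
_fresh_ids = count()

def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def instantiate_construct(template, row):
    """CONSTRUCT-style instantiation: bind variables from the row and replace
    each blank-node label with a fresh blank node (one per label per row)."""
    fresh = {}
    triples = []
    for triple in template:
        terms = []
        for term in triple:
            if is_var(term):
                terms.append(row[term])
            elif is_bnode(term):
                if term not in fresh:
                    fresh[term] = f"_:fresh{next(_fresh_ids)}"
                terms.append(fresh[term])
            else:
                terms.append(term)
        triples.append(tuple(terms))
    return triples

def delete_construct_style(store, template, rows):
    """Option 3 reading: delete exact instantiations only, so any triple
    containing a fresh blank node can never match."""
    for row in rows:
        for triple in instantiate_construct(template, row):
            store.discard(triple)

def delete_wildcard_style(store, template, rows):
    """Option 2 reading: blank nodes are null placeholders matching any term.
    Simplification: each blank-node position is an independent wildcard,
    even if the same label appears twice."""
    for row in rows:
        for triple in template:
            pattern = [row[t] if is_var(t) else None if is_bnode(t) else t
                       for t in triple]
            doomed = {stored for stored in store
                      if all(p is None or p == v for p, v in zip(pattern, stored))}
            store.difference_update(doomed)

EXAMPLE_STORE = {
    ("ex:A", "rdf:type", "rdfs:Class"),
    ("ex:A", "rdfs:label", '"A"'),
    ("ex:b", "rdf:type", "ex:A"),
}

if __name__ == "__main__":
    # DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
    template = [("?s", "_:x", "_:y")]
    rows = [{"?s": "ex:A"}]  # the WHERE solutions, hard-coded

    s1 = set(EXAMPLE_STORE)
    delete_construct_style(s1, template, rows)
    print(len(s1))  # 3: a no-op

    s2 = set(EXAMPLE_STORE)
    delete_wildcard_style(s2, template, rows)
    print(sorted(s2))  # [('ex:b', 'rdf:type', 'ex:A')]
```

With Richard's example, the CONSTRUCT-style reading leaves all three triples in place, while the wildcard reading removes both triples whose subject is `ex:A`.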

> * Re atomicity: it would seem that, for systems which will allow multiple
> SPARQL/Update requests within a single transaction, the requirement that
> "Each request should be treated atomically by a SPARQL-Update service" is
> onerous. I don't know of too many systems that support sub-transactions, and
> thus implementations will be forced to take one of two routes:
> 1. Violating the spec: "sorry pal, that doesn't apply: our transactions
> have multi-request scope"
> 2. Annoying users: "sorry pal, we aborted your transaction because SPARQL
> 1.1 says we have to, even though you wanted to handle it yourself".
> Neither choice is beneficial to users (the former because it reduces their
> ability to rely on the spec). I'd suggest changing the language to require
> that implementations provide "some method of atomically executing the entire
> contents of a SPARQL/Update request", which allows for the execution of a
> request within an existing transaction, as well as for approaches that
> execute requests within their own new transaction.

I pushed for this, since I think it deals with some (though definitely
not all) of the transaction issues. I intentionally said "should" to
avoid making it compulsory (should the word be capitalized as
"SHOULD"?), though I'd like to see it in systems that are capable of
it.
Should the word be changed to "MAY"? Are his concerns justified, and
should it be dropped altogether? This has been talked about before,
but I believe that the discussion has been limited.
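A minimal sketch of the wording Richard suggests, assuming a hypothetical in-memory store where a transaction is just a working copy of the triples. `execute_request` opens and commits its own transaction when none is supplied; given an existing transaction, it runs inside it and leaves the abort decision to the caller. None of these names come from the draft.

```python
class Transaction:
    """Hypothetical transaction: changes go to a working copy until commit."""

    def __init__(self, store):
        self.store = store
        self.working = set(store.triples)

    def commit(self):
        self.store.triples = self.working

    def rollback(self):
        self.working = set(self.store.triples)


class Store:
    def __init__(self, triples=()):
        self.triples = set(triples)

    def begin(self):
        return Transaction(self)


def execute_request(store, operations, txn=None):
    """Run every operation of one update request.

    txn is None: the request gets its own transaction and is committed or
    rolled back as a unit.
    txn given: the operations run inside the caller's transaction; on failure
    the error is re-raised and the caller decides whether to roll back.
    """
    own_txn = txn is None
    if own_txn:
        txn = store.begin()
    try:
        for op in operations:
            op(txn.working)
    except Exception:
        if own_txn:
            txn.rollback()
        raise
    if own_txn:
        txn.commit()


def insert(triple):
    """Hypothetical stand-in for an INSERT DATA operation."""
    return lambda graph: graph.add(triple)


def fail(graph):
    raise RuntimeError("simulated failure part-way through a request")


if __name__ == "__main__":
    store = Store()
    try:
        execute_request(store, [insert(("ex:a", "ex:p", "ex:b")), fail])
    except RuntimeError:
        pass
    print(store.triples)  # set(): the whole request was rolled back

    txn = store.begin()
    execute_request(store, [insert(("ex:a", "ex:p", "ex:b"))], txn)
    try:
        execute_request(store, [fail], txn)
    except RuntimeError:
        pass  # the caller, not the service, chooses to keep its transaction
    txn.commit()
    print(store.triples)  # {('ex:a', 'ex:p', 'ex:b')}
```

Both paths satisfy "some method of atomically executing the entire contents of a request" without sub-transactions. The trade-off is that a failed request inside a caller's transaction can leave partial changes until the caller rolls back.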

> * There doesn't seem to be any mention at all of responses in the draft. Is
> that intentional?

I believe so. That's a job for the protocol, right?

> * Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which
> doesn't fail (and abort your transaction!) if the LOAD fails?

There are no transactions, but if multiple operations are to be
completed atomically, then his point is made.

LOAD can fail in one of the following ways:
1. The graph does not exist, and we do not allow graphs to be
automatically created when data is inserted into them. (See ISSUE-20).
2. The document to be loaded is malformed.
3. The document to be loaded cannot be read.
4. There is an error updating the graph with the contents of the document.

#4 is an internal system error, and not our problem. #3 is an error
that is also out of our hands (non-existent file, no permissions, I/O
error, etc). #2 is also an error (should we permit partial inserts for
documents that are well formed up to that point, or recoverable

#1 is the only one that might not be considered an "error". If we
create graphs automatically, then it's not an issue. If we don't, then
inserting into a non-existent graph would be an "error", but one that
can be avoided with a "CREATE SILENT" guard. In this case I think we
can just consider the error condition here, rather than allowing LOAD

As for all of these possible error conditions, this brings me back to
the point that errors are really part of the protocol. Correct?
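A rough sketch of how failure modes 1 to 3 and a proposed LOAD SILENT flag could interact, assuming a hypothetical in-memory store with injected `fetch` and `parse` functions (toy versions are included). Mode 4 is not modelled, and the error class names are illustrative only. It also shows the CREATE SILENT guard mentioned above.

```python
class LoadError(Exception):
    """Base class for the LOAD failure modes (hypothetical names)."""

class NoSuchGraph(LoadError):
    """1. Target graph does not exist and graphs are not auto-created."""

class MalformedDocument(LoadError):
    """2. The document does not parse."""

class UnreadableDocument(LoadError):
    """3. Missing file, no permission, I/O error, etc."""

# 4. (internal errors while updating the graph) would propagate unchanged.

class GraphStore:
    def __init__(self, auto_create=False):
        self.graphs = {}  # graph URI -> set of triples
        self.auto_create = auto_create

    def create(self, uri, silent=False):
        """CREATE [SILENT]: SILENT suppresses the 'already exists' error."""
        if uri in self.graphs:
            if silent:
                return
            raise ValueError(f"graph already exists: {uri}")
        self.graphs[uri] = set()

    def load(self, fetch, parse, doc_uri, graph_uri, silent=False):
        """LOAD [SILENT] <doc> INTO <graph>: SILENT swallows failure modes 1-3."""
        try:
            if graph_uri not in self.graphs and not self.auto_create:
                raise NoSuchGraph(graph_uri)
            try:
                text = fetch(doc_uri)
            except OSError as e:
                raise UnreadableDocument(doc_uri) from e
            try:
                triples = parse(text)
            except ValueError as e:
                raise MalformedDocument(doc_uri) from e
            # The whole document is parsed before anything is written,
            # so a malformed document never causes a partial insert.
            self.graphs.setdefault(graph_uri, set()).update(triples)
        except LoadError:
            if not silent:
                raise

def parse_lines(text):
    """Toy parser: one whitespace-separated 's p o' triple per non-blank line."""
    triples = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"malformed line: {line!r}")
        triples.add(tuple(parts))
    return triples

DOCS = {"file:good.txt": "ex:a ex:p ex:b\n", "file:bad.txt": "ex:a ex:p\n"}

def fetch_doc(uri):
    """Toy fetcher over an in-memory dict; unknown URIs act like missing files."""
    if uri not in DOCS:
        raise FileNotFoundError(uri)
    return DOCS[uri]

if __name__ == "__main__":
    store = GraphStore()
    store.load(fetch_doc, parse_lines, "file:good.txt", "ex:g", silent=True)
    print("ex:g" in store.graphs)  # False: NoSuchGraph was suppressed
    store.create("ex:g", silent=True)  # the CREATE SILENT guard
    store.load(fetch_doc, parse_lines, "file:good.txt", "ex:g")
    store.load(fetch_doc, parse_lines, "file:bad.txt", "ex:g", silent=True)
    print(store.graphs["ex:g"])  # {('ex:a', 'ex:p', 'ex:b')}
```

Parsing the whole document before writing is one possible answer to the partial-insert question for #2; it is a choice made for this sketch, not something the draft specifies.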

> * I'd like to throw my 2¢ in for Issue 20.
> It strikes me as a little short-sighted to assume that every store operates
> with first-class graph objects, such that they can be created and deleted in
> a closed-world fashion: not only does this conflict with some
> implementations (e.g., those which use quad stores to efficiently implement
> named graphs, and those which dynamically load data from a graph on an ad
> hoc basis), but it also is dissonant with the "triple stores are caches of
> the semantic web" open-world view.

I don't follow his reasoning here. Can someone shed some light on it for me?

I see no conflict between quad stores and graphs being created and
deleted. Mulgara was one of the earliest quad stores, and it has
always had operations for creating and deleting a graph.

I *think* I see what he's talking about with the open-world view, in
that any URI should be treated as a possible graph (just one that we
may not know the contents of). However, from an implementation
perspective, this gets tricky, since so many stores implement the
common extension of dereferencing graph URIs that the store does not
hold locally. Without the ability to CREATE a graph locally, it
won't be possible to know if an INSERT or LOAD into a URI should
create a local graph, or attempt to do an HTTP PUT/POST operation
(assuming URIs in the HTTP scheme).
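As a rough illustration of the routing problem, here is a hypothetical Python policy for a store that also dereferences remote graph URIs. `write_target` and its `auto_create_local` flag are assumptions, not anything from the draft; the point is that an explicit local registry (populated by CREATE) is what makes the decision unambiguous.

```python
from urllib.parse import urlparse

def write_target(graph_uri, local_graphs, auto_create_local=False):
    """Decide where an INSERT or LOAD into graph_uri should be applied.

    local_graphs: URIs of graphs held locally (e.g. ones that were CREATEd).
    auto_create_local: if True, any unknown URI silently becomes a local graph.
    """
    if graph_uri in local_graphs:
        return "local"
    if auto_create_local:
        # Without an explicit CREATE this branch has to guess: the URI may
        # name a remote document the store would otherwise dereference.
        return "local"
    if urlparse(graph_uri).scheme in ("http", "https"):
        return "remote"  # e.g. an HTTP PUT/POST to the graph URI
    raise LookupError(f"not a local graph and no remote protocol: {graph_uri}")

if __name__ == "__main__":
    local = {"http://example.org/g"}
    print(write_target("http://example.org/g", local))      # local
    print(write_target("http://example.org/other", local))  # remote
    print(write_target("http://example.org/other", local,
                       auto_create_local=True))              # local
```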

Can someone help me out on this please? Even if it's just a response
to his concern, I don't know what other people think on this issue.

> I see in emails text like "We have agreed on the need to support a graph
> that exists and is empty"[1]. I would like to see strong supporting evidence
> for this in the spec (or some other persistent and accessible place) before
> resolving this issue. I personally don't see any need to distinguish an
> empty graph (after all, it's easy to add an all-bnodes triple to it to make
> it non-empty but without excess meaning).

I'm not sure if he's asking for evidence of the need or of us agreeing
on the need.

> I note that there is no proposal for CREATE SUBJECT (or PREDICATE or
> OBJECT), nor CREATE LANGTAG. I see little point in unnecessarily
> special-casing one value space to reduce its dynamism.

SPARQL has always treated graphs differently from subjects, predicates
and objects. I believe that this is necessary as some implementations
do not support named graphs. Also, RDF itself clearly defines the
elements of a triple, while treating the definition of a graph
somewhat separately. Is this correct?

> From interactions with users, I expect that "oh, you mean I have to CREATE a
> graph before I can use it in an INSERT query?" will be a common question,
> and "always preface your query with CREATE SILENT..." the pervasive
> response. Seems like a waste of time to me.
> (Regardless of the official outcome of the issue, my implementation is
> unlikely to strictly follow the CREATE/DROP behavior, because it would be
> inefficient to track graphs for the sole purpose of throwing errors in edge
> cases. CREATE will be a no-op, and DROP will be identical to CLEAR.)

Well, Mulgara already tracks it, and we've never considered it a
problem (indeed, it's quite beneficial in many ways), so I certainly
have a bias on this question.
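The quad-store point and the "exists and is empty" question can be illustrated with a minimal sketch of two hypothetical quad stores. The first does not track graphs, so CREATE is a no-op and DROP is CLEAR, as Richard describes for his implementation; the second adds only a set of graph URIs on top of the same quad storage. Only the second can represent a graph that exists and is empty. Neither is meant to describe Mulgara's actual internals.

```python
class ImplicitGraphStore:
    """Quad store with no graph registry: a graph 'exists' only while it has quads."""

    def __init__(self):
        self.quads = set()  # (graph, subject, predicate, object)

    def create(self, graph):
        pass  # nothing to record, so CREATE is a no-op

    def insert(self, graph, s, p, o):
        self.quads.add((graph, s, p, o))

    def clear(self, graph):
        self.quads = {q for q in self.quads if q[0] != graph}

    def drop(self, graph):
        self.clear(graph)  # DROP is indistinguishable from CLEAR

    def exists(self, graph):
        return any(q[0] == graph for q in self.quads)


class TrackedGraphStore(ImplicitGraphStore):
    """The same quad storage plus an explicit set of known graph URIs."""

    def __init__(self):
        super().__init__()
        self.graphs = set()

    def create(self, graph):
        if graph in self.graphs:
            raise ValueError(f"graph already exists: {graph}")
        self.graphs.add(graph)

    def insert(self, graph, s, p, o):
        if graph not in self.graphs:
            raise KeyError(f"graph must be CREATEd first: {graph}")
        super().insert(graph, s, p, o)

    def drop(self, graph):
        if graph not in self.graphs:
            raise KeyError(f"no such graph: {graph}")
        self.clear(graph)
        self.graphs.remove(graph)

    def exists(self, graph):
        return graph in self.graphs


if __name__ == "__main__":
    for store in (ImplicitGraphStore(), TrackedGraphStore()):
        store.create("ex:g")
        print(type(store).__name__, store.exists("ex:g"))
    # ImplicitGraphStore False
    # TrackedGraphStore True
```

The tracking cost in this sketch is one set of URIs alongside the quads.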

Paul Gearon
