Re: SPARQL 1.1 Update from Andy Seaborne on 2010-01-14 (public-rdf-dawg@w3.org from January to March 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Thu, 14 Jan 2010 14:50:27 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
CC: rnewman@twinql.com
Message-ID: <4B4F2F33.1040902@talis.com>
Moved to WG list, cc'ed to Richard.  My comments inline.

	Andy

On 12/01/2010 4:54 PM, Paul Gearon wrote:
> Richard Newman raised some points that I'd like to see addressed, so
> I thought I'd ask about them directly. I think I also need some
> feedback from others before I can adequately form a response.
>
> Starting with the first issue....
>
> On Fri, Jan 8, 2010 at 7:32 PM, Richard Newman<rnewman@twinql.com>  wrote:
>> Hi folks,
>>
>> A few questions/comments on the Update portion of the 1.1 draft:
>>
>> * DELETE/MODIFY/INSERT are described in terms of CONSTRUCT templates.
>> CONSTRUCT templates allow blank nodes, which are generated as fresh blank
>> nodes for each input row. This makes sense for INSERT, but it doesn't make
>> sense for DELETE: the fresh blank node will never match a triple in the
>> store, and thus
>>
>>   DELETE { ?s ?p [] } WHERE { ?s ?p ?o }
>>
>> is a no-op by definition. It would be good for this issue to be addressed in
>> the spec, with one of the following possible resolutions:
>>
>>   1. Forbid blank nodes in a DELETE template.
>>
>>   2. Define those blank nodes as being null placeholders, such that
>>
>>       DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
>>
>>      would delete every triple whose subject is an rdfs:Class.
>>
>>   3. Document that DELETE triple patterns containing blank nodes will never
>> match.
>>
>> * INSERT et al permit multiple "INTO" URIs:
>>
>>   INSERT [ INTO<uri>  ]* { template } [ WHERE { pattern } ]
>>
>> but the text discusses the graph in the singular ("The graph URI, if
>> present, must be a valid named graph..."). Is it intended that '*' actually
>> be '?'?
>>
>> If not, the text should be changed, and text added to describe how an
>> implementation should process multiple graphs: e.g., should they run DELETE
>> then INSERT on each graph in turn, or should all DELETEs be batched together
>> prior to the INSERTs?
>
> From memory, we are not allowing blank nodes. Is that right?

As far as I know, we are.

>
> I'm fine with this, if that's what's happening, but from a theoretical
> viewpoint I believe that his second option is better (blank nodes can
> match anything). I don't like the third option at all.
>
> Either way, I agree that it should be mentioned in the document.

1 is possible, but we end up with several variations on "template":
triples only in the DATA forms, triples + named variables (here), and
for INSERT, triples + variables + bnodes.

For 2: this treats a DELETE template as still being a pattern (so not
like CONSTRUCT or INSERT). Treating bnodes as ANY and unbound variables
as don't-match (cf. CONSTRUCT templates) is inconsistent to me.  We
need a consistent treatment.

We do have the DELETE shortform.

If full-DELETE is still a template, we don't need a short form because
it is DELETE { template } WHERE {} (the empty pattern).  If you prefer
fewer operations, you may like that approach.

For named variables:
Do we want to have partially restricted templates, or do we say do it as
a proper full DELETE { template } WHERE {...}, because it is only adding
the template into the WHERE?

This does not address bNodes directly, but let's make a consistent
decision.

----

I mildly favour 3.  This is (1) without the enforcement.  Parsers may 
choose to emit a warning (caveat: where does the warning go to on the web?)
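
As an illustration only (not spec text), the difference between reading
2 and reading 3 can be sketched in a few lines of Python. The store is
modelled as a plain set of tuples, and every name below (BNode, Var,
TEMPLATE_BNODE, instantiate, delete_with_template) is hypothetical,
invented for this sketch rather than taken from the draft or from any
implementation.

```python
import itertools

_fresh = itertools.count()


class BNode:
    """A blank node; every instance is distinct from every other term."""

    def __init__(self):
        self.id = next(_fresh)

    def __repr__(self):
        return f"_:b{self.id}"


class Var(str):
    """A named variable such as ?s (hypothetical helper type)."""


TEMPLATE_BNODE = object()  # stands for [] or _:x in a template position


def instantiate(template, row, bnode_as_wildcard):
    """Fill one template triple from one solution row.

    Returns None when a variable is unbound, so the triple is skipped,
    as in CONSTRUCT templates. A wildcard position is encoded as None.
    """
    out = []
    for term in template:
        if isinstance(term, Var):
            if term not in row:
                return None
            out.append(row[term])
        elif term is TEMPLATE_BNODE:
            # Reading 2: match anything. Reading 3: fresh bnode per row.
            out.append(None if bnode_as_wildcard else BNode())
        else:
            out.append(term)
    return tuple(out)


def delete_with_template(store, template, rows, bnode_as_wildcard=False):
    """Return a new store with the instantiated template triples removed."""
    result = set(store)
    for row in rows:
        pattern = instantiate(template, row, bnode_as_wildcard)
        if pattern is None:
            continue
        result = {
            t for t in result
            if not all(p is None or p == v for p, v in zip(pattern, t))
        }
    return result


store = {
    ("ex:a", "rdf:type", "rdfs:Class"),
    ("ex:a", "rdfs:label", "A"),
    ("ex:b", "rdfs:label", "B"),
}
# Solutions of: WHERE { ?s rdf:type rdfs:Class }
rows = [{Var("?s"): "ex:a"}]
# DELETE { ?s _:x _:y }
template = (Var("?s"), TEMPLATE_BNODE, TEMPLATE_BNODE)

no_op = delete_with_template(store, template, rows)
wildcard = delete_with_template(store, template, rows, bnode_as_wildcard=True)
```

Under the fresh-bnode reading the store comes back unchanged; under the
wildcard reading both ex:a triples are removed. Note that the sketch
still skips unbound variables in both modes, which is exactly the
ANY versus don't-match asymmetry mentioned above.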

>> * Re atomicity: it would seem that, for systems which will allow multiple
>> SPARQL/Update requests within a single transaction, the requirement that
>> "Each request should be treated atomically by a SPARQL-Update service" is
>> onerous. I don't know of too many systems that support sub-transactions, and
>> thus implementations will be forced to take one of two routes:
>>
>>   1. Violating the spec: "sorry pal, that doesn't apply: our transactions
>> have multi-request scope"
>>   2. Annoying users: "sorry pal, we aborted your transaction because SPARQL
>> 1.1 says we have to, even though you wanted to handle it yourself".
>>
>> Neither choice is beneficial to users (the former because it reduces their
>> ability to rely on the spec). I'd suggest changing the language to require
>> that implementations provide "some method of atomically executing the entire
>> contents of a SPARQL/Update request", which allows for the execution of a
>> request within an existing transaction, as well as for approaches that
>> execute requests within their own new transaction.
>
> I pushed for this, since I think it deals with some (though definitely
> not all) of the transaction issues. I intentionally said "should" to
> avoid making it compulsory (should the word be capitalized as
> "SHOULD"?) though I'd like to see it in systems that are capable of
> it.

Some terminology confusion perhaps.  A "request" is several "operations" 
and one request is one HTTP POST.  Need a terminology section - this is 
still outstanding from my WD comments.

When the text says "should", I think it is talking about route 1 already.

So, yes, let's give that full RFC 2119 force of SHOULD.
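
To pin down the request/operation distinction, here is a hedged Python
sketch of one way a service could treat a request (a list of
operations) atomically. It assumes a toy dataset held as a dict from
graph name to a set of triples; run_request, insert and failing are
hypothetical names, and copying the whole dataset is chosen for
brevity, not because any real store would do it this way. The insert
helper creates the graph on demand purely to keep the sketch short; it
takes no position on ISSUE-20.

```python
import copy


def run_request(dataset, operations):
    """Apply all operations of one request, or none of them.

    Returns (dataset_after, ok). On any failure the original dataset is
    returned untouched, so the only observable result is success/failure.
    """
    working = copy.deepcopy(dataset)
    try:
        for op in operations:
            op(working)
    except Exception:
        return dataset, False
    return working, True


def insert(graph, triple):
    """Build an operation that adds one triple to a graph."""
    def op(ds):
        ds.setdefault(graph, set()).add(triple)
    return op


def failing(ds):
    """Stand-in for any operation that errors part way through a request."""
    raise RuntimeError("operation failed")


before = {"g": {("a", "b", "c")}}
after_bad, ok_bad = run_request(before, [insert("g", ("x", "y", "z")), failing])
after_good, ok_good = run_request(before, [insert("g", ("x", "y", "z"))])
```

A system with multi-request transactions could swap the copy for its
own transaction handle; the point is only that the unit of atomicity
is the request, not the individual operation.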

> Should the word be changed to "MAY"? Are his concerns justified, and
> should it be dropped altogether? This has been talked about before,
> but I believe that the discussion has been limited.
>
>> * There doesn't seem to be any mention at all of responses in the draft. Is
>> that intentional?
>
> I believe so. That's a job for the protocol, right?

Yes but it is worth noting that no operations have any results other 
than success/failure (unlike a query, say).

>
>> * Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which
>> doesn't fail (and abort your transaction!) if the LOAD fails?
>
> There's no transactions, but if multiple operations are to be
> completed atomically, then his point is made.
>
> LOAD can fail in one of the following ways:
> 1. The graph does not exist, and we do not allow graphs to be
> automatically created when data is inserted into them. (See ISSUE-20).
> 2. The document to be loaded is malformed.
> 3. The document to be loaded cannot be read.
> 4. There is an error updating the graph with the contents of the document.
5. related to 2/3 - get a partial read so some triples come in.

>
> #4 is an internal system error, and not our problem. #3 is an error
> that is also out of our hands (non-existent file, no permissions, i/o
> error, etc). #2 is also an error (should we permit partial inserts for
> documents that are well formed up to that point, or recoverable
> errors?).
>
> #1 is the only one that might not be considered an "error". If we
> create graphs automatically, then it's not an issue. If we don't, then
> inserting into a non-existent graph would be an "error", but one that
> can be avoided with a "CREATE SILENT" guard. In this case I think we
> can just consider the error condition here, rather than allowing LOAD
> SILENT.

We don't actually say what happens for a LOAD.

The ability to load a remote graph as best one can (connection drops,
document is found to be broken part way through) is useful, especially
at scale, as is the other way round.
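
The two behaviours can be sketched as follows, as an illustration
rather than a proposal for the text. The document format here is a toy
one (three whitespace-separated terms per line, not N-Triples or
RDF/XML), and parse_line, load and the best_effort flag are
hypothetical names for this sketch only.

```python
def parse_line(line):
    """Parse one line of the toy format: exactly three terms."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed triple: {line!r}")
    return tuple(parts)


def load(graph, lines, best_effort=False):
    """Return (new_graph, error) without modifying the graph passed in.

    Strict mode: any error leaves the graph contents as they were.
    Best-effort mode: triples read before the error are kept.
    """
    staged = set()
    error = None
    try:
        for line in lines:
            staged.add(parse_line(line))
    except (ValueError, OSError) as exc:
        error = exc
        if not best_effort:
            return set(graph), error
    return set(graph) | staged, error


doc = ["a b c", "d e f", "broken", "g h i"]
strict_graph, strict_err = load(set(), doc)
partial_graph, partial_err = load(set(), doc, best_effort=True)
clean_graph, clean_err = load(set(), doc[:2])
```

In both modes the error is still reported to the caller; whether a
best-effort load counts as success or failure for the enclosing request
is a separate decision.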

	Andy

> As for all of these possible error conditions, this brings me back to
> the point that errors are really part of the protocol. Correct?
>
>> * I'd like to throw my 2¢ in for Issue 20.
>>
>> It strikes me as a little short-sighted to assume that every store operates
>> with first-class graph objects, such that they can be created and deleted in
>> a closed-world fashion: not only does this conflict with some
>> implementations (e.g., those which use quad stores to efficiently implement
>> named graphs, and those which dynamically load data from a graph on an ad
>> hoc basis), but it also is dissonant with the "triple stores are caches of
>> the semantic web" open-world view.
>
> I don't follow his reasoning here. Can someone shed some light on it for me?
>
> I see no conflict between quad stores and graphs being created and
> deleted. Mulgara was one of the earliest quad stores, and it has
> always had operations for creating and deleting a graph.
>
> I *think* I see what he's talking about with the open-world view, in
> that any URI should be treated as a possible graph (just one that we
> may not know the contents of). However, from an implementation
> perspective, this gets tricky, since so many stores implement the
> common extension of de-referencing graph URIs that the store does not
> hold locally. Without the ability to CREATE a graph locally, then it
> won't be possible to know if an INSERT or LOAD into a URI should
> create a local graph, or attempt to do an HTTP PUT/POST operation
> (assuming URIs in the HTTP scheme).
>
> Can someone help me out on this please? Even if it's just a response
> to his concern, I don't know what other people think on this issue.
>
>> I see in emails text like "We have agreed on the need to support a graph
>> that exists and is empty"[1]. I would like to see strong supporting evidence
>> for this in the spec (or some other persistent and accessible place) before
>> resolving this issue. I personally don't see any need to distinguish an
>> empty graph (after all, it's easy to add an all-bnodes triple to it to make
>> it non-empty but without excess meaning).
>
> I'm not sure if he's asking for evidence of the need or of us agreeing
> on the need.
>
>> I note that there is no proposal for CREATE SUBJECT (or PREDICATE or
>> OBJECT), nor CREATE LANGTAG. I see little point in unnecessarily
>> special-casing one value space to reduce its dynamism.
>
> SPARQL has always treated graphs differently to subjects, predicate
> and objects. I believe that this is necessary as some implementations
> do not support named graphs. Also, RDF itself clearly defines the
> elements of a triple, while treating the definition of a graph
> somewhat separately. Is this correct?
>
>> From interactions with users, I expect that "oh, you mean I have to CREATE a
>> graph before I can use it in an INSERT query?" will be a common question,
>> and "always preface your query with CREATE SILENT..." the pervasive
>> response. Seems like a waste of time to me.
>>
>> (Regardless of the official outcome of the issue, my implementation is
>> unlikely to strictly follow the CREATE/DROP behavior, because it would be
>> inefficient to track graphs for the sole purpose of throwing errors in edge
>> cases. CREATE will be a no-op, and DROP will be identical to CLEAR.)
>
> Well, Mulgara already tracks it, and we've never considered it a
> problem (indeed, it's quite beneficial in many ways), so I certainly
> have a bias on this question.
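
For comparison, the two positions above can be sketched side by side in
Python. This is a toy model under stated assumptions: TrackedGraphs and
QuadsOnly are hypothetical classes written for this illustration, and
neither is meant to describe how Mulgara or any other store is actually
implemented.

```python
class TrackedGraphs:
    """Graphs are first-class: they exist (possibly empty) once created."""

    def __init__(self):
        self.graphs = {}

    def create(self, uri, silent=False):
        if uri in self.graphs:
            if not silent:
                raise ValueError(f"graph exists: {uri}")
            return
        self.graphs[uri] = set()

    def insert(self, uri, triple):
        if uri not in self.graphs:
            raise KeyError(f"no such graph: {uri}")
        self.graphs[uri].add(triple)

    def drop(self, uri):
        del self.graphs[uri]  # raises KeyError if the graph is absent

    def exists(self, uri):
        return uri in self.graphs


class QuadsOnly:
    """A graph "exists" only while some quad names it."""

    def __init__(self):
        self.quads = set()

    def create(self, uri, silent=False):
        pass  # CREATE is a no-op: there is nothing to record

    def insert(self, uri, triple):
        self.quads.add((uri, *triple))

    def drop(self, uri):
        # DROP behaves like CLEAR: remove every quad in that graph.
        self.quads = {q for q in self.quads if q[0] != uri}

    def exists(self, uri):
        return any(q[0] == uri for q in self.quads)
```

The observable difference is the empty graph: after create("g") the
first model reports that "g" exists and the second does not, and an
insert into an uncreated graph is an error only in the first.
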
>
> Regards,
> Paul Gearon
>
>
Received on Thursday, 14 January 2010 14:50:53 UTC