- From: Andy Seaborne <andy.seaborne@talis.com>
- Date: Thu, 14 Jan 2010 14:50:27 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
- CC: rnewman@twinql.com
Moved to WG list, cc'ed to Richard. My comments inline.

Andy

On 12/01/2010 4:54 PM, Paul Gearon wrote:
> Richard Newman raised some points that I'd like to see addressed, so I thought I'd ask about them directly. I think I also need some feedback from others before I can adequately form a response.
>
> Starting with the first issue....
>
> On Fri, Jan 8, 2010 at 7:32 PM, Richard Newman <rnewman@twinql.com> wrote:
>> Hi folks,
>>
>> A few questions/comments on the Update portion of the 1.1 draft:
>>
>> * DELETE/MODIFY/INSERT are described in terms of CONSTRUCT templates. CONSTRUCT templates allow blank nodes, which are generated as fresh blank nodes for each input row. This makes sense for INSERT, but it doesn't make sense for DELETE: the fresh blank node will never match a triple in the store, and thus
>>
>>    DELETE { ?s ?p [] } WHERE { ?s ?p ?o }
>>
>> is a no-op by definition. It would be good for this issue to be addressed in the spec, with one of the following possible resolutions:
>>
>> 1. Forbid blank nodes in a DELETE template.
>>
>> 2. Define those blank nodes as being null placeholders, such that
>>
>>    DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
>>
>> would delete every triple whose subject is an rdfs:Class.
>>
>> 3. Document that DELETE triple patterns containing blank nodes will never match.
>>
>> * INSERT et al permit multiple "INTO" URIs:
>>
>>    INSERT [ INTO <uri> ]* { template } [ WHERE { pattern } ]
>>
>> but the text discusses the graph in the singular ("The graph URI, if present, must be a valid named graph..."). Is it intended that '*' actually be '?'?
>>
>> If not, the text should be changed, and text added to describe how an implementation should process multiple graphs: e.g., should they run DELETE then INSERT on each graph in turn, or should all DELETEs be batched together prior to the INSERTs?
>
> From memory, we are not allowing blank nodes. Is that right?

As far as I know, we are.

> I'm fine with this, if that's what's happening, but from a theoretical viewpoint I believe that his second option is better (blank nodes can match anything). I don't like the third option at all.
>
> Either way, I agree that it should be mentioned in the document.

1 is possible, but we end up with several variations on "template": triples only in the DATA forms; triples + named variables (here); and, for INSERT, triples + variables + bnodes.

For 2 - this treats a DELETE template as still being a pattern (so like neither CONSTRUCT nor INSERT). Treating bnodes as ANY but unbound variables as "don't match" (cf. CONSTRUCT templates) is inconsistent to me. We need a consistent treatment.

We do have the DELETE shortform. If full-DELETE is still a template, we don't need a short form, because it is DELETE { template } WHERE {} (the empty pattern). If you prefer fewer operations, you may like that approach.

For named variables: do we want partially restricted templates, or should we say to do it as a proper, full DELETE { template } WHERE {...}, since that is only a matter of adding the template into the WHERE? This does not address bNodes directly, but let's make a consistent decision.

----

I mildly favour 3. This is (1) without the enforcement. Parsers may choose to emit a warning (caveat: where does the warning go to on the web?).
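(As an aside: the effect Richard wants from option 2 is already expressible in the full form with named variables - a sketch only, assuming the usual rdf:/rdfs: prefixes:

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Same effect as his DELETE { ?s _:x _:y } example under option 2:
    # delete every triple whose subject is an rdfs:Class.
    DELETE { ?s ?p ?o }
    WHERE  { ?s rdf:type rdfs:Class . ?s ?p ?o }

so bnodes-as-wildcards would not buy anything that named variables do not already give us.)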
>> * Re atomicity: it would seem that, for systems which will allow multiple SPARQL/Update requests within a single transaction, the requirement that "Each request should be treated atomically by a SPARQL-Update service" is onerous. I don't know of too many systems that support sub-transactions, and thus implementations will be forced to take one of two routes:
>>
>> 1. Violating the spec: "sorry pal, that doesn't apply: our transactions have multi-request scope"
>> 2. Annoying users: "sorry pal, we aborted your transaction because SPARQL 1.1 says we have to, even though you wanted to handle it yourself".
>>
>> Neither choice is beneficial to users (the former because it reduces their ability to rely on the spec). I'd suggest changing the language to require that implementations provide "some method of atomically executing the entire contents of a SPARQL/Update request", which allows for the execution of a request within an existing transaction, as well as for approaches that execute requests within their own new transaction.
>
> I pushed for this, since I think it deals with some (though definitely not all) of the transaction issues. I intentionally said "should" to avoid making it compulsory (should the word be capitalized as "SHOULD"?), though I'd like to see it in systems that are capable of it.

Some terminology confusion, perhaps. A "request" is several "operations", and one request is one HTTP POST. We need a terminology section - this is still outstanding from my WD comments.

When the text says "should", I think it is talking about route 1 already. So, yes, let's give that the full RFC 2119 force of SHOULD.

> Should the word be changed to "MAY"? Are his concerns justified, and should it be dropped altogether? This has been talked about before, but I believe that the discussion has been limited.
>
>> * There doesn't seem to be any mention at all of responses in the draft. Is that intentional?
>
> I believe so. That's a job for the protocol, right?

Yes, but it is worth noting that no operations have any results other than success/failure (unlike a query, say).

>> * Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which doesn't fail (and abort your transaction!) if the LOAD fails?
>
> There are no transactions, but if multiple operations are to be completed atomically, then his point is made.
>
> LOAD can fail in one of the following ways:
> 1. The graph does not exist, and we do not allow graphs to be automatically created when data is inserted into them. (See ISSUE-20.)
> 2. The document to be loaded is malformed.
> 3. The document to be loaded cannot be read.
> 4. There is an error updating the graph with the contents of the document.

5. Related to 2/3 - we get a partial read, so some triples come in.

> #4 is an internal system error, and not our problem. #3 is an error that is also out of our hands (non-existent file, no permissions, i/o error, etc). #2 is also an error (should we permit partial inserts for documents that are well formed up to that point, or recoverable errors?).
>
> #1 is the only one that might not be considered an "error". If we create graphs automatically, then it's not an issue. If we don't, then inserting into a non-existent graph would be an "error", but one that can be avoided with a "CREATE SILENT" guard. In this case I think we can just consider the error condition here, rather than allowing LOAD SILENT.

We don't actually say what happens for a LOAD. The ability to load a remote graph as best one can (the connection drops, or the document is found to be broken part way through) is useful, especially at scale, as is the other way round.
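(For concreteness, Richard's suggestion would presumably be written like this - hypothetical syntax, since the draft has no SILENT form for LOAD, and the URIs are invented for illustration:

    # Hypothetical: SILENT is Richard's proposed addition, not draft syntax.
    # On failure, carry on with the rest of the request instead of aborting.
    LOAD SILENT <http://example.org/remote-data.rdf> INTO <http://example/g1>

Best-effort loading of a partly broken or truncated document would need similar treatment.)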
Andy

> As for all of these possible error conditions, this brings me back to the point that errors are really part of the protocol. Correct?
>
>> * I'd like to throw my 2¢ in for Issue 20.
>>
>> It strikes me as a little short-sighted to assume that every store operates with first-class graph objects, such that they can be created and deleted in a closed-world fashion: not only does this conflict with some implementations (e.g., those which use quad stores to efficiently implement named graphs, and those which dynamically load data from a graph on an ad hoc basis), but it is also dissonant with the "triple stores are caches of the semantic web" open-world view.
>
> I don't follow his reasoning here. Can someone shed some light on it for me?
>
> I see no conflict between quad stores and graphs being created and deleted. Mulgara was one of the earliest quad stores, and it has always had operations for creating and deleting a graph.
>
> I *think* I see what he's talking about with the open-world view, in that any URI should be treated as a possible graph (just one that we may not know the contents of). However, from an implementation perspective, this gets tricky, since so many stores implement the common extension of de-referencing graph URIs that the store does not hold locally. Without the ability to CREATE a graph locally, it won't be possible to know if an INSERT or LOAD into a URI should create a local graph, or attempt to do an HTTP PUT/POST operation (assuming URIs in the HTTP scheme).
>
> Can someone help me out on this please? Even if it's just a response to his concern, I don't know what other people think on this issue.
>
>> I see in emails text like "We have agreed on the need to support a graph that exists and is empty"[1]. I would like to see strong supporting evidence for this in the spec (or some other persistent and accessible place) before resolving this issue. I personally don't see any need to distinguish an empty graph (after all, it's easy to add an all-bnodes triple to it to make it non-empty but without excess meaning).
>
> I'm not sure if he's asking for evidence of the need, or of us agreeing on the need.
>
>> I note that there is no proposal for CREATE SUBJECT (or PREDICATE or OBJECT), nor CREATE LANGTAG. I see little point in unnecessarily special-casing one value space to reduce its dynamism.
>
> SPARQL has always treated graphs differently to subjects, predicates and objects. I believe that this is necessary, as some implementations do not support named graphs. Also, RDF itself clearly defines the elements of a triple, while treating the definition of a graph somewhat separately. Is this correct?
>
>> From interactions with users, I expect that "oh, you mean I have to CREATE a graph before I can use it in an INSERT query?" will be a common question, and "always preface your query with CREATE SILENT..." the pervasive response. Seems like a waste of time to me.
>>
>> (Regardless of the official outcome of the issue, my implementation is unlikely to strictly follow the CREATE/DROP behavior, because it would be inefficient to track graphs for the sole purpose of throwing errors in edge cases. CREATE will be a no-op, and DROP will be identical to CLEAR.)
>
> Well, Mulgara already tracks it, and we've never considered it a problem (indeed, it's quite beneficial in many ways), so I certainly have a bias on this question.
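(To make the "pervasive response" above concrete, the guard would presumably look like this in the draft syntax - a sketch; the graph URI and triple are invented for illustration:

    # No error if the graph already exists.
    CREATE SILENT GRAPH <http://example/g>

    # The WHERE clause is optional per the INSERT grammar quoted earlier.
    INSERT INTO <http://example/g>
      { <http://example/book1> <http://purl.org/dc/elements/1.1/title> "A Book" }

Whether users should have to write the CREATE line at all is exactly what ISSUE-20 is about.)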
>
> Regards,
> Paul Gearon