- From: Kjetil Kjernsmo <kjetil@kjernsmo.net>
- Date: Tue, 28 Jan 2020 00:51:26 +0100
- To: public-sparql-12@w3.org
Hi Andy (and all)!

Many thanks for the response!

On Saturday 18 January 2020 18:26:59 CET Andy Seaborne wrote:
> On 16/01/2020 12:31, Kjetil Kjernsmo wrote:
> > 1) A semaphore mechanism for updates.
>
> Observation:
>
> If there is a semaphore being provided by atomically setting server
> state (triples in graph or something else), then it is Dekker
> semaphores/spinlocks.
>
> I do wonder whether complex algorithms are a good idea. We can design
> complex, correct algorithms but that doesn't mean they are practical.
> They can be hard to get right and where they effect the way clients
> interact can have malicious effects.

Yes, indeed! I believe it is important to have relatively simple and practical algorithms in this field, ones that do not blow up in a possibly messy Web.

>
> These mechanism require all the clients to "play nice" and especially to
> clean up properly. Adding timeouts is obviously necessary for semaphore
> integrity but when breaking a lock, presumably you want to reverse
> changes in progress. If it's several change steps for one UX edit, then
> all the steps need undoing else exposing half a set of changes makes
> implementing clients very hard and accident prone.

Right, so the intention wasn't to design an algorithm that requires locks across requests. On the contrary, the idea is to avoid cross-request locks, at the cost that a client may never be able to write.

I should be careful not to put words in Tim's mouth, because we have not discussed this in detail, so the following is largely my understanding.

In Dekker's terms, one client signals that it wants to enter by issuing a DELETE DATA, or DELETE ... WHERE. It is then that client's turn if there exists exactly one triple that matches (the triple or triple pattern respectively). If it is that client's turn, it is allowed to enter the critical section; if not, it will simply be rejected and the delete is rolled back. Thus, there isn't really a wait in Dekker's terms, AFAIU.
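The accept-or-reject behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the mechanism as specified anywhere: `TinyGraphServer`, `update` and `snapshot` are hypothetical names, the graph is a plain in-memory set of triples, and the internal lock stands in for whatever the server uses to make a single request atomic (nothing is held across requests).

```python
import threading


class TinyGraphServer:
    """Hypothetical in-memory stand-in for a server holding one graph."""

    def __init__(self, triples):
        self._triples = set(triples)
        # Guards a single request only; no lock survives between requests.
        self._lock = threading.Lock()

    def update(self, delete_triple, insert_triple):
        """Handle one DELETE DATA + INSERT DATA request atomically.

        Returns 200 if the client was allowed in, 409 if it was rejected.
        """
        with self._lock:
            if delete_triple not in self._triples:
                # Not this client's turn: nothing has been changed yet,
                # so the graph is left exactly as it was.
                return 409
            self._triples.remove(delete_triple)
            self._triples.add(insert_triple)
            return 200

    def snapshot(self):
        """Return an immutable copy of the current triples."""
        with self._lock:
            return frozenset(self._triples)


server = TinyGraphServer({("foo", "baz", "Dahut")})

# Two clients have read the same state and both try to replace "Dahut".
first = server.update(("foo", "baz", "Dahut"), ("foo", "baz", "Foobar"))
second = server.update(("foo", "baz", "Dahut"), ("foo", "baz", "Other"))
print(first, second)
```

In this sketch the second client gets no hint about when it may retry; it would have to fetch the resource again and build a new request against the state it then sees.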
The rejected client will not know when it can enter, and it is likely that it must GET the resource again before it can indicate that it wants to enter. Tough luck, but those are the breaks ;-)

The main way clients may not "play nice" in this scheme is, I suppose, by entering complex or large queries, so that each individual request takes a long time. The server will need to protect against that: it must make sure that each update is small compared to the expected workload, so that clients aren't rejected. Other than that, it is the server's responsibility to roll back and to reject clients, so under what other conditions are clients required to play nice?

> 1/
> I understood that 409 happened when a WHERE matching returns zero or
> more then one result.
> https://github.com/solid/solid-spec/pull/193/files

Yeah, it does in that example, but...

> How does it happen in this example?

it happens if the triple is present in the graph. So, I chose to use DELETE DATA because it is simpler, and because it serves to illustrate the point with the data leak (since a WHERE clause clearly requires Read anyway).

> <digression>
> It says it is a wilful violation but it isn't, strictly, a violation. It
> may be surprising (it is!). HTTP does not have a way to require certain
> behavior like 200 so the SPARQL spec can't either.

OK, I didn't quite parse that sentence, but the fact that we require a success/fail status from the query itself, doesn't that violate the spec?

> By the way, what happens if that semaphore 409 happens part way through
> the request? Is the request atomic and the whole thing bounces, no
> changes?

Yes, absolutely. Within one request, this is a reasonable expectation, I think.

> 2/
> DELETE DATA can have two uses.
>
> "remove a triple (assumed to be present)"
> "ensure a triple is not in the data"
>
> Just looking at the requests, a system can't tell which is intended but
> in the first there is the 409 case and in the second it's fine.

First, the semaphore mechanism is needed only on updates, i.e. a DELETE followed by an INSERT. The first case can participate in such an operation; the latter would not. But for the sake of the argument, this is also why I chose to rely on this projection mechanism and the conditional request header, since it gives the client an opportunity to say which one it means.

> If you want write-only with no information leakage, I think that, except
> for specialized (data dependent) situations --
> partitions/non-overlapping subgraphs -- it'll have be no information in
> the response.

Yes, indeed.

>
> I think W-access will imply fairly broad R-access. Some situations,
> like partitioning into non-overlapping subgraphs, look possible but as a
> general mechanism, if the request has a response, it can reveal
> information. A response is a "read" (that said, for general SPARQL
> Update, a bad actor can arrange to update the graph and use that as a
> response channel with the 409).

I have advocated that any query with a WHERE clause should require Read; that should cover it, right? I could imagine a class of queries where variables from a graph that you don't have access to could participate in the query but not be projected, but in that case, I think we should have another access mode.

> There are a couple of things that came up in that issue: is the count is
> actual changes or the count of triples touched especially the WHERE case.
>
> INSERT DATA { :s :p :o } ; INSERT DATA { :s :p :o }
>
> Is that 0,1 (or in your non-atomic world, 2)
> or always 2? There are uses case for all those cases - different uses
> cases.

So, in my case, only an EBV is important, so I can dodge that question :-)

> and?
>
> INSERT { :s :p :o } WHERE { :s :p ?x }
>
> In some implementations, testing whether add/delete makes an actual
> change to the graph is costly
>
> c.f. LSM trees (RocksDB, LevelDB, ...).
> Adding the change to a log to be
> applied, and some in-memory view maintained, is less costly than
> checking several places in the data, let alone the case when a
> compaction is in progress.

In the EBV case, can we simplify the requirement to accommodate that?

> > Then, what should we do on the protocol level to support our semaphore?
> >
> > We should introduce another Conditional Request header, nominally
> > "If-Variable" into HTTP. This is orthogonal to SPARQL, but the idea is
> > that it names a variable, and if the Effective Boolean Value of that
> > variable is false, the request will fail atomically with a
> > 412 Precondition Failed.
>
> Or put an IF (ASK) in the front of the update request. At least then it
> is all in the request body.

Actually, I toyed with the idea that we could introduce some bashisms,

    DELETE DATA { <foo> <baz> "Dahut" } & INSERT DATA { <foo> <baz> "Foobar" }

which would mean "only execute the second query if the first is successful"... But then I found that projection and protocol-level mechanisms were more interesting.

Cheers,
Kjetil
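
As a rough illustration of the chained form mentioned above (run the second operation only if the first succeeded), here is a minimal sketch. Everything in it is an assumption made for illustration: `run_chained` is a hypothetical helper, a delete counts as "successful" only if the triple was present, and a failed chain leaves the data untouched, in line with the atomic-within-one-request expectation discussed earlier in the thread.

```python
def run_chained(triples, steps):
    """Apply steps left to right, stopping at the first failed step.

    triples: set of (s, p, o) tuples (never mutated).
    steps:   list of (kind, triple), where kind is "delete" or "insert".
    Returns (ok, resulting_triples).
    """
    working = set(triples)  # work on a copy so a failure changes nothing
    for kind, triple in steps:
        if kind == "delete":
            if triple not in working:
                # First failure ends the chain; hand back the original data.
                return False, set(triples)
            working.remove(triple)
        elif kind == "insert":
            working.add(triple)
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return True, working


graph = {("foo", "baz", "Dahut")}
chain = [
    ("delete", ("foo", "baz", "Dahut")),
    ("insert", ("foo", "baz", "Foobar")),
]

ok, after = run_chained(graph, chain)
# Replaying the same chain on the new state fails at the delete step.
ok_again, after_again = run_chained(after, chain)
print(ok, ok_again)
```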
Received on Monday, 27 January 2020 23:51:55 UTC