the state of ldp-patch, and a procedural proposal

There have been some good emails on public-ldp-patch, and there was some 
good discussion at F2F4.   Here's where I think we are.   I don't know 
of anything in this email that anyone would disagree with (that is, I'm 
trying to summarize consensus), and I end with a suggested path forward.

I think the biggest challenge we face -- and the challenge that divided 
me and Eric at the meeting -- is how to patch triples that involve blank 
nodes.   There seem to be two approaches:

1.  Require the client to create a graph pattern (a "where clause") 
which unambiguously identifies the blank nodes involved in the triples 
to be updated, and require the server to use that graph pattern to find 
those blank nodes in the graph being patched.

2.  Require that during the conversation that ends up involving 
patching, both parties use the same mapping from blank node labels to 
blank nodes.

Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that 
graph matching.  In contract, standard SPARQL servers don't have any way 
to share blank node scope as required for option 2. That kind of 
exposure of blank node labels has traditionally been avoided in the 
design of RDF systems.

However, the worst-case performance with option 1 is exponential. If a 
triple to be updated is in the middle of a large cloud of blank nodes, 
then matching the where-clause might not be possible before we all die 
of old age.  (It's an extremely well studied problem in computer 
science; I'm not an expert, but I think I'm reading the results correctly.)

No one has offered data about how often this worst-case behavior might 
be a problem in practice.  Arguably we're still in the early days, so 
it's too soon to know how painful this restriction might turn out to be.

Some people said that the server can just set a time limit and reject 
patches that end up taking too long.   Other people (me) replied that 
makes the overall system too unpredictable, that systems should be able 
to send patches with confidence, especially one server to another.  As I 
said at the meeting, I don't know if this worst-case performance will 
turn out to be a problem, but I'm concerned enough about it that I can't 
+1 option 1, and don't want my name on a spec based on it.  David 
reported at the meeting that Google's internal culture generally forbids 
using exponential algorithms, so we might expect if they were in the 
group they would formally object to option 1 (or just decide to never 
use it, which amounts to the same thing).  Our anecdotal reports that 
they don't use SPARQL support this hearsay, but as long is it remains 
hearsay, we probably shouldn't take it too seriously.

Which brings me to the proposal.

Let's move forward with both Option 1 *and* Option 2, marking them both 
"at risk" in the spec.   That gives us the whole Last Call and Candidate 
Recommendation periods to gather input on how bad the exponential 
performance issue is for Option 1 and how bad the implementation 
challenge is for Option 2 (how hard it is to get RDF systems to share 
scope in blank node labels).

Then at the end of CR, we can decide if either of them is good enough to 
normatively reference as the basic LDP patch format.   If they both end 
up implemented and with people liking them, then we just pick one, so 
the folks don't have to implement both going forward.    If neither of 
them is implemented and liked, then we're back to where we are today, 
with no standard patch format for LDP, but some more data on why it's hard.

How's that sound?

I imagine Option 1 would end up as some subset of SPARQL Update, like 
TurtlePatch  [1] plus variables or like Eric presented at the meeting.  
I imagine for Option 2 we'd have something like Andy and Rob's RDFPatch 
[2] or my old GRUF [3] (which I'd forgotten about until reading RDFPatch).

     -- Sandro

[1]  http://www.w3.org/2001/sw/wiki/TurtlePatch
[2]  http://afs.github.io/rdf-patch
[3]  http://websub.org/wiki/GRUF (from Apr 2010)

Received on Sunday, 15 September 2013 01:40:38 UTC