Re: the state of ldp-patch, and a procedural proposal

Hi guys,

I've been playing with different approaches around LDP PATCH. I wanted
to provide some feedback, both from the user's perspective and from
the implementer's perspective.

As a user, I'm among the people interested in implementing a
Decentralized Social Web using a vanilla/generic LDP server, with
WebID and WebACL. I've mainly focused my own experiments with being
able to patch WebID profiles. So far in the read-write-web server,
we've been using full SPARQL but we are interested in a lighter PATCH
format which would not rely on external libraries.

On 09/14/2013 09:40 PM, Sandro Hawke wrote:
> There have been some good emails on public-ldp-patch, and there was some
> good discussion at F2F4.   Here's where I think we are.   I don't know
> of anything in this email that anyone would disagree with (that is, I'm
> trying to summarize consensus), and I end with a suggested path forward.
>
> I think the biggest challenge we face -- and the challenge that divided
> me and Eric at the meeting -- is how to patch triples that involve blank
> nodes.   There seem to be two approaches:
>
> 1.  Require the client to create a graph pattern (a "where clause")
> which unambiguously identifies the blank nodes involved in the triples
> to be updated, and require the server to use that graph pattern to find
> those blank nodes in the graph being patched.
>
> 2.  Require that during the conversation that ends up involving
> patching, both parties use the same mapping from blank node labels to
> blank nodes.
>
> Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that
> graph matching.  In contract, standard SPARQL servers don't have any way
> to share blank node scope as required for option 2. That kind of
> exposure of blank node labels has traditionally been avoided in the
> design of RDF systems.

General remark: Linked Data (in LDP) is different from general RDF:
the data lives in "small" HTTP documents, not in "big" RDF store. I
believe that the problem that SPARQL Update addresses is quite
different from what we want to achieve with LDP PATCH. Because of
that, I was against considering a subset of SPARQL Update at first,
but Eric and my experiments made me change my mind.

>
> However, the worst-case performance with option 1 is exponential. If a
> triple to be updated is in the middle of a large cloud of blank nodes,
> then matching the where-clause might not be possible before we all die
> of old age.  (It's an extremely well studied problem in computer
> science; I'm not an expert, but I think I'm reading the results correctly.)

Following Eric's lead, I've actually started with the BGP approach:
DELETE + INSERT + WHERE-with-simple-BGP. To address the complexity
issue, I think we can always add some restriction on the BGP. In
practice, it depends on the expressive power the people are expecting
for a PATCH.

>
> No one has offered data about how often this worst-case behavior might
> be a problem in practice.  Arguably we're still in the early days, so
> it's too soon to know how painful this restriction might turn out to be.
>
> Some people said that the server can just set a time limit and reject
> patches that end up taking too long.   Other people (me) replied that
> makes the overall system too unpredictable, that systems should be able
> to send patches with confidence, especially one server to another.  As I
> said at the meeting, I don't know if this worst-case performance will
> turn out to be a problem, but I'm concerned enough about it that I can't
> +1 option 1, and don't want my name on a spec based on it.  David
> reported at the meeting that Google's internal culture generally forbids
> using exponential algorithms, so we might expect if they were in the
> group they would formally object to option 1 (or just decide to never
> use it, which amounts to the same thing).  Our anecdotal reports that
> they don't use SPARQL support this hearsay, but as long is it remains
> hearsay, we probably shouldn't take it too seriously.
>
> Which brings me to the proposal.
>
> Let's move forward with both Option 1 *and* Option 2, marking them both
> "at risk" in the spec.   That gives us the whole Last Call and Candidate
> Recommendation periods to gather input on how bad the exponential
> performance issue is for Option 1 and how bad the implementation
> challenge is for Option 2 (how hard it is to get RDF systems to share
> scope in blank node labels).
>
> Then at the end of CR, we can decide if either of them is good enough to
> normatively reference as the basic LDP patch format.   If they both end
> up implemented and with people liking them, then we just pick one, so
> the folks don't have to implement both going forward.    If neither of
> them is implemented and liked, then we're back to where we are today,
> with no standard patch format for LDP, but some more data on why it's hard.
>
> How's that sound?
>
> I imagine Option 1 would end up as some subset of SPARQL Update, like
> TurtlePatch  [1] plus variables or like Eric presented at the meeting. I
> imagine for Option 2 we'd have something like Andy and Rob's RDFPatch
> [2] or my old GRUF [3] (which I'd forgotten about until reading RDFPatch).

Here is my reviews on [1] and [2] and some other proposals. Take into
account that I didn't participate in the conversations on LDP PATCH
and that there is to knowledge no page gathering all the proposals.

TurtlePatch
-----------

Champion: Sandro

Summary: subset of SPARQL Update with INSERT and DELETE clauses.

Example:

[[
  PREFIX foaf <http://xmlns.com/foaf/0.1/>
  PREFIX s <http://www.w3.org/2000/01/rdf-schema#>
  DELETE DATA {
    <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox 
<mailto:timbl@w3.org>
  }
  INSERT DATA {
    <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox 
<mailto:timbl@hushmail.com>
    <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my 
general description of myself.\n\nI try to keep data here up to date and 
it should be considered authoritative."
  }
]]

Pros:
* can be implemented using full SPARQL implementation
* easy to implement from scratch (parser + runtime)

Cons:
* no support for bnodes

Status:
* I implemented this approach in Banana-RDF

Remark: Sandro talked about "TurtlePatch plus variables" but I'm not
sure what that means exactly by reading his spec. Until I see a
solution properly considering bnodes, it will be a -1 for me.

RDF Patch
---------

Champion: Andy Seaborne

Summary: diffs for RDF dataset

Example: (A is for Add and D for Delete)

[[
  A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Robert" .
  A <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> 
<http://example/alice> .
  A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
  D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" .
  A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
]]

Pros:
* easy to implement from scratch (parser + runtime)

Cons:
* specified for an RDF dataset, not an LDPR
* blank nodes are system dependant, so not well specified in the case
   of LDP

Remark: actually pretty much the same than TurtlePatch, but it feels
like it was written for a different use-case. I don't understand how
LDP is supposed to communicate stable bnodes label for that solution
to work. So -1 again for me.

EricP's proposal (sorry, don't have a better name)
----------------

Champion: EricP

Summary: SPARQL subset with DELETE, INSERT and WHERE clause. The WHERE
clause is restricted to a simple BGP with no var-predicates.

Example:
[[
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
DELETE { ?s foaf:name "Alex" }
INSERT { ?s foaf:name "Alexandre" }
WHERE { ?s foaf:name "Alex" }
]]

Pros:
* can be implemented using full SPARQL implementation
* in practice, still easy enough to implement from scratch (parser + 
runtime)
* can be used to reach some bnodes

Cons:
* some queries can be NP-complete (would be good to document one
   example)

Status:
* I implemented this approach in Banana-RDF

Remark: one could argue that in practice, LDPRs are not supposed to be
crazy big, and that most queries won't end up with the worst-case
complexity. But I do share the concerns. So it's a +1 for me if the
group wants to go with it, with some reservations about the
complexity.

EricP's proposal + pinned nodes
-------------------------------

Champion: TimBL

*note: TimBL used another term that "pinned node" but I cannot
  remember it right now :-/ Please someone (Tim?) help me.

Summary: same as previous one, but the BGP returns a single matching
node. Some additional constraints are put on the query. TimBL wrote
the algorithm on the whiteboard for me and it made sense.

Pros:
* can be implemented using full SPARQL implementation
* in practice, still easy enough to implement from scratch (parser + 
runtime)
* can be used to reach some bnodes
* expected not to be NP-complete

Cons:
* we further restrain the number of bnodes that can be
   matched. Shouldn't be an issue in practice.

Status:
* I implemented a similar restriction in Banana-RDF when the BGP is a
   tree pattern.
* TimBL told me that Tabulator already implements that approach.

Remark: it's a refinement of EricP's proposal. The specifics still
have to be worked on but I like the general idea about constraining
the BGP. Another +1 for me, but I obviously prefer this one on
EricP's.

Joe Presbrey's PATCH
--------------------

Champion: Joe Presbrey

Summary: format is Turtle. For each triple { s p o }, { s p ANY } is
deleted and { s p o } is added. His implementation forbids the use of
blank nodes.

Pros:
* super easy

Cons:
* can't have { s p o1; o2 } anymore
* no bnodes

Status:
* Joe implemented this approach in one of his projects (I guess either
   data.fm or ldpy)

Remarks: it's too destructive and doesn't handle bnodes, so -1.



In summary, it looks like people are ok to consider a subset of
SPARQL. Also, I believe that we cannot ignore the bnodes out there and
LDP PATCH must provide an acceptable solution for them. I personally
found EricP's proposal easy to implement, so that's a clear
candidate. Like others, I'm sensitive to the complexity issue and I
believe that some additional constraint on the BGP should avoid the
pitfall, so I'm interested in TimBL's idea from Tabulator.

At that point, I believe that the most compelling proposal is EricP's
with TimBL's constraint. If the group shows interest, I'd be
interested in writing a first draft of the spec.

What do you guys think?

Alexandre.

>
>      -- Sandro
>
> [1]  http://www.w3.org/2001/sw/wiki/TurtlePatch
> [2]  http://afs.github.io/rdf-patch
> [3]  http://websub.org/wiki/GRUF (from Apr 2010)
>
>
>
>

Received on Friday, 18 October 2013 02:57:13 UTC