Re: the state of ldp-patch, and a procedural proposal from Pierre-Antoine Champin on 2013-10-18 (public-ldp-wg@w3.org from October 2013)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Fri, 18 Oct 2013 14:13:21 +0200
To: Alexandre Bertails <bertails@w3.org>
Cc: "public-ldp-patch@w3.org" <public-ldp-patch@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <CA+OuRR-UzhgL0qtEo0gMwv61iv1yAX2aKLDusL2J0nwNXMV1jw@mail.gmail.com>
Alexandre, all,

first thanks for this nice review.

It seems that my own proposal [1] has gone completely unnoticed.
I'm not taking this personally, but I would like to hear the group's
feedback on this idea, especially the handling of RDF lists, which I think
are not covered by any of the proposals mentioned by Alexandre.

  pa

[1] http://lists.w3.org/Archives/Public/public-ldp-patch/2013Sep/0022.html


On Fri, Oct 18, 2013 at 4:57 AM, Alexandre Bertails <bertails@w3.org> wrote:

> Hi guys,
>
> I've been playing with different approaches around LDP PATCH. I wanted
> to provide some feedback, both from the user's perspective and from
> the implementer's perspective.
>
> As a user, I'm among the people interested in implementing a
> Decentralized Social Web using a vanilla/generic LDP server, with
> WebID and WebACL. I've mainly focused my own experiments on being
> able to patch WebID profiles. So far in the read-write-web server,
> we've been using full SPARQL but we are interested in a lighter PATCH
> format which would not rely on external libraries.
>
> On 09/14/2013 09:40 PM, Sandro Hawke wrote:
> > There have been some good emails on public-ldp-patch, and there was some
> > good discussion at F2F4.   Here's where I think we are.   I don't know
> > of anything in this email that anyone would disagree with (that is, I'm
> > trying to summarize consensus), and I end with a suggested path forward.
> >
> > I think the biggest challenge we face -- and the challenge that divided
> > me and Eric at the meeting -- is how to patch triples that involve blank
> > nodes.   There seem to be two approaches:
> >
> > 1.  Require the client to create a graph pattern (a "where clause")
> > which unambiguously identifies the blank nodes involved in the triples
> > to be updated, and require the server to use that graph pattern to find
> > those blank nodes in the graph being patched.
> >
> > 2.  Require that during the conversation that ends up involving
> > patching, both parties use the same mapping from blank node labels to
> > blank nodes.
> >
> > Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that
> > graph matching.  In contrast, standard SPARQL servers don't have any way
> > to share blank node scope as required for option 2. That kind of
> > exposure of blank node labels has traditionally been avoided in the
> > design of RDF systems.
>
> General remark: Linked Data (in LDP) is different from general RDF:
> the data lives in "small" HTTP documents, not in a "big" RDF store. I
> believe that the problem that SPARQL Update addresses is quite
> different from what we want to achieve with LDP PATCH. Because of
> that, I was against considering a subset of SPARQL Update at first,
> but Eric and my experiments made me change my mind.
>
> >
> > However, the worst-case performance with option 1 is exponential. If a
> > triple to be updated is in the middle of a large cloud of blank nodes,
> > then matching the where-clause might not be possible before we all die
> > of old age.  (It's an extremely well studied problem in computer
> > science; I'm not an expert, but I think I'm reading the results
> > correctly.)
>
> Following Eric's lead, I've actually started with the BGP approach:
> DELETE + INSERT + WHERE-with-simple-BGP. To address the complexity
> issue, I think we can always add some restriction on the BGP. In
> practice, it depends on the expressive power people expect
> of a PATCH.
>
> >
> > No one has offered data about how often this worst-case behavior might
> > be a problem in practice.  Arguably we're still in the early days, so
> > it's too soon to know how painful this restriction might turn out to be.
> >
> > Some people said that the server can just set a time limit and reject
> > patches that end up taking too long.   Other people (me) replied that
> > makes the overall system too unpredictable, that systems should be able
> > to send patches with confidence, especially one server to another.  As I
> > said at the meeting, I don't know if this worst-case performance will
> > turn out to be a problem, but I'm concerned enough about it that I can't
> > +1 option 1, and don't want my name on a spec based on it.  David
> > reported at the meeting that Google's internal culture generally forbids
> > using exponential algorithms, so we might expect if they were in the
> > group they would formally object to option 1 (or just decide to never
> > use it, which amounts to the same thing).  Our anecdotal reports that
> > they don't use SPARQL support this hearsay, but as long as it remains
> > hearsay, we probably shouldn't take it too seriously.
> >
> > Which brings me to the proposal.
> >
> > Let's move forward with both Option 1 *and* Option 2, marking them both
> > "at risk" in the spec.   That gives us the whole Last Call and Candidate
> > Recommendation periods to gather input on how bad the exponential
> > performance issue is for Option 1 and how bad the implementation
> > challenge is for Option 2 (how hard it is to get RDF systems to share
> > scope in blank node labels).
> >
> > Then at the end of CR, we can decide if either of them is good enough to
> > normatively reference as the basic LDP patch format.   If they both end
> > up implemented and with people liking them, then we just pick one, so
> > the folks don't have to implement both going forward.    If neither of
> > them is implemented and liked, then we're back to where we are today,
> > with no standard patch format for LDP, but some more data on why it's
> > hard.
> >
> > How's that sound?
> >
> > I imagine Option 1 would end up as some subset of SPARQL Update, like
> > TurtlePatch  [1] plus variables or like Eric presented at the meeting. I
> > imagine for Option 2 we'd have something like Andy and Rob's RDFPatch
> > [2] or my old GRUF [3] (which I'd forgotten about until reading
> > RDFPatch).
>
> Here are my reviews of [1] and [2] and some other proposals. Take into
> account that I didn't participate in the conversations on LDP PATCH
> and that there is, to my knowledge, no page gathering all the proposals.
>
> TurtlePatch
> -----------
>
> Champion: Sandro
>
> Summary: subset of SPARQL Update with INSERT and DELETE clauses.
>
> Example:
>
> [[
>   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>   PREFIX s: <http://www.w3.org/2000/01/rdf-schema#>
>   DELETE DATA {
>     <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@w3.org>
>   }
>   INSERT DATA {
>     <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@hushmail.com> .
>     <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my general description of myself.\n\nI try to keep data here up to date and it should be considered authoritative."
>   }
> ]]
>
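Since TurtlePatch only carries ground triples, applying it comes down to
two set operations. The sketch below is only an illustration and is not
taken from the TurtlePatch draft: the function name apply_data_patch and
the representation of a graph as a Python set of string tuples are
assumptions made for the example, and parsing is left out.

```python
from typing import Set, Tuple

# Assumption for this sketch: a triple is a (subject, predicate, object)
# tuple of plain strings, and a graph is a set of such tuples.
Triple = Tuple[str, str, str]


def apply_data_patch(
    graph: Set[Triple], delete: Set[Triple], insert: Set[Triple]
) -> Set[Triple]:
    """Apply DELETE DATA, then INSERT DATA, over ground triples only.

    No pattern matching is involved, which is why blank nodes cannot be
    addressed: there is nothing to look them up with.
    """
    return (graph - delete) | insert


FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
TIMBL = "http://www.w3.org/People/Berners-Lee/card#i"

graph = {(TIMBL, FOAF_MBOX, "mailto:timbl@w3.org")}
patched = apply_data_patch(
    graph,
    delete={(TIMBL, FOAF_MBOX, "mailto:timbl@w3.org")},
    insert={(TIMBL, FOAF_MBOX, "mailto:timbl@hushmail.com")},
)
```

The cost is linear in the size of the patch, which is the attraction of
this family of formats.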
> Pros:
> * can be implemented using full SPARQL implementation
> * easy to implement from scratch (parser + runtime)
>
> Cons:
> * no support for bnodes
>
> Status:
> * I implemented this approach in Banana-RDF
>
> Remark: Sandro talked about "TurtlePatch plus variables" but I'm not
> sure what that means exactly from reading his spec. Until I see a
> solution properly considering bnodes, it will be a -1 for me.
>
> RDF Patch
> ---------
>
> Champion: Andy Seaborne
>
> Summary: diffs for RDF dataset
>
> Example: (A is for Add and D for Delete)
>
> [[
>   A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" .
>   A <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
>   A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
>   D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" .
>   A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
> ]]
>
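A row-by-row format like this can be replayed in order against a set.
The sketch below is a simplification and not the RDF Patch grammar: it
assumes one triple per line, keeps each triple as its raw N-Triples text,
and ignores the dataset (quad) aspect. The name apply_rdf_patch is made up
for the example.

```python
from typing import Iterable, Set


def apply_rdf_patch(graph: Set[str], patch_lines: Iterable[str]) -> Set[str]:
    """Replay A (add) and D (delete) rows, in order, over a set of triples.

    Triples are compared as text, so a blank node label such as _:b0 only
    works if client and server agree on what that label denotes.
    """
    result = set(graph)
    for line in patch_lines:
        line = line.strip()
        if not line:
            continue  # skip empty rows
        op, triple = line[0], line[1:].strip()
        if op == "A":
            result.add(triple)
        elif op == "D":
            result.discard(triple)
        else:
            raise ValueError(f"unsupported patch row: {line!r}")
    return result


patch = [
    'A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" .',
    'D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" .',
    'A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .',
]
result = apply_rdf_patch(set(), patch)
```

The text comparison in the sketch is where the shared blank node scope of
option 2 would have to come in.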
> Pros:
> * easy to implement from scratch (parser + runtime)
>
> Cons:
> * specified for an RDF dataset, not an LDPR
> * blank nodes are system dependent, so not well specified in the case
>    of LDP
>
> Remark: actually pretty much the same as TurtlePatch, but it feels
> like it was written for a different use-case. I don't understand how
> LDP is supposed to communicate stable bnode labels for that solution
> to work. So -1 again for me.
>
> EricP's proposal (sorry, don't have a better name)
> ----------------
>
> Champion: EricP
>
> Summary: SPARQL subset with DELETE, INSERT and WHERE clause. The WHERE
> clause is restricted to a simple BGP with no var-predicates.
>
> Example:
> [[
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> DELETE { ?s foaf:name "Alex" }
> INSERT { ?s foaf:name "Alexandre" }
> WHERE { ?s foaf:name "Alex" }
> ]]
>
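To make the DELETE/INSERT/WHERE shape concrete, here is a rough sketch of
a naive evaluator. It is not the Banana-RDF implementation mentioned in
this thread; the names match_bgp, instantiate and apply_patch, the "?x"
convention for variables and the string-tuple graph are all assumptions
for the example.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: variables are written "?name", as in SPARQL.
    return term.startswith("?")


def match_bgp(
    graph: Set[Triple], patterns: List[Triple], binding: Binding
) -> Iterator[Binding]:
    """Yield every binding satisfying all patterns (naive backtracking)."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if is_var(pat):
                # Bind the variable, or check it against its earlier value.
                if candidate.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from match_bgp(graph, rest, candidate)


def instantiate(template: List[Triple], binding: Binding) -> Set[Triple]:
    """Substitute bound variables into a DELETE or INSERT template."""
    return {tuple(binding.get(t, t) for t in triple) for triple in template}


def apply_patch(
    graph: Set[Triple],
    delete: List[Triple],
    insert: List[Triple],
    where: List[Triple],
) -> Set[Triple]:
    if any(is_var(p) for _, p, _ in where):
        raise ValueError("variable predicates are not allowed in WHERE")
    # Compute all solutions first, then delete, then insert.
    solutions = list(match_bgp(graph, where, {}))
    to_delete: Set[Triple] = set()
    to_insert: Set[Triple] = set()
    for b in solutions:
        to_delete |= instantiate(delete, b)
        to_insert |= instantiate(insert, b)
    return (graph - to_delete) | to_insert


graph = {
    ("_:b0", "foaf:name", '"Alex"'),
    ("_:b0", "foaf:mbox", "<mailto:alex@example.org>"),
}
patched = apply_patch(
    graph,
    delete=[("?s", "foaf:name", '"Alex"')],
    insert=[("?s", "foaf:name", '"Alexandre"')],
    where=[("?s", "foaf:name", '"Alex"')],
)
```

Note that the blank node is reached through the pattern, never by its
label. This matcher tries every triple for every pattern, so its cost can
grow as (graph size) to the power (number of patterns), which is the
blow-up discussed earlier in the thread.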
> Pros:
> * can be implemented using full SPARQL implementation
> * in practice, still easy enough to implement from scratch (parser +
> runtime)
> * can be used to reach some bnodes
>
> Cons:
> * some queries can be NP-complete (would be good to document one
>    example)
>
> Status:
> * I implemented this approach in Banana-RDF
>
> Remark: one could argue that in practice, LDPRs are not supposed to be
> crazy big, and that most queries won't end up with the worst-case
> complexity. But I do share the concerns. So it's a +1 for me if the
> group wants to go with it, with some reservations about the
> complexity.
>
> EricP's proposal + pinned nodes
> -------------------------------
>
> Champion: TimBL
>
> *note: TimBL used another term than "pinned node" but I cannot
>   remember it right now :-/ Please someone (Tim?) help me.
>
> Summary: same as previous one, but the BGP returns a single matching
> node. Some additional constraints are put on the query. TimBL wrote
> the algorithm on the whiteboard for me and it made sense.
>
> Pros:
> * can be implemented using full SPARQL implementation
> * in practice, still easy enough to implement from scratch (parser +
> runtime)
> * can be used to reach some bnodes
> * expected not to be NP-complete
>
> Cons:
> * we further restrict the number of bnodes that can be
>    matched. Shouldn't be an issue in practice.
>
> Status:
> * I implemented a similar restriction in Banana-RDF when the BGP is a
>    tree pattern.
> * TimBL told me that Tabulator already implements that approach.
>
> Remark: it's a refinement of EricP's proposal. The specifics still
> have to be worked on but I like the general idea about constraining
> the BGP. Another +1 for me, but I obviously prefer this one over
> EricP's.
>
> Joe Presbrey's PATCH
> --------------------
>
> Champion: Joe Presbrey
>
> Summary: format is Turtle. For each triple { s p o }, { s p ANY } is
> deleted and { s p o } is added. His implementation forbids the use of
> blank nodes.
>
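One literal reading of that summary, sketched below, processes the patch
triples one at a time. This is a guess at the behaviour and not Joe's
code; apply_replace_patch and the example URIs are made up for the
illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def apply_replace_patch(graph: Set[Triple], patch: Iterable[Triple]) -> Set[Triple]:
    """For each patch triple (s, p, o): drop every (s, p, *), then add (s, p, o)."""
    result = set(graph)
    for s, p, o in patch:
        # Remove all existing values for this subject/predicate pair.
        result = {t for t in result if (t[0], t[1]) != (s, p)}
        result.add((s, p, o))
    return result


ALICE = "http://example.org/alice"
KNOWS = "http://xmlns.com/foaf/0.1/knows"

graph = {(ALICE, KNOWS, "http://example.org/bob")}
patched = apply_replace_patch(graph, [(ALICE, KNOWS, "http://example.org/carol")])
```

After the patch, alice only knows carol: the existing value for the same
subject and predicate is overwritten rather than kept alongside the new
one.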
> Pros:
> * super easy
>
> Cons:
> * can't have { s p o1; o2 } anymore
> * no bnodes
>
> Status:
> * Joe implemented this approach in one of his projects (I guess either
>    data.fm or ldpy)
>
> Remarks: it's too destructive and doesn't handle bnodes, so -1.
>
>
>
> In summary, it looks like people are ok to consider a subset of
> SPARQL. Also, I believe that we cannot ignore the bnodes out there and
> LDP PATCH must provide an acceptable solution for them. I personally
> found EricP's proposal easy to implement, so that's a clear
> candidate. Like others, I'm sensitive to the complexity issue and I
> believe that some additional constraint on the BGP should avoid the
> pitfall, so I'm interested in TimBL's idea from Tabulator.
>
> At that point, I believe that the most compelling proposal is EricP's
> with TimBL's constraint. If the group shows interest, I'd be
> interested in writing a first draft of the spec.
>
> What do you guys think?
>
> Alexandre.
>
> >
> >      -- Sandro
> >
> > [1]  http://www.w3.org/2001/sw/wiki/TurtlePatch
> > [2]  http://afs.github.io/rdf-patch
> > [3]  http://websub.org/wiki/GRUF (from Apr 2010)
> >
Received on Friday, 18 October 2013 12:13:50 UTC