Re: the state of ldp-patch, and a procedural proposal from Alexandre Bertails on 2013-10-18 (public-ldp-wg@w3.org from October 2013)

From: Alexandre Bertails <bertails@w3.org>
Date: Fri, 18 Oct 2013 10:05:06 -0400
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
CC: "public-ldp-patch@w3.org" <public-ldp-patch@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <52614012.40505@w3.org>
On 10/18/2013 08:13 AM, Pierre-Antoine Champin wrote:
> Alexandre, all,
>
> first thanks for this nice review.
>
> It seems that my own proposal [1] has gone completely unnoticed.
> I'm not taking this personnally, but I would like to hear the feedback
> of the group on this idea, especially the handling of RDF lists, which I
> think are not covered by any of the proposals mentionned by Alexandre.
>

Effectively, nothing personal, but my apologies for missing yours. I've 
created a wiki page with my summary and have already added yours [2].

Generally, I like this approach. I do like the handling of rdf:list
(despite what the SPARQL crowd says, rdf:list are awesome!) but I
still need to think more about it. It is not a subset of SPARQL Update
but having property paths is already a good way to access some bnodes
without referring to them. I would prefer to avoid relying on
skolemization. The B (Bind) idea looks simple. I believe this can
easily be combined with the simple BGP + nailed nodes approach.

I see emerging several trends:

* 2 kind of syntaxes: SPARQL Update subset or RDF PATCH derivatives

* skolemizing bnodes VS matching bnodes

* expressive power, ie. how powerful and complex the BGP can be

* support of specific features, eg. rdf:list

Maybe we can have polls to make progress? I believe that the very
first question to answer is about skolemization.

Alexandre.

[2] 
http://www.w3.org/2012/ldp/wiki/LDP_PATCH_Proposals#Pierre-Antoine_Champin.27s_RDF-PATCH



>    pa
>
> [1] http://lists.w3.org/Archives/Public/public-ldp-patch/2013Sep/0022.html
>
>
> On Fri, Oct 18, 2013 at 4:57 AM, Alexandre Bertails <bertails@w3.org
> <mailto:bertails@w3.org>> wrote:
>
>     Hi guys,
>
>     I've been playing with different approaches around LDP PATCH. I wanted
>     to provide some feedback, both from the user's perspective and from
>     the implementer's perspective.
>
>     As a user, I'm among the people interested in implementing a
>     Decentralized Social Web using a vanilla/generic LDP server, with
>     WebID and WebACL. I've mainly focused my own experiments with being
>     able to patch WebID profiles. So far in the read-write-web server,
>     we've been using full SPARQL but we are interested in a lighter PATCH
>     format which would not rely on external libraries.
>
>     On 09/14/2013 09:40 PM, Sandro Hawke wrote:
>      > There have been some good emails on public-ldp-patch, and there
>     was some
>      > good discussion at F2F4.   Here's where I think we are.   I don't
>     know
>      > of anything in this email that anyone would disagree with (that
>     is, I'm
>      > trying to summarize consensus), and I end with a suggested path
>     forward.
>      >
>      > I think the biggest challenge we face -- and the challenge that
>     divided
>      > me and Eric at the meeting -- is how to patch triples that
>     involve blank
>      > nodes.   There seem to be two approaches:
>      >
>      > 1.  Require the client to create a graph pattern (a "where clause")
>      > which unambiguously identifies the blank nodes involved in the
>     triples
>      > to be updated, and require the server to use that graph pattern
>     to find
>      > those blank nodes in the graph being patched.
>      >
>      > 2.  Require that during the conversation that ends up involving
>      > patching, both parties use the same mapping from blank node labels to
>      > blank nodes.
>      >
>      > Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that
>      > graph matching.  In contract, standard SPARQL servers don't have
>     any way
>      > to share blank node scope as required for option 2. That kind of
>      > exposure of blank node labels has traditionally been avoided in the
>      > design of RDF systems.
>
>     General remark: Linked Data (in LDP) is different from general RDF:
>     the data lives in "small" HTTP documents, not in "big" RDF store. I
>     believe that the problem that SPARQL Update addresses is quite
>     different from what we want to achieve with LDP PATCH. Because of
>     that, I was against considering a subset of SPARQL Update at first,
>     but Eric and my experiments made me change my mind.
>
>      >
>      > However, the worst-case performance with option 1 is exponential.
>     If a
>      > triple to be updated is in the middle of a large cloud of blank
>     nodes,
>      > then matching the where-clause might not be possible before we
>     all die
>      > of old age.  (It's an extremely well studied problem in computer
>      > science; I'm not an expert, but I think I'm reading the results
>     correctly.)
>
>     Following Eric's lead, I've actually started with the BGP approach:
>     DELETE + INSERT + WHERE-with-simple-BGP. To address the complexity
>     issue, I think we can always add some restriction on the BGP. In
>     practice, it depends on the expressive power the people are expecting
>     for a PATCH.
>
>      >
>      > No one has offered data about how often this worst-case behavior
>     might
>      > be a problem in practice.  Arguably we're still in the early days, so
>      > it's too soon to know how painful this restriction might turn out
>     to be.
>      >
>      > Some people said that the server can just set a time limit and reject
>      > patches that end up taking too long.   Other people (me) replied that
>      > makes the overall system too unpredictable, that systems should
>     be able
>      > to send patches with confidence, especially one server to
>     another.  As I
>      > said at the meeting, I don't know if this worst-case performance will
>      > turn out to be a problem, but I'm concerned enough about it that
>     I can't
>      > +1 option 1, and don't want my name on a spec based on it.  David
>      > reported at the meeting that Google's internal culture generally
>     forbids
>      > using exponential algorithms, so we might expect if they were in the
>      > group they would formally object to option 1 (or just decide to never
>      > use it, which amounts to the same thing).  Our anecdotal reports that
>      > they don't use SPARQL support this hearsay, but as long is it remains
>      > hearsay, we probably shouldn't take it too seriously.
>      >
>      > Which brings me to the proposal.
>      >
>      > Let's move forward with both Option 1 *and* Option 2, marking
>     them both
>      > "at risk" in the spec.   That gives us the whole Last Call and
>     Candidate
>      > Recommendation periods to gather input on how bad the exponential
>      > performance issue is for Option 1 and how bad the implementation
>      > challenge is for Option 2 (how hard it is to get RDF systems to share
>      > scope in blank node labels).
>      >
>      > Then at the end of CR, we can decide if either of them is good
>     enough to
>      > normatively reference as the basic LDP patch format.   If they
>     both end
>      > up implemented and with people liking them, then we just pick one, so
>      > the folks don't have to implement both going forward.    If
>     neither of
>      > them is implemented and liked, then we're back to where we are today,
>      > with no standard patch format for LDP, but some more data on why
>     it's hard.
>      >
>      > How's that sound?
>      >
>      > I imagine Option 1 would end up as some subset of SPARQL Update, like
>      > TurtlePatch  [1] plus variables or like Eric presented at the
>     meeting. I
>      > imagine for Option 2 we'd have something like Andy and Rob's RDFPatch
>      > [2] or my old GRUF [3] (which I'd forgotten about until reading
>     RDFPatch).
>
>     Here is my reviews on [1] and [2] and some other proposals. Take into
>     account that I didn't participate in the conversations on LDP PATCH
>     and that there is to knowledge no page gathering all the proposals.
>
>     TurtlePatch
>     -----------
>
>     Champion: Sandro
>
>     Summary: subset of SPARQL Update with INSERT and DELETE clauses.
>
>     Example:
>
>     [[
>        PREFIX foaf <http://xmlns.com/foaf/0.1/>
>        PREFIX s <http://www.w3.org/2000/01/rdf-schema#>
>        DELETE DATA {
>          <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox
>     <mailto:timbl@w3.org <mailto:timbl@w3.org>>
>        }
>        INSERT DATA {
>          <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox
>     <mailto:timbl@hushmail.com <mailto:timbl@hushmail.com>>
>          <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my
>     general description of myself.\n\nI try to keep data here up to date and
>     it should be considered authoritative."
>        }
>     ]]
>
>     Pros:
>     * can be implemented using full SPARQL implementation
>     * easy to implement from scratch (parser + runtime)
>
>     Cons:
>     * no support for bnodes
>
>     Status:
>     * I implemented this approach in Banana-RDF
>
>     Remark: Sandro talked about "TurtlePatch plus variables" but I'm not
>     sure what that means exactly by reading his spec. Until I see a
>     solution properly considering bnodes, it will be a -1 for me.
>
>     RDF Patch
>     ---------
>
>     Champion: Andy Seaborne
>
>     Summary: diffs for RDF dataset
>
>     Example: (A is for Add and D for Delete)
>
>     [[
>        A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name>
>     "Robert" .
>        A <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows>
>     <http://example/alice> .
>        A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name>
>     "Alice" .
>        D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name>
>     "Robert" .
>        A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
>     ]]
>
>     Pros:
>     * easy to implement from scratch (parser + runtime)
>
>     Cons:
>     * specified for an RDF dataset, not an LDPR
>     * blank nodes are system dependant, so not well specified in the case
>         of LDP
>
>     Remark: actually pretty much the same than TurtlePatch, but it feels
>     like it was written for a different use-case. I don't understand how
>     LDP is supposed to communicate stable bnodes label for that solution
>     to work. So -1 again for me.
>
>     EricP's proposal (sorry, don't have a better name)
>     ----------------
>
>     Champion: EricP
>
>     Summary: SPARQL subset with DELETE, INSERT and WHERE clause. The WHERE
>     clause is restricted to a simple BGP with no var-predicates.
>
>     Example:
>     [[
>     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>     DELETE { ?s foaf:name "Alex" }
>     INSERT { ?s foaf:name "Alexandre" }
>     WHERE { ?s foaf:name "Alex" }
>     ]]
>
>     Pros:
>     * can be implemented using full SPARQL implementation
>     * in practice, still easy enough to implement from scratch (parser +
>     runtime)
>     * can be used to reach some bnodes
>
>     Cons:
>     * some queries can be NP-complete (would be good to document one
>         example)
>
>     Status:
>     * I implemented this approach in Banana-RDF
>
>     Remark: one could argue that in practice, LDPRs are not supposed to be
>     crazy big, and that most queries won't end up with the worst-case
>     complexity. But I do share the concerns. So it's a +1 for me if the
>     group wants to go with it, with some reservations about the
>     complexity.
>
>     EricP's proposal + pinned nodes
>     -------------------------------
>
>     Champion: TimBL
>
>     *note: TimBL used another term that "pinned node" but I cannot
>        remember it right now :-/ Please someone (Tim?) help me.
>
>     Summary: same as previous one, but the BGP returns a single matching
>     node. Some additional constraints are put on the query. TimBL wrote
>     the algorithm on the whiteboard for me and it made sense.
>
>     Pros:
>     * can be implemented using full SPARQL implementation
>     * in practice, still easy enough to implement from scratch (parser +
>     runtime)
>     * can be used to reach some bnodes
>     * expected not to be NP-complete
>
>     Cons:
>     * we further restrain the number of bnodes that can be
>         matched. Shouldn't be an issue in practice.
>
>     Status:
>     * I implemented a similar restriction in Banana-RDF when the BGP is a
>         tree pattern.
>     * TimBL told me that Tabulator already implements that approach.
>
>     Remark: it's a refinement of EricP's proposal. The specifics still
>     have to be worked on but I like the general idea about constraining
>     the BGP. Another +1 for me, but I obviously prefer this one on
>     EricP's.
>
>     Joe Presbrey's PATCH
>     --------------------
>
>     Champion: Joe Presbrey
>
>     Summary: format is Turtle. For each triple { s p o }, { s p ANY } is
>     deleted and { s p o } is added. His implementation forbids the use of
>     blank nodes.
>
>     Pros:
>     * super easy
>
>     Cons:
>     * can't have { s p o1; o2 } anymore
>     * no bnodes
>
>     Status:
>     * Joe implemented this approach in one of his projects (I guess either
>     data.fm <http://data.fm> or ldpy)
>
>     Remarks: it's too destructive and doesn't handle bnodes, so -1.
>
>
>
>     In summary, it looks like people are ok to consider a subset of
>     SPARQL. Also, I believe that we cannot ignore the bnodes out there and
>     LDP PATCH must provide an acceptable solution for them. I personally
>     found EricP's proposal easy to implement, so that's a clear
>     candidate. Like others, I'm sensitive to the complexity issue and I
>     believe that some additional constraint on the BGP should avoid the
>     pitfall, so I'm interested in TimBL's idea from Tabulator.
>
>     At that point, I believe that the most compelling proposal is EricP's
>     with TimBL's constraint. If the group shows interest, I'd be
>     interested in writing a first draft of the spec.
>
>     What do you guys think?
>
>     Alexandre.
>
>      >
>      >      -- Sandro
>      >
>      > [1] http://www.w3.org/2001/sw/wiki/TurtlePatch
>      > [2] http://afs.github.io/rdf-patch
>      > [3] http://websub.org/wiki/GRUF (from Apr 2010)
>      >
>      >
>      >
>      >
>
>
>
Received on Friday, 18 October 2013 14:05:10 UTC