- From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Date: Fri, 18 Oct 2013 14:13:21 +0200
- To: Alexandre Bertails <bertails@w3.org>
- Cc: "public-ldp-patch@w3.org" <public-ldp-patch@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
- Message-ID: <CA+OuRR-UzhgL0qtEo0gMwv61iv1yAX2aKLDusL2J0nwNXMV1jw@mail.gmail.com>
Alexandre, all, first thanks for this nice review. It seems that my own proposal [1] has gone completely unnoticed. I'm not taking this personnally, but I would like to hear the feedback of the group on this idea, especially the handling of RDF lists, which I think are not covered by any of the proposals mentionned by Alexandre. pa [1] http://lists.w3.org/Archives/Public/public-ldp-patch/2013Sep/0022.html On Fri, Oct 18, 2013 at 4:57 AM, Alexandre Bertails <bertails@w3.org> wrote: > Hi guys, > > I've been playing with different approaches around LDP PATCH. I wanted > to provide some feedback, both from the user's perspective and from > the implementer's perspective. > > As a user, I'm among the people interested in implementing a > Decentralized Social Web using a vanilla/generic LDP server, with > WebID and WebACL. I've mainly focused my own experiments with being > able to patch WebID profiles. So far in the read-write-web server, > we've been using full SPARQL but we are interested in a lighter PATCH > format which would not rely on external libraries. > > On 09/14/2013 09:40 PM, Sandro Hawke wrote: > > There have been some good emails on public-ldp-patch, and there was some > > good discussion at F2F4. Here's where I think we are. I don't know > > of anything in this email that anyone would disagree with (that is, I'm > > trying to summarize consensus), and I end with a suggested path forward. > > > > I think the biggest challenge we face -- and the challenge that divided > > me and Eric at the meeting -- is how to patch triples that involve blank > > nodes. There seem to be two approaches: > > > > 1. Require the client to create a graph pattern (a "where clause") > > which unambiguously identifies the blank nodes involved in the triples > > to be updated, and require the server to use that graph pattern to find > > those blank nodes in the graph being patched. > > > > 2. Require that during the conversation that ends up involving > > patching, both parties use the same mapping from blank node labels to > > blank nodes. > > > > Option 1 is a good fit for SPARQL. SPARQL servers naturally do that > > graph matching. In contract, standard SPARQL servers don't have any way > > to share blank node scope as required for option 2. That kind of > > exposure of blank node labels has traditionally been avoided in the > > design of RDF systems. > > General remark: Linked Data (in LDP) is different from general RDF: > the data lives in "small" HTTP documents, not in "big" RDF store. I > believe that the problem that SPARQL Update addresses is quite > different from what we want to achieve with LDP PATCH. Because of > that, I was against considering a subset of SPARQL Update at first, > but Eric and my experiments made me change my mind. > > > > > However, the worst-case performance with option 1 is exponential. If a > > triple to be updated is in the middle of a large cloud of blank nodes, > > then matching the where-clause might not be possible before we all die > > of old age. (It's an extremely well studied problem in computer > > science; I'm not an expert, but I think I'm reading the results > correctly.) > > Following Eric's lead, I've actually started with the BGP approach: > DELETE + INSERT + WHERE-with-simple-BGP. To address the complexity > issue, I think we can always add some restriction on the BGP. In > practice, it depends on the expressive power the people are expecting > for a PATCH. > > > > > No one has offered data about how often this worst-case behavior might > > be a problem in practice. Arguably we're still in the early days, so > > it's too soon to know how painful this restriction might turn out to be. > > > > Some people said that the server can just set a time limit and reject > > patches that end up taking too long. Other people (me) replied that > > makes the overall system too unpredictable, that systems should be able > > to send patches with confidence, especially one server to another. As I > > said at the meeting, I don't know if this worst-case performance will > > turn out to be a problem, but I'm concerned enough about it that I can't > > +1 option 1, and don't want my name on a spec based on it. David > > reported at the meeting that Google's internal culture generally forbids > > using exponential algorithms, so we might expect if they were in the > > group they would formally object to option 1 (or just decide to never > > use it, which amounts to the same thing). Our anecdotal reports that > > they don't use SPARQL support this hearsay, but as long is it remains > > hearsay, we probably shouldn't take it too seriously. > > > > Which brings me to the proposal. > > > > Let's move forward with both Option 1 *and* Option 2, marking them both > > "at risk" in the spec. That gives us the whole Last Call and Candidate > > Recommendation periods to gather input on how bad the exponential > > performance issue is for Option 1 and how bad the implementation > > challenge is for Option 2 (how hard it is to get RDF systems to share > > scope in blank node labels). > > > > Then at the end of CR, we can decide if either of them is good enough to > > normatively reference as the basic LDP patch format. If they both end > > up implemented and with people liking them, then we just pick one, so > > the folks don't have to implement both going forward. If neither of > > them is implemented and liked, then we're back to where we are today, > > with no standard patch format for LDP, but some more data on why it's > hard. > > > > How's that sound? > > > > I imagine Option 1 would end up as some subset of SPARQL Update, like > > TurtlePatch [1] plus variables or like Eric presented at the meeting. I > > imagine for Option 2 we'd have something like Andy and Rob's RDFPatch > > [2] or my old GRUF [3] (which I'd forgotten about until reading > RDFPatch). > > Here is my reviews on [1] and [2] and some other proposals. Take into > account that I didn't participate in the conversations on LDP PATCH > and that there is to knowledge no page gathering all the proposals. > > TurtlePatch > ----------- > > Champion: Sandro > > Summary: subset of SPARQL Update with INSERT and DELETE clauses. > > Example: > > [[ > PREFIX foaf <http://xmlns.com/foaf/0.1/> > PREFIX s <http://www.w3.org/2000/01/rdf-schema#> > DELETE DATA { > <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox > <mailto:timbl@w3.org> > } > INSERT DATA { > <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox > <mailto:timbl@hushmail.com> > <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my > general description of myself.\n\nI try to keep data here up to date and > it should be considered authoritative." > } > ]] > > Pros: > * can be implemented using full SPARQL implementation > * easy to implement from scratch (parser + runtime) > > Cons: > * no support for bnodes > > Status: > * I implemented this approach in Banana-RDF > > Remark: Sandro talked about "TurtlePatch plus variables" but I'm not > sure what that means exactly by reading his spec. Until I see a > solution properly considering bnodes, it will be a -1 for me. > > RDF Patch > --------- > > Champion: Andy Seaborne > > Summary: diffs for RDF dataset > > Example: (A is for Add and D for Delete) > > [[ > A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Robert" . > A <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> > <http://example/alice> . > A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" . > D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" . > A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" . > ]] > > Pros: > * easy to implement from scratch (parser + runtime) > > Cons: > * specified for an RDF dataset, not an LDPR > * blank nodes are system dependant, so not well specified in the case > of LDP > > Remark: actually pretty much the same than TurtlePatch, but it feels > like it was written for a different use-case. I don't understand how > LDP is supposed to communicate stable bnodes label for that solution > to work. So -1 again for me. > > EricP's proposal (sorry, don't have a better name) > ---------------- > > Champion: EricP > > Summary: SPARQL subset with DELETE, INSERT and WHERE clause. The WHERE > clause is restricted to a simple BGP with no var-predicates. > > Example: > [[ > PREFIX foaf: <http://xmlns.com/foaf/0.1/> > DELETE { ?s foaf:name "Alex" } > INSERT { ?s foaf:name "Alexandre" } > WHERE { ?s foaf:name "Alex" } > ]] > > Pros: > * can be implemented using full SPARQL implementation > * in practice, still easy enough to implement from scratch (parser + > runtime) > * can be used to reach some bnodes > > Cons: > * some queries can be NP-complete (would be good to document one > example) > > Status: > * I implemented this approach in Banana-RDF > > Remark: one could argue that in practice, LDPRs are not supposed to be > crazy big, and that most queries won't end up with the worst-case > complexity. But I do share the concerns. So it's a +1 for me if the > group wants to go with it, with some reservations about the > complexity. > > EricP's proposal + pinned nodes > ------------------------------- > > Champion: TimBL > > *note: TimBL used another term that "pinned node" but I cannot > remember it right now :-/ Please someone (Tim?) help me. > > Summary: same as previous one, but the BGP returns a single matching > node. Some additional constraints are put on the query. TimBL wrote > the algorithm on the whiteboard for me and it made sense. > > Pros: > * can be implemented using full SPARQL implementation > * in practice, still easy enough to implement from scratch (parser + > runtime) > * can be used to reach some bnodes > * expected not to be NP-complete > > Cons: > * we further restrain the number of bnodes that can be > matched. Shouldn't be an issue in practice. > > Status: > * I implemented a similar restriction in Banana-RDF when the BGP is a > tree pattern. > * TimBL told me that Tabulator already implements that approach. > > Remark: it's a refinement of EricP's proposal. The specifics still > have to be worked on but I like the general idea about constraining > the BGP. Another +1 for me, but I obviously prefer this one on > EricP's. > > Joe Presbrey's PATCH > -------------------- > > Champion: Joe Presbrey > > Summary: format is Turtle. For each triple { s p o }, { s p ANY } is > deleted and { s p o } is added. His implementation forbids the use of > blank nodes. > > Pros: > * super easy > > Cons: > * can't have { s p o1; o2 } anymore > * no bnodes > > Status: > * Joe implemented this approach in one of his projects (I guess either > data.fm or ldpy) > > Remarks: it's too destructive and doesn't handle bnodes, so -1. > > > > In summary, it looks like people are ok to consider a subset of > SPARQL. Also, I believe that we cannot ignore the bnodes out there and > LDP PATCH must provide an acceptable solution for them. I personally > found EricP's proposal easy to implement, so that's a clear > candidate. Like others, I'm sensitive to the complexity issue and I > believe that some additional constraint on the BGP should avoid the > pitfall, so I'm interested in TimBL's idea from Tabulator. > > At that point, I believe that the most compelling proposal is EricP's > with TimBL's constraint. If the group shows interest, I'd be > interested in writing a first draft of the spec. > > What do you guys think? > > Alexandre. > > > > > -- Sandro > > > > [1] http://www.w3.org/2001/sw/wiki/TurtlePatch > > [2] http://afs.github.io/rdf-patch > > [3] http://websub.org/wiki/GRUF (from Apr 2010) > > > > > > > > > > >
Received on Friday, 18 October 2013 12:13:50 UTC