- From: Alexandre Bertails <bertails@w3.org>
- Date: Thu, 17 Oct 2013 22:57:05 -0400
- To: Sandro Hawke <sandro@w3.org>, public-ldp-patch@w3.org
- CC: Eric Prud'hommeaux <eric@w3.org>, Tim Berners-Lee <timbl@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
Hi guys, I've been playing with different approaches around LDP PATCH. I wanted to provide some feedback, both from the user's perspective and from the implementer's perspective. As a user, I'm among the people interested in implementing a Decentralized Social Web using a vanilla/generic LDP server, with WebID and WebACL. I've mainly focused my own experiments with being able to patch WebID profiles. So far in the read-write-web server, we've been using full SPARQL but we are interested in a lighter PATCH format which would not rely on external libraries. On 09/14/2013 09:40 PM, Sandro Hawke wrote: > There have been some good emails on public-ldp-patch, and there was some > good discussion at F2F4. Here's where I think we are. I don't know > of anything in this email that anyone would disagree with (that is, I'm > trying to summarize consensus), and I end with a suggested path forward. > > I think the biggest challenge we face -- and the challenge that divided > me and Eric at the meeting -- is how to patch triples that involve blank > nodes. There seem to be two approaches: > > 1. Require the client to create a graph pattern (a "where clause") > which unambiguously identifies the blank nodes involved in the triples > to be updated, and require the server to use that graph pattern to find > those blank nodes in the graph being patched. > > 2. Require that during the conversation that ends up involving > patching, both parties use the same mapping from blank node labels to > blank nodes. > > Option 1 is a good fit for SPARQL. SPARQL servers naturally do that > graph matching. In contract, standard SPARQL servers don't have any way > to share blank node scope as required for option 2. That kind of > exposure of blank node labels has traditionally been avoided in the > design of RDF systems. General remark: Linked Data (in LDP) is different from general RDF: the data lives in "small" HTTP documents, not in "big" RDF store. I believe that the problem that SPARQL Update addresses is quite different from what we want to achieve with LDP PATCH. Because of that, I was against considering a subset of SPARQL Update at first, but Eric and my experiments made me change my mind. > > However, the worst-case performance with option 1 is exponential. If a > triple to be updated is in the middle of a large cloud of blank nodes, > then matching the where-clause might not be possible before we all die > of old age. (It's an extremely well studied problem in computer > science; I'm not an expert, but I think I'm reading the results correctly.) Following Eric's lead, I've actually started with the BGP approach: DELETE + INSERT + WHERE-with-simple-BGP. To address the complexity issue, I think we can always add some restriction on the BGP. In practice, it depends on the expressive power the people are expecting for a PATCH. > > No one has offered data about how often this worst-case behavior might > be a problem in practice. Arguably we're still in the early days, so > it's too soon to know how painful this restriction might turn out to be. > > Some people said that the server can just set a time limit and reject > patches that end up taking too long. Other people (me) replied that > makes the overall system too unpredictable, that systems should be able > to send patches with confidence, especially one server to another. As I > said at the meeting, I don't know if this worst-case performance will > turn out to be a problem, but I'm concerned enough about it that I can't > +1 option 1, and don't want my name on a spec based on it. David > reported at the meeting that Google's internal culture generally forbids > using exponential algorithms, so we might expect if they were in the > group they would formally object to option 1 (or just decide to never > use it, which amounts to the same thing). Our anecdotal reports that > they don't use SPARQL support this hearsay, but as long is it remains > hearsay, we probably shouldn't take it too seriously. > > Which brings me to the proposal. > > Let's move forward with both Option 1 *and* Option 2, marking them both > "at risk" in the spec. That gives us the whole Last Call and Candidate > Recommendation periods to gather input on how bad the exponential > performance issue is for Option 1 and how bad the implementation > challenge is for Option 2 (how hard it is to get RDF systems to share > scope in blank node labels). > > Then at the end of CR, we can decide if either of them is good enough to > normatively reference as the basic LDP patch format. If they both end > up implemented and with people liking them, then we just pick one, so > the folks don't have to implement both going forward. If neither of > them is implemented and liked, then we're back to where we are today, > with no standard patch format for LDP, but some more data on why it's hard. > > How's that sound? > > I imagine Option 1 would end up as some subset of SPARQL Update, like > TurtlePatch [1] plus variables or like Eric presented at the meeting. I > imagine for Option 2 we'd have something like Andy and Rob's RDFPatch > [2] or my old GRUF [3] (which I'd forgotten about until reading RDFPatch). Here is my reviews on [1] and [2] and some other proposals. Take into account that I didn't participate in the conversations on LDP PATCH and that there is to knowledge no page gathering all the proposals. TurtlePatch ----------- Champion: Sandro Summary: subset of SPARQL Update with INSERT and DELETE clauses. Example: [[ PREFIX foaf <http://xmlns.com/foaf/0.1/> PREFIX s <http://www.w3.org/2000/01/rdf-schema#> DELETE DATA { <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@w3.org> } INSERT DATA { <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@hushmail.com> <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my general description of myself.\n\nI try to keep data here up to date and it should be considered authoritative." } ]] Pros: * can be implemented using full SPARQL implementation * easy to implement from scratch (parser + runtime) Cons: * no support for bnodes Status: * I implemented this approach in Banana-RDF Remark: Sandro talked about "TurtlePatch plus variables" but I'm not sure what that means exactly by reading his spec. Until I see a solution properly considering bnodes, it will be a -1 for me. RDF Patch --------- Champion: Andy Seaborne Summary: diffs for RDF dataset Example: (A is for Add and D for Delete) [[ A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Robert" . A <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example/alice> . A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" . D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Robert" . A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" . ]] Pros: * easy to implement from scratch (parser + runtime) Cons: * specified for an RDF dataset, not an LDPR * blank nodes are system dependant, so not well specified in the case of LDP Remark: actually pretty much the same than TurtlePatch, but it feels like it was written for a different use-case. I don't understand how LDP is supposed to communicate stable bnodes label for that solution to work. So -1 again for me. EricP's proposal (sorry, don't have a better name) ---------------- Champion: EricP Summary: SPARQL subset with DELETE, INSERT and WHERE clause. The WHERE clause is restricted to a simple BGP with no var-predicates. Example: [[ PREFIX foaf: <http://xmlns.com/foaf/0.1/> DELETE { ?s foaf:name "Alex" } INSERT { ?s foaf:name "Alexandre" } WHERE { ?s foaf:name "Alex" } ]] Pros: * can be implemented using full SPARQL implementation * in practice, still easy enough to implement from scratch (parser + runtime) * can be used to reach some bnodes Cons: * some queries can be NP-complete (would be good to document one example) Status: * I implemented this approach in Banana-RDF Remark: one could argue that in practice, LDPRs are not supposed to be crazy big, and that most queries won't end up with the worst-case complexity. But I do share the concerns. So it's a +1 for me if the group wants to go with it, with some reservations about the complexity. EricP's proposal + pinned nodes ------------------------------- Champion: TimBL *note: TimBL used another term that "pinned node" but I cannot remember it right now :-/ Please someone (Tim?) help me. Summary: same as previous one, but the BGP returns a single matching node. Some additional constraints are put on the query. TimBL wrote the algorithm on the whiteboard for me and it made sense. Pros: * can be implemented using full SPARQL implementation * in practice, still easy enough to implement from scratch (parser + runtime) * can be used to reach some bnodes * expected not to be NP-complete Cons: * we further restrain the number of bnodes that can be matched. Shouldn't be an issue in practice. Status: * I implemented a similar restriction in Banana-RDF when the BGP is a tree pattern. * TimBL told me that Tabulator already implements that approach. Remark: it's a refinement of EricP's proposal. The specifics still have to be worked on but I like the general idea about constraining the BGP. Another +1 for me, but I obviously prefer this one on EricP's. Joe Presbrey's PATCH -------------------- Champion: Joe Presbrey Summary: format is Turtle. For each triple { s p o }, { s p ANY } is deleted and { s p o } is added. His implementation forbids the use of blank nodes. Pros: * super easy Cons: * can't have { s p o1; o2 } anymore * no bnodes Status: * Joe implemented this approach in one of his projects (I guess either data.fm or ldpy) Remarks: it's too destructive and doesn't handle bnodes, so -1. In summary, it looks like people are ok to consider a subset of SPARQL. Also, I believe that we cannot ignore the bnodes out there and LDP PATCH must provide an acceptable solution for them. I personally found EricP's proposal easy to implement, so that's a clear candidate. Like others, I'm sensitive to the complexity issue and I believe that some additional constraint on the BGP should avoid the pitfall, so I'm interested in TimBL's idea from Tabulator. At that point, I believe that the most compelling proposal is EricP's with TimBL's constraint. If the group shows interest, I'd be interested in writing a first draft of the spec. What do you guys think? Alexandre. > > -- Sandro > > [1] http://www.w3.org/2001/sw/wiki/TurtlePatch > [2] http://afs.github.io/rdf-patch > [3] http://websub.org/wiki/GRUF (from Apr 2010) > > > >
Received on Friday, 18 October 2013 02:57:13 UTC