- From: Alexandre Bertails <bertails@w3.org>
- Date: Fri, 18 Oct 2013 14:54:42 -0400
- To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- CC: "public-ldp-patch@w3.org" <public-ldp-patch@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
On 10/18/2013 10:05 AM, Alexandre Bertails wrote:
> On 10/18/2013 08:13 AM, Pierre-Antoine Champin wrote:
>> Alexandre, all,
>>
>> first thanks for this nice review.
>>
>> It seems that my own proposal [1] has gone completely unnoticed.
>> I'm not taking this personnally, but I would like to hear the feedback
>> of the group on this idea, especially the handling of RDF lists, which I
>> think are not covered by any of the proposals mentionned by Alexandre.
>>
>
> Effectively, nothing personal, but my apologies for missing yours. I've
> created a wiki page with my summary and have already added yours [2].
Note that I've just added TimBL's definition for a nailed graph. It's at
http://www.w3.org/2012/ldp/wiki/LDP_PATCH_Proposals#Definition_of_a_nailed_graph_.28TimBL.29
Alexandre.
>
> Generally, I like this approach. I do like the handling of rdf:list
> (despite what the SPARQL crowd says, rdf:list are awesome!) but I
> still need to think more about it. It is not a subset of SPARQL Update
> but having property paths is already a good way to access some bnodes
> without referring to them. I would prefer to avoid relying on
> skolemization. The B (Bind) idea looks simple. I believe this can
> easily be combined with the simple BGP + nailed nodes approach.
>
> I see emerging several trends:
>
> * 2 kind of syntaxes: SPARQL Update subset or RDF PATCH derivatives
>
> * skolemizing bnodes VS matching bnodes
>
> * expressive power, ie. how powerful and complex the BGP can be
>
> * support of specific features, eg. rdf:list
>
> Maybe we can have polls to make progress? I believe that the very
> first question to answer is about skolemization.
>
> Alexandre.
>
> [2]
> http://www.w3.org/2012/ldp/wiki/LDP_PATCH_Proposals#Pierre-Antoine_Champin.27s_RDF-PATCH
>
>
>
>
>> pa
>>
>> [1]
>> http://lists.w3.org/Archives/Public/public-ldp-patch/2013Sep/0022.html
>>
>>
>> On Fri, Oct 18, 2013 at 4:57 AM, Alexandre Bertails <bertails@w3.org
>> <mailto:bertails@w3.org>> wrote:
>>
>> Hi guys,
>>
>> I've been playing with different approaches around LDP PATCH. I
>> wanted
>> to provide some feedback, both from the user's perspective and from
>> the implementer's perspective.
>>
>> As a user, I'm among the people interested in implementing a
>> Decentralized Social Web using a vanilla/generic LDP server, with
>> WebID and WebACL. I've mainly focused my own experiments with being
>> able to patch WebID profiles. So far in the read-write-web server,
>> we've been using full SPARQL but we are interested in a lighter PATCH
>> format which would not rely on external libraries.
>>
>> On 09/14/2013 09:40 PM, Sandro Hawke wrote:
>> > There have been some good emails on public-ldp-patch, and there
>> was some
>> > good discussion at F2F4. Here's where I think we are. I don't
>> know
>> > of anything in this email that anyone would disagree with (that
>> is, I'm
>> > trying to summarize consensus), and I end with a suggested path
>> forward.
>> >
>> > I think the biggest challenge we face -- and the challenge that
>> divided
>> > me and Eric at the meeting -- is how to patch triples that
>> involve blank
>> > nodes. There seem to be two approaches:
>> >
>> > 1. Require the client to create a graph pattern (a "where
>> clause")
>> > which unambiguously identifies the blank nodes involved in the
>> triples
>> > to be updated, and require the server to use that graph pattern
>> to find
>> > those blank nodes in the graph being patched.
>> >
>> > 2. Require that during the conversation that ends up involving
>> > patching, both parties use the same mapping from blank node
>> labels to
>> > blank nodes.
>> >
>> > Option 1 is a good fit for SPARQL. SPARQL servers naturally
>> do that
>> > graph matching. In contract, standard SPARQL servers don't have
>> any way
>> > to share blank node scope as required for option 2. That kind of
>> > exposure of blank node labels has traditionally been avoided in
>> the
>> > design of RDF systems.
>>
>> General remark: Linked Data (in LDP) is different from general RDF:
>> the data lives in "small" HTTP documents, not in "big" RDF store. I
>> believe that the problem that SPARQL Update addresses is quite
>> different from what we want to achieve with LDP PATCH. Because of
>> that, I was against considering a subset of SPARQL Update at first,
>> but Eric and my experiments made me change my mind.
>>
>> >
>> > However, the worst-case performance with option 1 is exponential.
>> If a
>> > triple to be updated is in the middle of a large cloud of blank
>> nodes,
>> > then matching the where-clause might not be possible before we
>> all die
>> > of old age. (It's an extremely well studied problem in computer
>> > science; I'm not an expert, but I think I'm reading the results
>> correctly.)
>>
>> Following Eric's lead, I've actually started with the BGP approach:
>> DELETE + INSERT + WHERE-with-simple-BGP. To address the complexity
>> issue, I think we can always add some restriction on the BGP. In
>> practice, it depends on the expressive power the people are expecting
>> for a PATCH.
>>
>> >
>> > No one has offered data about how often this worst-case behavior
>> might
>> > be a problem in practice. Arguably we're still in the early
>> days, so
>> > it's too soon to know how painful this restriction might turn out
>> to be.
>> >
>> > Some people said that the server can just set a time limit and
>> reject
>> > patches that end up taking too long. Other people (me)
>> replied that
>> > makes the overall system too unpredictable, that systems should
>> be able
>> > to send patches with confidence, especially one server to
>> another. As I
>> > said at the meeting, I don't know if this worst-case
>> performance will
>> > turn out to be a problem, but I'm concerned enough about it that
>> I can't
>> > +1 option 1, and don't want my name on a spec based on it. David
>> > reported at the meeting that Google's internal culture generally
>> forbids
>> > using exponential algorithms, so we might expect if they were
>> in the
>> > group they would formally object to option 1 (or just decide to
>> never
>> > use it, which amounts to the same thing). Our anecdotal
>> reports that
>> > they don't use SPARQL support this hearsay, but as long is it
>> remains
>> > hearsay, we probably shouldn't take it too seriously.
>> >
>> > Which brings me to the proposal.
>> >
>> > Let's move forward with both Option 1 *and* Option 2, marking
>> them both
>> > "at risk" in the spec. That gives us the whole Last Call and
>> Candidate
>> > Recommendation periods to gather input on how bad the exponential
>> > performance issue is for Option 1 and how bad the implementation
>> > challenge is for Option 2 (how hard it is to get RDF systems to
>> share
>> > scope in blank node labels).
>> >
>> > Then at the end of CR, we can decide if either of them is good
>> enough to
>> > normatively reference as the basic LDP patch format. If they
>> both end
>> > up implemented and with people liking them, then we just pick
>> one, so
>> > the folks don't have to implement both going forward. If
>> neither of
>> > them is implemented and liked, then we're back to where we are
>> today,
>> > with no standard patch format for LDP, but some more data on why
>> it's hard.
>> >
>> > How's that sound?
>> >
>> > I imagine Option 1 would end up as some subset of SPARQL
>> Update, like
>> > TurtlePatch [1] plus variables or like Eric presented at the
>> meeting. I
>> > imagine for Option 2 we'd have something like Andy and Rob's
>> RDFPatch
>> > [2] or my old GRUF [3] (which I'd forgotten about until reading
>> RDFPatch).
>>
>> Here is my reviews on [1] and [2] and some other proposals. Take into
>> account that I didn't participate in the conversations on LDP PATCH
>> and that there is to knowledge no page gathering all the proposals.
>>
>> TurtlePatch
>> -----------
>>
>> Champion: Sandro
>>
>> Summary: subset of SPARQL Update with INSERT and DELETE clauses.
>>
>> Example:
>>
>> [[
>> PREFIX foaf <http://xmlns.com/foaf/0.1/>
>> PREFIX s <http://www.w3.org/2000/01/rdf-schema#>
>> DELETE DATA {
>> <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox
>> <mailto:timbl@w3.org <mailto:timbl@w3.org>>
>> }
>> INSERT DATA {
>> <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox
>> <mailto:timbl@hushmail.com <mailto:timbl@hushmail.com>>
>> <http://www.w3.org/People/Berners-Lee/card> s:comment "This
>> is my
>> general description of myself.\n\nI try to keep data here up to
>> date and
>> it should be considered authoritative."
>> }
>> ]]
>>
>> Pros:
>> * can be implemented using full SPARQL implementation
>> * easy to implement from scratch (parser + runtime)
>>
>> Cons:
>> * no support for bnodes
>>
>> Status:
>> * I implemented this approach in Banana-RDF
>>
>> Remark: Sandro talked about "TurtlePatch plus variables" but I'm not
>> sure what that means exactly by reading his spec. Until I see a
>> solution properly considering bnodes, it will be a -1 for me.
>>
>> RDF Patch
>> ---------
>>
>> Champion: Andy Seaborne
>>
>> Summary: diffs for RDF dataset
>>
>> Example: (A is for Add and D for Delete)
>>
>> [[
>> A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name>
>> "Robert" .
>> A <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows>
>> <http://example/alice> .
>> A <http://example.org/alice> <http://xmlns.com/foaf/0.1/name>
>> "Alice" .
>> D <http://example.org/bob> <http://xmlns.com/foaf/0.1/name>
>> "Robert" .
>> A <http://example.org/bob> <http://xmlns.com/foaf/0.1/name>
>> "Bob" .
>> ]]
>>
>> Pros:
>> * easy to implement from scratch (parser + runtime)
>>
>> Cons:
>> * specified for an RDF dataset, not an LDPR
>> * blank nodes are system dependant, so not well specified in the case
>> of LDP
>>
>> Remark: actually pretty much the same than TurtlePatch, but it feels
>> like it was written for a different use-case. I don't understand how
>> LDP is supposed to communicate stable bnodes label for that solution
>> to work. So -1 again for me.
>>
>> EricP's proposal (sorry, don't have a better name)
>> ----------------
>>
>> Champion: EricP
>>
>> Summary: SPARQL subset with DELETE, INSERT and WHERE clause. The
>> WHERE
>> clause is restricted to a simple BGP with no var-predicates.
>>
>> Example:
>> [[
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> DELETE { ?s foaf:name "Alex" }
>> INSERT { ?s foaf:name "Alexandre" }
>> WHERE { ?s foaf:name "Alex" }
>> ]]
>>
>> Pros:
>> * can be implemented using full SPARQL implementation
>> * in practice, still easy enough to implement from scratch (parser +
>> runtime)
>> * can be used to reach some bnodes
>>
>> Cons:
>> * some queries can be NP-complete (would be good to document one
>> example)
>>
>> Status:
>> * I implemented this approach in Banana-RDF
>>
>> Remark: one could argue that in practice, LDPRs are not supposed
>> to be
>> crazy big, and that most queries won't end up with the worst-case
>> complexity. But I do share the concerns. So it's a +1 for me if the
>> group wants to go with it, with some reservations about the
>> complexity.
>>
>> EricP's proposal + pinned nodes
>> -------------------------------
>>
>> Champion: TimBL
>>
>> *note: TimBL used another term that "pinned node" but I cannot
>> remember it right now :-/ Please someone (Tim?) help me.
>>
>> Summary: same as previous one, but the BGP returns a single matching
>> node. Some additional constraints are put on the query. TimBL wrote
>> the algorithm on the whiteboard for me and it made sense.
>>
>> Pros:
>> * can be implemented using full SPARQL implementation
>> * in practice, still easy enough to implement from scratch (parser +
>> runtime)
>> * can be used to reach some bnodes
>> * expected not to be NP-complete
>>
>> Cons:
>> * we further restrain the number of bnodes that can be
>> matched. Shouldn't be an issue in practice.
>>
>> Status:
>> * I implemented a similar restriction in Banana-RDF when the BGP is a
>> tree pattern.
>> * TimBL told me that Tabulator already implements that approach.
>>
>> Remark: it's a refinement of EricP's proposal. The specifics still
>> have to be worked on but I like the general idea about constraining
>> the BGP. Another +1 for me, but I obviously prefer this one on
>> EricP's.
>>
>> Joe Presbrey's PATCH
>> --------------------
>>
>> Champion: Joe Presbrey
>>
>> Summary: format is Turtle. For each triple { s p o }, { s p ANY } is
>> deleted and { s p o } is added. His implementation forbids the use of
>> blank nodes.
>>
>> Pros:
>> * super easy
>>
>> Cons:
>> * can't have { s p o1; o2 } anymore
>> * no bnodes
>>
>> Status:
>> * Joe implemented this approach in one of his projects (I guess
>> either
>> data.fm <http://data.fm> or ldpy)
>>
>> Remarks: it's too destructive and doesn't handle bnodes, so -1.
>>
>>
>>
>> In summary, it looks like people are ok to consider a subset of
>> SPARQL. Also, I believe that we cannot ignore the bnodes out there
>> and
>> LDP PATCH must provide an acceptable solution for them. I personally
>> found EricP's proposal easy to implement, so that's a clear
>> candidate. Like others, I'm sensitive to the complexity issue and I
>> believe that some additional constraint on the BGP should avoid the
>> pitfall, so I'm interested in TimBL's idea from Tabulator.
>>
>> At that point, I believe that the most compelling proposal is EricP's
>> with TimBL's constraint. If the group shows interest, I'd be
>> interested in writing a first draft of the spec.
>>
>> What do you guys think?
>>
>> Alexandre.
>>
>> >
>> > -- Sandro
>> >
>> > [1] http://www.w3.org/2001/sw/wiki/TurtlePatch
>> > [2] http://afs.github.io/rdf-patch
>> > [3] http://websub.org/wiki/GRUF (from Apr 2010)
>> >
>> >
>> >
>> >
>>
>>
>>
>
Received on Friday, 18 October 2013 18:54:45 UTC