Re: the state of ldp-patch, and a procedural proposal

Hi Sandro, all,

unfortunately, I was not with you at the F2F and could not participate in
those discussions.

I'm still uncomfortable with option 2 (as expressed in [1]), and I think
there can be a middle ground between it and the exponential complexity of
option 1.

In fact I think that we can do with a PATCH language that is not able to
express *any* transformation of any graph into any other (but for corner
cases, we can still rely on PUT of full SPARQL update...).

I implemented an extension of Andy and Rob's proposal [2], musing with my
idea that the PATCH language could treat lists as first-class citizens to
reduce the need to handle bnodes directly. This language uses a very simple
form of WHERE clause, in the form of property paths: not very expressive,
but hopefully expressive enough for most cases (bnodes in tree-like
structures or lists, and/or with functional or inverse-functional
properties). Have a look at the README, or the example in the test
directory.

What do you think?

  pa

[1] http://lists.w3.org/Archives/Public/public-ldp-patch/2013Sep/0017.html
[2] https://github.com/pchampin/rdfpatch

On Sun, Sep 15, 2013 at 3:40 AM, Sandro Hawke <sandro@w3.org> wrote:

> There have been some good emails on public-ldp-patch, and there was some
> good discussion at F2F4.   Here's where I think we are.   I don't know
> of anything in this email that anyone would disagree with (that is, I'm
> trying to summarize consensus), and I end with a suggested path forward.
>
> I think the biggest challenge we face -- and the challenge that divided
> me and Eric at the meeting -- is how to patch triples that involve blank
> nodes.   There seem to be two approaches:
>
> 1.  Require the client to create a graph pattern (a "where clause")
> which unambiguously identifies the blank nodes involved in the triples
> to be updated, and require the server to use that graph pattern to find
> those blank nodes in the graph being patched.
>
> 2.  Require that during the conversation that ends up involving
> patching, both parties use the same mapping from blank node labels to
> blank nodes.
>
> Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that
> graph matching.  In contract, standard SPARQL servers don't have any way
> to share blank node scope as required for option 2. That kind of
> exposure of blank node labels has traditionally been avoided in the
> design of RDF systems.
>
> However, the worst-case performance with option 1 is exponential. If a
> triple to be updated is in the middle of a large cloud of blank nodes,
> then matching the where-clause might not be possible before we all die
> of old age.  (It's an extremely well studied problem in computer
> science; I'm not an expert, but I think I'm reading the results correctly.)
>
> No one has offered data about how often this worst-case behavior might
> be a problem in practice.  Arguably we're still in the early days, so
> it's too soon to know how painful this restriction might turn out to be.
>
> Some people said that the server can just set a time limit and reject
> patches that end up taking too long.   Other people (me) replied that
> makes the overall system too unpredictable, that systems should be able
> to send patches with confidence, especially one server to another.  As I
> said at the meeting, I don't know if this worst-case performance will
> turn out to be a problem, but I'm concerned enough about it that I can't
> +1 option 1, and don't want my name on a spec based on it.  David
> reported at the meeting that Google's internal culture generally forbids
> using exponential algorithms, so we might expect if they were in the
> group they would formally object to option 1 (or just decide to never
> use it, which amounts to the same thing).  Our anecdotal reports that
> they don't use SPARQL support this hearsay, but as long is it remains
> hearsay, we probably shouldn't take it too seriously.
>
> Which brings me to the proposal.
>
> Let's move forward with both Option 1 *and* Option 2, marking them both
> "at risk" in the spec.   That gives us the whole Last Call and Candidate
> Recommendation periods to gather input on how bad the exponential
> performance issue is for Option 1 and how bad the implementation
> challenge is for Option 2 (how hard it is to get RDF systems to share
> scope in blank node labels).
>
> Then at the end of CR, we can decide if either of them is good enough to
> normatively reference as the basic LDP patch format.   If they both end
> up implemented and with people liking them, then we just pick one, so
> the folks don't have to implement both going forward.    If neither of
> them is implemented and liked, then we're back to where we are today,
> with no standard patch format for LDP, but some more data on why it's hard.
>
> How's that sound?
>
> I imagine Option 1 would end up as some subset of SPARQL Update, like
> TurtlePatch  [1] plus variables or like Eric presented at the meeting.
> I imagine for Option 2 we'd have something like Andy and Rob's RDFPatch
> [2] or my old GRUF [3] (which I'd forgotten about until reading RDFPatch).
>
>      -- Sandro
>
> [1]  http://www.w3.org/2001/sw/wiki/TurtlePatch
> [2]  http://afs.github.io/rdf-patch
> [3]  http://websub.org/wiki/GRUF (from Apr 2010)
>
>
>
>

Received on Monday, 16 September 2013 20:35:52 UTC