Re: the state of ldp-patch, and a procedural proposal from Steve Speicher on 2013-10-02 (public-ldp-wg@w3.org from October 2013)

From: Steve Speicher <sspeiche@gmail.com>
Date: Wed, 2 Oct 2013 10:54:30 -0400
To: Sandro Hawke <sandro@w3.org>
Cc: public-ldp-patch@w3.org, "Eric Prud'hommeaux" <eric@w3.org>, Tim Berners-Lee <timbl@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <CAOUJ7JryuGKE7q-u4_g+aNArZsBGyO3ONQAh4=U8RwVR7V9i3A@mail.gmail.com>
Sandro,

My typical resource graphs and patch scenarios have led me to an approach
[1] somewhat similar to option #1.
My approach [1] is to follow a very simple model such as:
  a) here are the triples to remove from the graph (exactly, no dependency
on blank node labels)
  b) here are the triples to add to the graph
This seems to cover nearly 100% of my cases.  To be clear, this has not been
widely deployed, so the number of cases and types of resources is limited.
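
A minimal sketch of the remove-then-add model above, not the format defined
in [1]. It assumes a graph is a Python set of (subject, predicate, object)
tuples and that blank nodes are written as strings starting with "_:". The
names apply_patch and is_blank are hypothetical, and rejecting a patch whose
removals are not all present is one possible reading of "exactly".

```python
def is_blank(term):
    # Assumption: blank nodes are plain strings with a "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")


def apply_patch(graph, to_remove, to_add):
    """Return a new graph with to_remove deleted and to_add inserted."""
    graph = set(graph)
    to_remove = set(to_remove)
    to_add = set(to_add)

    # Removals must not depend on blank node labels, so only ground
    # triples are accepted in the removal set.
    for triple in to_remove:
        if any(is_blank(term) for term in triple):
            raise ValueError(f"removal uses a blank node label: {triple}")

    # Assumption: every triple to remove must match a triple in the
    # graph exactly, otherwise the whole patch is rejected.
    missing = to_remove - graph
    if missing:
        raise ValueError(f"triples to remove not found: {sorted(missing)}")

    # a) remove the listed triples, b) add the new ones.
    return (graph - to_remove) | to_add


graph = {
    ("ex:doc", "dc:title", "Old title"),
    ("ex:doc", "dc:creator", "Steve"),
}
patched = apply_patch(
    graph,
    to_remove={("ex:doc", "dc:title", "Old title")},
    to_add={("ex:doc", "dc:title", "New title")},
)
```

Because the removal set contains only ground triples, the server can apply it
with plain set operations and never has to search for blank nodes.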

I polled another team that is using the LDP approach.  They in fact don't
support PUT for updating resources, only PATCH.  In their model, they
reused an existing RDF format and defined some simple patterns (for
example, when a triple in the patch document matches the subject and
predicate of triples in the graph, those matched triples are removed and
replaced with the new triple).  This group doesn't use SPARQL but stores
RDF data natively.  The team expressed some concern about library/tool
generated PATCH documents in a SPARQL-like format, mostly founded on the
complexity of the format and the overhead of client libraries, along with
potential errors.
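
The subject/predicate replacement pattern described above could look roughly
like this. It is only an illustration of the idea, not that team's actual
format or code. The function name replace_matching and the tuple
representation of triples are assumptions.

```python
def replace_matching(graph, patch_triples):
    """Replace graph triples that share a subject and predicate with a patch triple."""
    patch_triples = set(patch_triples)

    # Collect the (subject, predicate) pairs named by the patch document.
    keys = {(s, p) for (s, p, _o) in patch_triples}

    # Drop every existing triple whose subject and predicate match a pair,
    # whatever its object is, then add the triples from the patch.
    kept = {t for t in graph if (t[0], t[1]) not in keys}
    return kept | patch_triples


graph = {
    ("ex:bug1", "ex:status", "open"),
    ("ex:bug1", "ex:owner", "ex:alice"),
}
updated = replace_matching(graph, {("ex:bug1", "ex:status", "closed")})
```

Note that with this pattern a multi-valued property is replaced as a whole:
all existing values for the matched subject and predicate are removed.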

Just some feedback.

[1] - http://open-services.net/wiki/core/OSLC-Core-Partial-Update/

- Steve Speicher


On Sat, Sep 14, 2013 at 9:40 PM, Sandro Hawke <sandro@w3.org> wrote:

> There have been some good emails on public-ldp-patch, and there was some
> good discussion at F2F4.   Here's where I think we are.   I don't know of
> anything in this email that anyone would disagree with (that is, I'm trying
> to summarize consensus), and I end with a suggested path forward.
>
> I think the biggest challenge we face -- and the challenge that divided me
> and Eric at the meeting -- is how to patch triples that involve blank
> nodes.   There seem to be two approaches:
>
> 1.  Require the client to create a graph pattern (a "where clause") which
> unambiguously identifies the blank nodes involved in the triples to be
> updated, and require the server to use that graph pattern to find those
> blank nodes in the graph being patched.
>
> 2.  Require that during the conversation that ends up involving patching,
> both parties use the same mapping from blank node labels to blank nodes.
>
> Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that
> graph matching.  In contrast, standard SPARQL servers don't have any way to
> share blank node scope as required for option 2. That kind of exposure of
> blank node labels has traditionally been avoided in the design of RDF
> systems.
>
> However, the worst-case performance with option 1 is exponential. If a
> triple to be updated is in the middle of a large cloud of blank nodes, then
> matching the where-clause might not be possible before we all die of old
> age.  (It's an extremely well studied problem in computer science; I'm not
> an expert, but I think I'm reading the results correctly.)
>
> No one has offered data about how often this worst-case behavior might be
> a problem in practice.  Arguably we're still in the early days, so it's too
> soon to know how painful this restriction might turn out to be.
>
> Some people said that the server can just set a time limit and reject
> patches that end up taking too long.   Other people (me) replied that makes
> the overall system too unpredictable, that systems should be able to send
> patches with confidence, especially one server to another.  As I said at
> the meeting, I don't know if this worst-case performance will turn out to
> be a problem, but I'm concerned enough about it that I can't +1 option 1,
> and don't want my name on a spec based on it.  David reported at the
> meeting that Google's internal culture generally forbids using exponential
> algorithms, so we might expect if they were in the group they would
> formally object to option 1 (or just decide to never use it, which amounts
> to the same thing).  Our anecdotal reports that they don't use SPARQL
> support this hearsay, but as long as it remains hearsay, we probably
> shouldn't take it too seriously.
>
> Which brings me to the proposal.
>
> Let's move forward with both Option 1 *and* Option 2, marking them both
> "at risk" in the spec.   That gives us the whole Last Call and Candidate
> Recommendation periods to gather input on how bad the exponential
> performance issue is for Option 1 and how bad the implementation challenge
> is for Option 2 (how hard it is to get RDF systems to share scope in blank
> node labels).
>
> Then at the end of CR, we can decide if either of them is good enough to
> normatively reference as the basic LDP patch format.   If they both end up
> implemented and with people liking them, then we just pick one, so the
> folks don't have to implement both going forward.    If neither of them is
> implemented and liked, then we're back to where we are today, with no
> standard patch format for LDP, but some more data on why it's hard.
>
> How's that sound?
>
> I imagine Option 1 would end up as some subset of SPARQL Update, like
> TurtlePatch  [1] plus variables or like Eric presented at the meeting.  I
> imagine for Option 2 we'd have something like Andy and Rob's RDFPatch [2]
> or my old GRUF [3] (which I'd forgotten about until reading RDFPatch).
>
>     -- Sandro
>
> [1]  http://www.w3.org/2001/sw/wiki/TurtlePatch
> [2]  http://afs.github.io/rdf-patch
> [3]  http://websub.org/wiki/GRUF (from Apr 2010)
>
>
>
>
Received on Wednesday, 2 October 2013 14:55:01 UTC