Re: the state of ldp-patch, and a procedural proposal

On 10/02/2013 10:54 AM, Steve Speicher wrote:
> Sandro,
>
> My typical resource graphs and patch scenarios have led me to an 
> approach [1] somewhat similar to option #1.
> My approach is to follow a very simple model:
>   a) here are the triples to remove from the graph (exactly, no 
> dependency on blank node labels)

So your data has no blank nodes, right?

>   b) here are the triples to add to the graph
> This seems to hit near 100% of my cases.  To be clear, this has not 
> been widely deployed, so the number of cases and types of resources is 
> limited.
>
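
For concreteness, a patch in that (a)/(b) style could be written as the
ground-triple subset of SPARQL Update.  This is a sketch only, with an
invented vocabulary, not the actual OSLC Partial Update syntax:

    PREFIX ex: <http://example.org/ns#>
    DELETE DATA { <http://example.org/bug/42> ex:status "open" } ;
    INSERT DATA { <http://example.org/bug/42> ex:status "closed" }

Since both blocks are ground triples (no variables and, per (a), no
blank nodes), the server can apply them by plain set removal and
addition.
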
> After polling another team that is using the LDP approach, I learned 
> they in fact don't support PUT for updating resources, only PATCH.  In 
> their model, they reused an existing RDF format and defined some 
> simple patterns (for example: when a triple in the patch document 
> matches the subject and predicate of triples in the graph, remove 
> those matched triples and replace them with the new triple).  This 
> group doesn't use SPARQL but stores RDF data natively.  This team 
> expressed some concern about library/tool-generated PATCH documents in 
> a SPARQL-like format, mostly founded on the complexity of the format 
> and the overhead of client libraries, along with the potential for 
> errors.
>
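
If I'm reading that pattern right, sending the single triple

    <http://example.org/bug/42> ex:severity "high" .

against a graph already containing

    <http://example.org/bug/42> ex:severity "low" .

would remove the old ex:severity triple and insert the new one, since
the subject and predicate match.  (The IRIs and vocabulary here are
invented for illustration; I haven't seen their actual format.)
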

Is their data also free from blank nodes?

Thanks.

       -- Sandro

> Just some feedback.
>
> [1] - http://open-services.net/wiki/core/OSLC-Core-Partial-Update/
>
> - Steve Speicher
>
>
> On Sat, Sep 14, 2013 at 9:40 PM, Sandro Hawke <sandro@w3.org> wrote:
>
>     There have been some good emails on public-ldp-patch, and there
>     was some good discussion at F2F4.   Here's where I think we are.  
>     I don't know of anything in this email that anyone would disagree
>     with (that is, I'm trying to summarize consensus), and I end with
>     a suggested path forward.
>
>     I think the biggest challenge we face -- and the challenge that
>     divided me and Eric at the meeting -- is how to patch triples that
>     involve blank nodes.   There seem to be two approaches:
>
>     1.  Require the client to create a graph pattern (a "where
>     clause") which unambiguously identifies the blank nodes involved
>     in the triples to be updated, and require the server to use that
>     graph pattern to find those blank nodes in the graph being patched.
>
>     2.  Require that during the conversation that ends up involving
>     patching, both parties use the same mapping from blank node labels
>     to blank nodes.
>
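
To make the two options concrete, here are hedged sketches, with
invented vocabulary and data rather than any proposed syntax.  Under
option 1, a patch might be a SPARQL Update request whose where-clause
pins down the blank node:

    PREFIX ex: <http://example.org/ns#>
    DELETE { ?addr ex:zip "02139" }
    INSERT { ?addr ex:zip "02144" }
    WHERE  { <http://example.org/alice> ex:address ?addr .
             ?addr ex:street "32 Vassar St" }

Under option 2, in the spirit of RDFPatch (cited below as [2]; I
haven't checked its exact syntax), the patch could reuse the label the
server itself used when it served the resource, say _:addr:

    D _:addr <http://example.org/ns#zip> "02139" .
    A _:addr <http://example.org/ns#zip> "02144" .

The first form asks the server to search; the second asks it to
remember.
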
>     Option 1 is a good fit for SPARQL.   SPARQL servers naturally do
>     that graph matching.  In contrast, standard SPARQL servers don't
>     have any way to share blank node scope as required for option 2.
>     That kind of exposure of blank node labels has traditionally been
>     avoided in the design of RDF systems.
>
>     However, the worst-case performance with option 1 is exponential.
>     If a triple to be updated is in the middle of a large cloud of
>     blank nodes, then matching the where-clause might not be possible
>     before we all die of old age.  (This is essentially the
>     subgraph-matching problem, which is extremely well studied in
>     computer science and NP-complete; I'm not an expert, but I think
>     I'm reading the results correctly.)
>
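
To make the failure mode concrete: picture a graph where the blank
nodes are mutually indistinguishable, e.g. (invented example)

    @prefix ex: <http://example.org/ns#> .
    _:m1 ex:linked _:m2 , _:m3 .
    _:m2 ex:linked _:m1 , _:m3 .
    _:m3 ex:linked _:m1 , _:m2 .

A where-clause over such a cloud has combinatorially many candidate
bindings, and a matcher may have to backtrack through essentially all
of them before finding, or failing to find, the intended match.  That
is the shape of graph behind the worst case.
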
>     No one has offered data about how often this worst-case behavior
>     might be a problem in practice.  Arguably we're still in the early
>     days, so it's too soon to know how painful this restriction might
>     turn out to be.
>
>     Some people said that the server can just set a time limit and
>     reject patches that end up taking too long.  Other people (me)
>     replied that this makes the overall system too unpredictable, that
>     systems should be able to send patches with confidence, especially
>     one server to another.  As I said at the meeting, I don't know if
>     this worst-case performance will turn out to be a problem, but I'm
>     concerned enough about it that I can't +1 option 1, and don't want
>     my name on a spec based on it.  David reported at the meeting that
>     Google's internal culture generally forbids using exponential
>     algorithms, so we might expect if they were in the group they
>     would formally object to option 1 (or just decide to never use it,
>     which amounts to the same thing).  Anecdotal reports that they
>     don't use SPARQL support this hearsay, but as long as it remains
>     hearsay, we probably shouldn't take it too seriously.
>
>     Which brings me to the proposal.
>
>     Let's move forward with both Option 1 *and* Option 2, marking them
>     both "at risk" in the spec.   That gives us the whole Last Call
>     and Candidate Recommendation periods to gather input on how bad
>     the exponential performance issue is for Option 1 and how bad the
>     implementation challenge is for Option 2 (how hard it is to get
>     RDF systems to share scope in blank node labels).
>
>     Then at the end of CR, we can decide if either of them is good
>     enough to normatively reference as the basic LDP patch format.  
>     If they both end up implemented and liked, then we just pick one,
>     so that folks don't have to implement both going forward.  If
>     neither of them ends up implemented and liked, then
>     we're back to where we are today, with no standard patch format
>     for LDP, but some more data on why it's hard.
>
>     How's that sound?
>
>     I imagine Option 1 would end up as some subset of SPARQL Update,
>     like TurtlePatch  [1] plus variables or like Eric presented at the
>     meeting.  I imagine for Option 2 we'd have something like Andy and
>     Rob's RDFPatch [2] or my old GRUF [3] (which I'd forgotten about
>     until reading RDFPatch).
>
>         -- Sandro
>
>     [1] http://www.w3.org/2001/sw/wiki/TurtlePatch
>     [2] http://afs.github.io/rdf-patch
>     [3] http://websub.org/wiki/GRUF (from Apr 2010)
>

Received on Wednesday, 2 October 2013 17:01:48 UTC