Re: the state of ldp-patch, and a procedural proposal from Sandro Hawke on 2013-10-02 (public-ldp-patch@w3.org from October 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 02 Oct 2013 16:51:57 -0400
To: Steve Speicher <sspeiche@gmail.com>
CC: public-ldp-patch@w3.org,Eric Prud'hommeaux <eric@w3.org>,Tim Berners-Lee <timbl@w3.org>,Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <68168d91-cd21-4dc4-bab1-6b7e208c9a7a@email.android.com>
Steve Speicher <sspeiche@gmail.com> wrote:
>On Wed, Oct 2, 2013 at 1:01 PM, Sandro Hawke <sandro@w3.org> wrote:
>
>>  On 10/02/2013 10:54 AM, Steve Speicher wrote:
>>
>> Sandro,
>>
>>  My typical resource graphs and patch scenarios have led me to an
>> approach [1] somewhat similar to option #1.
>> My approach [1] is to follow a very simple model such as:
>>   a) here are the triples to remove from the graph (exactly, no
>>      dependency on blank node labels)
>>
>>
>> So your data has no blank nodes, right?
>>
>
>No, it has some blank nodes, but their usage is somewhat limited.  Dare I
>mention that our resources have some reification statements where we just
>key based on the reified statement to find the right triples to modify.
>

Is that keying done in an application-specified way by the server, or is it indicated by the client?

If it's the client, that means you have a WHERE clause.    Do the variables in that match only blank nodes?    What happens if they match multiple times?    Are there limits to how complex the pattern is allowed to be?
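
To make the "match multiple times" question concrete, here is a minimal sketch, not taken from any of the proposals in this thread. It assumes triples are plain string tuples, that variables are written with a leading "?", and that the reified statements use the standard rdf:subject / rdf:predicate / rdf:object properties; the `match` function and the sample data are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumed convention for this sketch: variables start with "?".
    return term.startswith("?")


def match(pattern: List[Triple], graph: Set[Triple],
          bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every consistent variable binding for the pattern (backtracking)."""
    bindings = bindings or {}
    if not pattern:
        yield dict(bindings)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in sorted(graph):
        trial = dict(bindings)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against its earlier binding.
                if trial.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, trial)


# Two reified statements about the same subject and predicate.
graph = {
    ("_:b1", "rdf:subject", "ex:doc"),
    ("_:b1", "rdf:predicate", "dc:title"),
    ("_:b1", "rdf:object", '"Old title"'),
    ("_:b2", "rdf:subject", "ex:doc"),
    ("_:b2", "rdf:predicate", "dc:title"),
    ("_:b2", "rdf:object", '"Other title"'),
}

loose = [("?s", "rdf:subject", "ex:doc"), ("?s", "rdf:predicate", "dc:title")]
tight = loose + [("?s", "rdf:object", '"Old title"')]

loose_matches = list(match(loose, graph))  # both blank nodes match
tight_matches = list(match(tight, graph))  # only _:b1 matches
```

With the loose pattern the server has to decide what a patch means when `?s` binds to two different blank nodes; the tight pattern avoids that only because the client supplied enough triples to pin down one node.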

>
>>
>>
>>    b) here are the triples to add to the graph
>> This seems to hit near 100% of my cases.  To be clear, this has not been
>> widely deployed, so the number of cases and types of resources is limited.
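
The two-step model in (a) and (b) above can be sketched roughly as follows. This is an illustration only, not the OSLC format referenced in [1]: the graph is modelled as a set of string tuples, `apply_patch` is a hypothetical name, and rejecting a patch that names an absent triple is an assumption of this sketch.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_patch(graph: Set[Triple], remove: Set[Triple],
                add: Set[Triple]) -> Set[Triple]:
    """Remove exact triples, then add new ones; the input graph is not mutated."""
    missing = remove - graph
    if missing:
        # Assumption: a patch that removes a triple not in the graph is rejected.
        raise ValueError(f"triples not in graph: {sorted(missing)}")
    return (graph - remove) | add


graph = {
    ("ex:doc", "dc:title", '"Old title"'),
    ("ex:doc", "dc:creator", '"Ann"'),
}
patched = apply_patch(
    graph,
    remove={("ex:doc", "dc:title", '"Old title"')},
    add={("ex:doc", "dc:title", '"New title"')},
)
```

Because the triples to remove are listed exactly, no pattern matching is needed, which is also why this form has nothing to say about triples whose terms are blank nodes.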
>>
>>  I polled another team that is using the LDP approach; they in fact
>> don't support PUT for updating resources, only PATCH.  In their model,
>> they reused an existing RDF format and defined some simple patterns (such
>> as: for a triple in the patch document that matches the subject and
>> predicate of triples in the graph, remove the matched triples and replace
>> them with the new triple).  This group doesn't use SPARQL but stores RDF
>> data natively.  This team expressed some concern about library/tool
>> generated PATCH documents in a SPARQL-like format, mostly founded on the
>> complexity of the format and the overhead of client libraries, along with
>> potential errors.
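
The subject-and-predicate replacement pattern described above might look something like this. It is a guess at the general shape, not that team's actual format; `apply_replace_patch` and the sample data are hypothetical, and triples are again plain string tuples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_replace_patch(graph: Set[Triple], patch: Set[Triple]) -> Set[Triple]:
    """Each patch triple replaces all graph triples sharing its subject and predicate."""
    keys = {(s, p) for s, p, _ in patch}
    kept = {t for t in graph if (t[0], t[1]) not in keys}
    return kept | patch


graph = {
    ("ex:doc", "dc:title", '"Old title"'),
    ("ex:doc", "dc:title", '"Alternate title"'),
    ("ex:doc", "dc:creator", '"Ann"'),
}
patch = {("ex:doc", "dc:title", '"New title"')}
replaced = apply_replace_patch(graph, patch)
```

Note that the subject has to be something the client can name. If the subject is a blank node, the patch document needs some other way to identify it, which is the question that follows.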
>>
>>
>> Is their data also free from blank nodes?
>>
>
>It is not but they feel like it could easily support it.
>

They can support a WHERE clause with NP variable matching, with backtracking and all that, but don't want to parse SPARQL?

(sorry if that sounds incredulous, I don't mean it that way, just wanting to be sure)

     - Sandro

>- Steve Speicher
>
>
>
>>
>> Thanks.
>>
>>       -- Sandro
>>
>>
>>  Just some feedback.
>>
>>  [1] - http://open-services.net/wiki/core/OSLC-Core-Partial-Update/
>>
>>  - Steve Speicher
>>
>>
>> On Sat, Sep 14, 2013 at 9:40 PM, Sandro Hawke <sandro@w3.org> wrote:
>>
>>> There have been some good emails on public-ldp-patch, and there was some
>>> good discussion at F2F4.   Here's where I think we are.   I don't know of
>>> anything in this email that anyone would disagree with (that is, I'm
>>> trying to summarize consensus), and I end with a suggested path forward.
>>>
>>> I think the biggest challenge we face -- and the challenge that divided
>>> me and Eric at the meeting -- is how to patch triples that involve blank
>>> nodes.   There seem to be two approaches:
>>>
>>> 1.  Require the client to create a graph pattern (a "where clause") which
>>> unambiguously identifies the blank nodes involved in the triples to be
>>> updated, and require the server to use that graph pattern to find those
>>> blank nodes in the graph being patched.
>>>
>>> 2.  Require that during the conversation that ends up involving patching,
>>> both parties use the same mapping from blank node labels to blank nodes.
>>>
>>> Option 1 is a good fit for SPARQL.   SPARQL servers naturally do that
>>> graph matching.  In contrast, standard SPARQL servers don't have any way
>>> to share blank node scope as required for option 2. That kind of exposure
>>> of blank node labels has traditionally been avoided in the design of RDF
>>> systems.
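
For comparison, option 2 could be pictured as the server remembering, for the length of a conversation, which label it handed out for which blank node. The sketch below is only an illustration under assumed conventions: internal blank nodes are named "bnode:<id>", labels are "_:bN", and `SharedLabelSession` is a hypothetical class, not part of RDFPatch, GRUF, or any SPARQL server.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class SharedLabelSession:
    """Hypothetical per-conversation table: blank node label <-> internal node."""

    def __init__(self) -> None:
        self.label_to_node: Dict[str, str] = {}
        self.node_to_label: Dict[str, str] = {}

    def label(self, node: str) -> str:
        # Assumption: internal blank nodes are named "bnode:<id>".
        if not node.startswith("bnode:"):
            return node
        if node not in self.node_to_label:
            label = f"_:b{len(self.node_to_label)}"
            self.node_to_label[node] = label
            self.label_to_node[label] = node
        return self.node_to_label[node]

    def resolve(self, term: str) -> str:
        # A label the server never shared raises KeyError.
        if term.startswith("_:"):
            return self.label_to_node[term]
        return term


graph: Set[Triple] = {
    ("ex:doc", "ex:author", "bnode:17"),
    ("bnode:17", "foaf:name", '"Ann"'),
}
session = SharedLabelSession()

# What the client sees: internal nodes replaced by shared labels.
sent = [tuple(session.label(term) for term in triple) for triple in sorted(graph)]

# The client's patch refers to the blank node by the label it was given.
delete = ("_:b0", "foaf:name", '"Ann"')
add = ("_:b0", "foaf:name", '"Anne"')
resolved_delete = tuple(session.resolve(term) for term in delete)
resolved_add = tuple(session.resolve(term) for term in add)
patched = (graph - {resolved_delete}) | {resolved_add}
```

Resolving a label is a dictionary lookup, so no graph matching is involved; the cost moves to keeping the label table alive and consistent on both sides of the conversation.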
>>>
>>> However, the worst-case performance with option 1 is exponential. If a
>>> triple to be updated is in the middle of a large cloud of blank nodes,
>>> then matching the where-clause might not be possible before we all die of
>>> old age.  (It's an extremely well studied problem in computer science;
>>> I'm not an expert, but I think I'm reading the results correctly.)
>>>
>>> No one has offered data about how often this worst-case behavior might be
>>> a problem in practice.  Arguably we're still in the early days, so it's
>>> too soon to know how painful this restriction might turn out to be.
>>>
>>> Some people said that the server can just set a time limit and reject
>>> patches that end up taking too long.   Other people (me) replied that
>>> makes the overall system too unpredictable, that systems should be able
>>> to send patches with confidence, especially one server to another.  As I
>>> said at the meeting, I don't know if this worst-case performance will
>>> turn out to be a problem, but I'm concerned enough about it that I can't
>>> +1 option 1, and don't want my name on a spec based on it.  David
>>> reported at the meeting that Google's internal culture generally forbids
>>> using exponential algorithms, so we might expect if they were in the
>>> group they would formally object to option 1 (or just decide to never use
>>> it, which amounts to the same thing).  Our anecdotal reports that they
>>> don't use SPARQL support this hearsay, but as long as it remains hearsay,
>>> we probably shouldn't take it too seriously.
>>>
>>> Which brings me to the proposal.
>>>
>>> Let's move forward with both Option 1 *and* Option 2, marking them both
>>> "at risk" in the spec.   That gives us the whole Last Call and Candidate
>>> Recommendation periods to gather input on how bad the exponential
>>> performance issue is for Option 1 and how bad the implementation
>>> challenge is for Option 2 (how hard it is to get RDF systems to share
>>> scope in blank node labels).
>>>
>>> Then at the end of CR, we can decide if either of them is good enough to
>>> normatively reference as the basic LDP patch format.   If they both end
>>> up implemented and with people liking them, then we just pick one, so the
>>> folks don't have to implement both going forward.    If neither of them
>>> is implemented and liked, then we're back to where we are today, with no
>>> standard patch format for LDP, but some more data on why it's hard.
>>>
>>> How's that sound?
>>>
>>> I imagine Option 1 would end up as some subset of SPARQL Update, like
>>> TurtlePatch  [1] plus variables or like Eric presented at the meeting.  I
>>> imagine for Option 2 we'd have something like Andy and Rob's RDFPatch [2]
>>> or my old GRUF [3] (which I'd forgotten about until reading RDFPatch).
>>>
>>>     -- Sandro
>>>
>>> [1]  http://www.w3.org/2001/sw/wiki/TurtlePatch
>>> [2]  http://afs.github.io/rdf-patch
>>> [3]  http://websub.org/wiki/GRUF (from Apr 2010)
>>>
>>>
>>>
>>>
>>
>>

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Received on Wednesday, 2 October 2013 20:51:56 UTC