- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 02 Oct 2013 16:51:57 -0400
- To: Steve Speicher <sspeiche@gmail.com>
- CC: public-ldp-patch@w3.org, Eric Prud'hommeaux <eric@w3.org>, Tim Berners-Lee <timbl@w3.org>, Linked Data Platform WG <public-ldp-wg@w3.org>
Steve Speicher <sspeiche@gmail.com> wrote:

>On Wed, Oct 2, 2013 at 1:01 PM, Sandro Hawke <sandro@w3.org> wrote:
>
>> On 10/02/2013 10:54 AM, Steve Speicher wrote:
>>
>> Sandro,
>>
>> My typical resource graphs and patch scenarios have led me to an
>> approach [1] somewhat similar to option #1.
>> My approach [1] is to follow a very simple model such as:
>> a) here are the triples to remove from the graph (exactly, no dependency
>> on blank node labels)
>>
>>
>> So your data has no blank nodes, right?
>>
>
>No, it has some blank nodes but its usage is somewhat limited. Dare I
>mention that our resources have some reification statements where we just
>key based on the reified statement to find the right triples to modify.
>

Is that keying done in an application-specified way by the server, or is that indicated by the client? If it's the client, that means you have a WHERE clause. Do the variables in that match only blank nodes? What happens if they match multiple times? Are there limits to how complex the pattern is allowed to be?

>
>>
>>
>> b) here are the triples to add to the graph
>> This seems to hit near 100% of my cases. To be clear, this has not been
>> widely deployed so the amount of cases and types of resources is limited.
>>
>> After polling another team that is using the LDP approach, they in fact
>> don't support PUT for updating resources but PATCH only. In their model,
>> they reused an existing RDF format and defined some simple patterns (such
>> as a triple in the patch document that matches subject and predicate with
>> triples in the graph, remove those matched triples and replace with new
>> triple). This group doesn't use SPARQL but stores RDF data natively. This
>> team expressed some concern in library/tool generated PATCH documents in
>> SPARQL-like format, mostly founded on complexity of the format and overhead
>> of client libraries, along with potential errors.
>>
>>
>> Is their data also free from blank nodes?
>>
>
>It is not but they feel like it could easily support it.
>

They can support a WHERE clause with NP variable matching, with backtracking and all that, but don't want to parse SPARQL? (Sorry if that sounds incredulous, I don't mean it that way, just wanting to be sure.)

- Sandro

>- Steve Speicher
>
>
>
>>
>> Thanks.
>>
>> -- Sandro
>>
>>
>> Just some feedback.
>>
>> [1] - http://open-services.net/wiki/core/OSLC-Core-Partial-Update/
>>
>> - Steve Speicher
>>
>>
>> On Sat, Sep 14, 2013 at 9:40 PM, Sandro Hawke <sandro@w3.org> wrote:
>>
>>> There have been some good emails on public-ldp-patch, and there was some
>>> good discussion at F2F4. Here's where I think we are. I don't know of
>>> anything in this email that anyone would disagree with (that is, I'm trying
>>> to summarize consensus), and I end with a suggested path forward.
>>>
>>> I think the biggest challenge we face -- and the challenge that divided
>>> me and Eric at the meeting -- is how to patch triples that involve blank
>>> nodes. There seem to be two approaches:
>>>
>>> 1. Require the client to create a graph pattern (a "where clause") which
>>> unambiguously identifies the blank nodes involved in the triples to be
>>> updated, and require the server to use that graph pattern to find those
>>> blank nodes in the graph being patched.
>>>
>>> 2. Require that during the conversation that ends up involving patching,
>>> both parties use the same mapping from blank node labels to blank nodes.
>>>
>>> Option 1 is a good fit for SPARQL. SPARQL servers naturally do that
>>> graph matching. In contrast, standard SPARQL servers don't have any way to
>>> share blank node scope as required for option 2. That kind of exposure of
>>> blank node labels has traditionally been avoided in the design of RDF
>>> systems.
>>>
>>> However, the worst-case performance with option 1 is exponential.
>>> If a triple to be updated is in the middle of a large cloud of blank nodes,
>>> then matching the where-clause might not be possible before we all die of
>>> old age. (It's an extremely well studied problem in computer science; I'm
>>> not an expert, but I think I'm reading the results correctly.)
>>>
>>> No one has offered data about how often this worst-case behavior might be
>>> a problem in practice. Arguably we're still in the early days, so it's too
>>> soon to know how painful this restriction might turn out to be.
>>>
>>> Some people said that the server can just set a time limit and reject
>>> patches that end up taking too long. Other people (me) replied that makes
>>> the overall system too unpredictable, that systems should be able to send
>>> patches with confidence, especially one server to another. As I said at
>>> the meeting, I don't know if this worst-case performance will turn out to
>>> be a problem, but I'm concerned enough about it that I can't +1 option 1,
>>> and don't want my name on a spec based on it. David reported at the
>>> meeting that Google's internal culture generally forbids using exponential
>>> algorithms, so we might expect if they were in the group they would
>>> formally object to option 1 (or just decide to never use it, which amounts
>>> to the same thing). Our anecdotal reports that they don't use SPARQL
>>> support this hearsay, but as long as it remains hearsay, we probably
>>> shouldn't take it too seriously.
>>>
>>> Which brings me to the proposal.
>>>
>>> Let's move forward with both Option 1 *and* Option 2, marking them both
>>> "at risk" in the spec. That gives us the whole Last Call and Candidate
>>> Recommendation periods to gather input on how bad the exponential
>>> performance issue is for Option 1 and how bad the implementation challenge
>>> is for Option 2 (how hard it is to get RDF systems to share scope in blank
>>> node labels).
>>>
>>> Then at the end of CR, we can decide if either of them is good enough to
>>> normatively reference as the basic LDP patch format. If they both end up
>>> implemented and with people liking them, then we just pick one, so the
>>> folks don't have to implement both going forward. If neither of them is
>>> implemented and liked, then we're back to where we are today, with no
>>> standard patch format for LDP, but some more data on why it's hard.
>>>
>>> How's that sound?
>>>
>>> I imagine Option 1 would end up as some subset of SPARQL Update, like
>>> TurtlePatch [1] plus variables or like Eric presented at the meeting. I
>>> imagine for Option 2 we'd have something like Andy and Rob's RDFPatch [2]
>>> or my old GRUF [3] (which I'd forgotten about until reading RDFPatch).
>>>
>>> -- Sandro
>>>
>>> [1] http://www.w3.org/2001/sw/wiki/TurtlePatch
>>> [2] http://afs.github.io/rdf-patch
>>> [3] http://websub.org/wiki/GRUF (from Apr 2010)
>>>
>>
>>

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
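
For readers following the thread, here is a rough sketch of what Option 1 in the quoted proposal amounts to: a where-clause is matched against the graph by backtracking, and the resulting binding is used to instantiate the triples to remove and to add. This is only an illustration under stated assumptions, not any of the proposed formats. The names (match, apply_patch), the "?var" convention, the "_:b1" labels, the plain-tuple triple representation, and the rule that the pattern must match exactly once are all hypothetical choices made for this sketch.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption for this sketch: pattern variables are written "?name".
    return term.startswith("?")


def match(pattern: List[Triple], graph: Set[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every variable binding that maps all pattern triples into the graph.

    Plain backtracking: each pattern triple is tried against each graph triple,
    so the work can grow exponentially with the size of the pattern.
    """
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in sorted(graph):
        candidate = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


def substitute(triples: List[Triple], binding: Binding) -> List[Triple]:
    # Replace bound variables with their values; other terms pass through.
    return [tuple(binding.get(t, t) for t in triple) for triple in triples]


def apply_patch(graph: Set[Triple], where: List[Triple],
                delete: List[Triple], add: List[Triple]) -> Set[Triple]:
    """Return a patched copy of graph; reject ambiguous or failed matches."""
    solutions = list(match(where, graph, {}))
    if len(solutions) != 1:
        # Assumption: a patch is only valid if the pattern matches exactly once.
        raise ValueError(f"where-clause matched {len(solutions)} times")
    binding = solutions[0]
    result = set(graph)
    for triple in substitute(delete, binding):
        if triple not in result:
            raise ValueError(f"triple to remove is not in the graph: {triple}")
        result.remove(triple)
    result.update(substitute(add, binding))
    return result


# Two blank nodes that can only be told apart by their "name" triple.
graph = {
    ("_:b1", "name", "Alice"),
    ("_:b1", "city", "Boston"),
    ("_:b2", "name", "Bob"),
    ("_:b2", "city", "Boston"),
}
patched = apply_patch(
    graph,
    where=[("?p", "name", "Alice")],
    delete=[("?p", "city", "Boston")],
    add=[("?p", "city", "Cambridge")],
)
```

A where-clause such as [("?p", "city", "Boston")] would match both blank nodes, and this sketch rejects it with an error; that is one possible answer to the "what happens if they match multiple times?" question raised above, not the only one. The sketch also lets variables match any term, not only blank nodes.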
Received on Wednesday, 2 October 2013 20:51:56 UTC