- From: Sandro Hawke <sandro@w3.org>
- Date: Sun, 08 Sep 2013 12:00:10 -0400
- To: Andy Seaborne <andy@apache.org>
- CC: public-ldp-patch@w3.org
On 09/08/2013 06:33 AM, Andy Seaborne wrote: > On 08/09/13 04:17, Sandro Hawke wrote: >> In theory, patching a large branching subgraph of blank nodes is >> intractable. It's the graph isomorphism problem, which has >> complexity NP. >> >> I think rdf lists don't have this problem, at least with a clever >> implementation, because the subgraph of blank nodes doesn't >> significantly branch. >> >> I wonder if there are many RDF graphs that do have this problem. I can >> easily imagine paths which are 2-4 blank nodes deep, coming out of >> record structures that are mapped to RDF, but even an exponential search >> is okay when the exponent is only 4. The question is: are there, in >> practice, subgraphs where you have to traverse 20 or 30 blank nodes, >> with branching, to get to the part you need to alter? If we don't know >> of any, how confident are we there won't be any real need for them? >> >> Also, do SPARQL systems handle rdf:lists efficiently enough? Can I >> append to an rdf:List like this: >> >> DELETE { ?last rdf:rest rdf:nil } >> INSERT { ?last rdf:rest ( :NewLastElement ) } >> WHERE { :myList :myProp ?n1. ?n1 rdf:rest ?n2. ?n2 rdf:rest ?n3. >> ..... rdf:rest ?last } > > rdf:rest* > Arbitrary length path. > > DELETE { ?last rdf:rest rdf:nil } > INSERT { ?last rdf:rest ( :NewLastElement ) } > WHERE { :myList :myProp ?list . > ?list rdf:rest* ?last . > ?last rdf:rest rdf:nil } > Sorry, I should have disallowed that shortcut. Please consider my question to be about inserting an element n items into a list that's much longer. Also, assume the items in the list are all the same, so you can't use value-matching to find the desired spot in the list. So there really would have to be n variables. Do you know how typical SPARQL systems will do with thousands of variables like that? >> where the chain is 20 or 200 or 2000 or even 20,000 elements long and >> have the the system handle it comfortably? > > It has to run down the list if all you give it is the head, but it's a > loop, not storing and growing internal state as it goes. > >> If the answers to the above questions are not good, > > I think it is also hard to understand. The app developer has been > dropped into the triple representation of a list when the date may > have shown ( ... ). It's data assembler. > Indeed -- if this is part of a system, it would have to be buried where most developers wouldn't know about it. >> then we might want a >> patch format that sidesteps the whole problem, for instance by having a >> limited way to refer to blank nodes in the protocol. For example, >> patches are often written relative to a particular resource >> representation, pointed to by an ETag. I don't see any technical >> reason the patch couldn't say "I'm using the same blank-node-names as >> that representation does", and if the server cares enough to retain that >> mapping, it can do patch processing in linear time. > > I agree that dropping down to the label is a good idea and necessary > if we don't restrict the styles of RDF that can be effectively managed > with a patch. > *nod* Perhaps we should strawpoll on these four possible requirements. I believe they're all easy to do if we use labels and 2, 3, and 4 are impossible to do if we do not. 1. It must be possible to write a patch to go from any graph G1 to any other graph G2 2. It must be possible to write a tractable (PTIME and PSPACE) patch to go from any graph G1 to any other graph G2 3. Processing all patches must be tractable. (That is, not only must it be possible to write a tractable patch, but it must be impossible to write an intractable one.) 4. Creating a patch to go from any G1 to any G2 must itself be a tractable problem. > It has to be the system's choice of internal label: > But which system - the patch creator or the patch receiver? > 1/ you can't use the label as given by syntax because it may be > repeated in unrelated updates. > > Update 1: add triples: > > <#peter> foaf:knows _:a . > _:a foaf:name "Paul" . > > Update 2: add triples: > > <#peter> foaf:knows _:a . > _:a foaf:name "Patricia" . > > I can imagine template-driven updates doing that. > > 2/ In Turtle the system generates them for [] and () > > > The question is how to find the internal label. > > SPARQL's STR(?blank) could be used -- in the base spec says it's undef > (=error, and so extensible) to call it on a blank node. Or > IRI(?bnode) could be overloaded for skolemization (several systems > already do variations on this). > Yeah, I think we need to move up from the language to the protocol for this, and think about what both ends of the conversation are doing. Here's a protocol that I think works: 1. Alice publishes version 1 of an n-triples file at URL U1, with ETag "v1". 2. Bob creates a patch against U1 v1. He uses the same blank node labels that Alice used, for the blank nodes she had. For new blank nodes, he makes up new labels that Alice has not used in v1. His patch basically consists of two n-triples strings, one with all the triples to be deleted, one with all the triples to be added, and he specifies it's a patch against v1. 3. Alice receives and applies the patch, giving the result ETag "vBob1". She only needs to tell Bob that she has accepted the patch and given it ETag "vBob1" for them to maintain their shared state. 4. If Alice gets a concurrent patch from Charlie (that is, another patch against v1), and she can determine it does not conflict with Bob's patch, she can confirm it to Charlie (creating vCharlie1), then send each of them patches to bring them both up to vBob1Charlie1 (or v2, or whatever - it's probably really named by a hash). In the patches she constructs for each of them, she will cause them to rename any of the blank nodes they created whose names happen to conflict with each other. There are a few painful things about this protocol, though, that perhaps we can improve. 1. It's annoying to have to use n-triples. We could move up to Turtle while banning () and []. Other RDF syntaxes would also be fine, as long as we disallow the constructs which generate blank nodes. 2. It's annoying to not have () and [] in step 1. We could allow them if we had a standard naming for the blank nodes they generate. I think we could do that although it would be a bit tricky - I think top-down and bottom-up parsers would want to number them differently. Maybe we could just say they're named _:gN where N is a decimal number, starting from 1, incremented each time a blank node is implied, moving throught the file in order. (Test case: "<a> <b> (1 ( 2 [<c> [<d> 3], [<e> 4]] 5) (6))" or something.) I imagine Bob would use a special Turtle (or whatever) parser which followed this rule, knowing he was going to be doing patching. 3. It's annoying to not have () and [] in step 2. This is different from in step one, because (1) we can't just number from 1, or we'll be conflicting with all of Alice' names, and (2) it's a patch syntax not (necessarily) an RDF syntax. Perhaps what's needed is some directive in the language or in the metadata which specifies a numeric starting point for numbering these generated blank node lables. 4. The renaming in Step 4 could be painful. This could be addressed by having easy renaming in the patch language. That is, not just "add triple <s> <p> <o>" and "delete triples <s> <p> <o>", but also "rename node <old> <new>". Actually, if we're doing this stuff with numbering new blank nodes, and lists, we'd probably want "rename blank node range <old-start> <old-end> <new-start>". So if Bob used blank node numbers 51-205, and charlie used blank node numbers 51-70, then Alice might decide to renumber Charlies blank nodes in constructing vBobCharlie1. Then in giving Charlie the patch from vCharlie1 to vBobCharlie1 she'd include renumberBlanks(51,70,206). This is starting to look like the start of a reasonable system. :-) I should point out that in step 4 I brought in something we've never talked about -- that it's important for LDP servers to be able to give patches to LDP clients, as well. Just like a PUT is a very heavy way to change a triple, a GET is a very heavy way to find out a triple has changed. So I think we need to expect servers to provide streams of patches to clients. In the LDP style, one way to do this would be metadata on U1, either embedded or in HTTP Link headers, saying { U1 ldp:patchToCurrent P1 }, and then one can GET P1 will return a patch from U1 to the current version. (I'd also like ldp:patchToNext which gives a URL for a longpoll URL, for a patch to whatever the next version is.) -- Sandro > Andy > >> >> -- Sandro >> >> > > >
Received on Sunday, 8 September 2013 16:00:17 UTC