- From: Sandro Hawke <sandro@w3.org>
- Date: Sun, 08 Sep 2013 12:00:10 -0400
- To: Andy Seaborne <andy@apache.org>
- CC: public-ldp-patch@w3.org
On 09/08/2013 06:33 AM, Andy Seaborne wrote:
> On 08/09/13 04:17, Sandro Hawke wrote:
>> In theory, patching a large branching subgraph of blank nodes is
>> intractable. It's the graph isomorphism problem, which has
>> complexity NP.
>>
>> I think rdf lists don't have this problem, at least with a clever
>> implementation, because the subgraph of blank nodes doesn't
>> significantly branch.
>>
>> I wonder if there are many RDF graphs that do have this problem. I can
>> easily imagine paths which are 2-4 blank nodes deep, coming out of
>> record structures that are mapped to RDF, but even an exponential search
>> is okay when the exponent is only 4. The question is: are there, in
>> practice, subgraphs where you have to traverse 20 or 30 blank nodes,
>> with branching, to get to the part you need to alter? If we don't know
>> of any, how confident are we there won't be any real need for them?
>>
>> Also, do SPARQL systems handle rdf:lists efficiently enough? Can I
>> append to an rdf:List like this:
>>
>> DELETE { ?last rdf:rest rdf:nil }
>> INSERT { ?last rdf:rest ( :NewLastElement ) }
>> WHERE { :myList :myProp ?n1. ?n1 rdf:rest ?n2. ?n2 rdf:rest ?n3.
>> ..... rdf:rest ?last }
>
> rdf:rest*
> Arbitrary length path.
>
> DELETE { ?last rdf:rest rdf:nil }
> INSERT { ?last rdf:rest ( :NewLastElement ) }
> WHERE { :myList :myProp ?list .
> ?list rdf:rest* ?last .
> ?last rdf:rest rdf:nil }
>
Sorry, I should have disallowed that shortcut. Please consider my
question to be about inserting an element n items into a list that's
much longer. Also, assume the items in the list are all the same, so
you can't use value-matching to find the desired spot in the list.
So there really would have to be n variables. Do you know how typical
SPARQL systems will do with thousands of variables like that?
>> where the chain is 20 or 200 or 2000 or even 20,000 elements long and
>> have the the system handle it comfortably?
>
> It has to run down the list if all you give it is the head, but it's a
> loop, not storing and growing internal state as it goes.
>
>> If the answers to the above questions are not good,
>
> I think it is also hard to understand. The app developer has been
> dropped into the triple representation of a list when the date may
> have shown ( ... ). It's data assembler.
>
Indeed -- if this is part of a system, it would have to be buried where
most developers wouldn't know about it.
>> then we might want a
>> patch format that sidesteps the whole problem, for instance by having a
>> limited way to refer to blank nodes in the protocol. For example,
>> patches are often written relative to a particular resource
>> representation, pointed to by an ETag. I don't see any technical
>> reason the patch couldn't say "I'm using the same blank-node-names as
>> that representation does", and if the server cares enough to retain that
>> mapping, it can do patch processing in linear time.
>
> I agree that dropping down to the label is a good idea and necessary
> if we don't restrict the styles of RDF that can be effectively managed
> with a patch.
>
*nod*
Perhaps we should strawpoll on these four possible requirements. I
believe they're all easy to do if we use labels and 2, 3, and 4 are
impossible to do if we do not.
1. It must be possible to write a patch to go from any graph G1 to any
other graph G2
2. It must be possible to write a tractable (PTIME and PSPACE) patch to
go from any graph G1 to any other graph G2
3. Processing all patches must be tractable. (That is, not only must
it be possible to write a tractable patch, but it must be impossible to
write an intractable one.)
4. Creating a patch to go from any G1 to any G2 must itself be a
tractable problem.
> It has to be the system's choice of internal label:
>
But which system - the patch creator or the patch receiver?
> 1/ you can't use the label as given by syntax because it may be
> repeated in unrelated updates.
>
> Update 1: add triples:
>
> <#peter> foaf:knows _:a .
> _:a foaf:name "Paul" .
>
> Update 2: add triples:
>
> <#peter> foaf:knows _:a .
> _:a foaf:name "Patricia" .
>
> I can imagine template-driven updates doing that.
>
> 2/ In Turtle the system generates them for [] and ()
>
>
> The question is how to find the internal label.
>
> SPARQL's STR(?blank) could be used -- in the base spec says it's undef
> (=error, and so extensible) to call it on a blank node. Or
> IRI(?bnode) could be overloaded for skolemization (several systems
> already do variations on this).
>
Yeah, I think we need to move up from the language to the protocol for
this, and think about what both ends of the conversation are doing.
Here's a protocol that I think works:
1. Alice publishes version 1 of an n-triples file at URL U1, with ETag
"v1".
2. Bob creates a patch against U1 v1. He uses the same blank node
labels that Alice used, for the blank nodes she had. For new blank
nodes, he makes up new labels that Alice has not used in v1. His patch
basically consists of two n-triples strings, one with all the triples to
be deleted, one with all the triples to be added, and he specifies it's
a patch against v1.
3. Alice receives and applies the patch, giving the result ETag
"vBob1". She only needs to tell Bob that she has accepted the patch
and given it ETag "vBob1" for them to maintain their shared state.
4. If Alice gets a concurrent patch from Charlie (that is, another
patch against v1), and she can determine it does not conflict with Bob's
patch, she can confirm it to Charlie (creating vCharlie1), then send
each of them patches to bring them both up to vBob1Charlie1 (or v2, or
whatever - it's probably really named by a hash). In the patches she
constructs for each of them, she will cause them to rename any of the
blank nodes they created whose names happen to conflict with each other.
There are a few painful things about this protocol, though, that perhaps
we can improve.
1. It's annoying to have to use n-triples. We could move up to Turtle
while banning () and []. Other RDF syntaxes would also be fine, as long
as we disallow the constructs which generate blank nodes.
2. It's annoying to not have () and [] in step 1. We could allow them
if we had a standard naming for the blank nodes they generate. I think
we could do that although it would be a bit tricky - I think top-down
and bottom-up parsers would want to number them differently. Maybe we
could just say they're named _:gN where N is a decimal number, starting
from 1, incremented each time a blank node is implied, moving throught
the file in order. (Test case: "<a> <b> (1 ( 2 [<c> [<d> 3], [<e> 4]]
5) (6))" or something.) I imagine Bob would use a special Turtle (or
whatever) parser which followed this rule, knowing he was going to be
doing patching.
3. It's annoying to not have () and [] in step 2. This is different
from in step one, because (1) we can't just number from 1, or we'll be
conflicting with all of Alice' names, and (2) it's a patch syntax not
(necessarily) an RDF syntax. Perhaps what's needed is some directive
in the language or in the metadata which specifies a numeric starting
point for numbering these generated blank node lables.
4. The renaming in Step 4 could be painful. This could be addressed
by having easy renaming in the patch language. That is, not just "add
triple <s> <p> <o>" and "delete triples <s> <p> <o>", but also "rename
node <old> <new>". Actually, if we're doing this stuff with numbering
new blank nodes, and lists, we'd probably want "rename blank node range
<old-start> <old-end> <new-start>". So if Bob used blank node numbers
51-205, and charlie used blank node numbers 51-70, then Alice might
decide to renumber Charlies blank nodes in constructing vBobCharlie1.
Then in giving Charlie the patch from vCharlie1 to vBobCharlie1 she'd
include renumberBlanks(51,70,206).
This is starting to look like the start of a reasonable system. :-)
I should point out that in step 4 I brought in something we've never
talked about -- that it's important for LDP servers to be able to give
patches to LDP clients, as well. Just like a PUT is a very heavy way
to change a triple, a GET is a very heavy way to find out a triple has
changed. So I think we need to expect servers to provide streams of
patches to clients.
In the LDP style, one way to do this would be metadata on U1, either
embedded or in HTTP Link headers, saying { U1 ldp:patchToCurrent P1 },
and then one can GET P1 will return a patch from U1 to the current
version. (I'd also like ldp:patchToNext which gives a URL for a
longpoll URL, for a patch to whatever the next version is.)
-- Sandro
> Andy
>
>>
>> -- Sandro
>>
>>
>
>
>
Received on Sunday, 8 September 2013 16:00:17 UTC