Re: Patch Issue: Blank Nodes and Tractability

On 08/09/13 17:00, Sandro Hawke wrote:
> On 09/08/2013 06:33 AM, Andy Seaborne wrote:
>> On 08/09/13 04:17, Sandro Hawke wrote:
>>> In theory, patching a large branching subgraph of blank nodes is
>>> intractable.   It's the graph isomorphism problem, which is in NP
>>> and not known to be solvable in polynomial time.
>>>
>>> I think rdf lists don't have this problem, at least with a clever
>>> implementation, because the subgraph of blank nodes doesn't
>>> significantly branch.
>>>
>>> I wonder if there are many RDF graphs that do have this problem.   I can
>>> easily imagine paths which are 2-4 blank nodes deep, coming out of
>>> record structures that are mapped to RDF, but even an exponential search
>>> is okay when the exponent is only 4.  The question is: are there, in
>>> practice, subgraphs where you have to traverse 20 or 30 blank nodes,
>>> with branching, to get to the part you need to alter? If we don't know
>>> of any, how confident are we there won't be any real need for them?
>>>
>>> Also, do SPARQL systems handle rdf:lists efficiently enough? Can I
>>> append to an rdf:List like this:
>>>
>>> DELETE { ?last rdf:rest rdf:nil }
>>> INSERT { ?last rdf:rest ( :NewLastElement ) }
>>> WHERE  { :myList :myProp ?n1.  ?n1 rdf:rest ?n2.  ?n2 rdf:rest ?n3.
>>> .....    rdf:rest ?last }
>>
>> rdf:rest*
>>  Arbitrary length path.
>>
>> DELETE { ?last rdf:rest rdf:nil }
>> INSERT { ?last rdf:rest ( :NewLastElement ) }
>> WHERE  { :myList :myProp ?list .
>>          ?list rdf:rest* ?last .
>>          ?last rdf:rest rdf:nil }
>>
>
> Sorry, I should have disallowed that shortcut.     Please consider my
> question to be about inserting an element n items into a list that's
> much longer.   Also, assume the items in the list are all the same, so
> you can't use value-matching to find the desired spot in the list.

(so what is the desired location?!!!)

There are lots of cases of list manipulation you might want to do.  It 
is not pretty in SPARQL or anything else. The problem is the RDF encoding.
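To make the complexity point from earlier in the thread concrete, here is a toy brute-force matcher (pure Python, hypothetical data and function names, not from any real API): with no distinguishing literals, matching the blank nodes in a patch against the blank nodes in a stored graph may require trying every mapping between the two bnode sets.

```python
# Sketch: brute-force blank-node matching, O(k!) over k patch bnodes.
# Triples are plain 3-tuples; bnodes are strings starting "_:".
from itertools import permutations

def find_bnode_mapping(patch_triples, graph_triples):
    """Return a bnode mapping making patch_triples a subset of
    graph_triples, or None.  Exhaustive search over all mappings."""
    is_bnode = lambda x: isinstance(x, str) and x.startswith("_:")
    patch_bnodes = sorted({x for t in patch_triples for x in t if is_bnode(x)})
    graph_bnodes = sorted({x for t in graph_triples for x in t if is_bnode(x)})
    for perm in permutations(graph_bnodes, len(patch_bnodes)):
        m = dict(zip(patch_bnodes, perm))
        sub = lambda x: m.get(x, x)
        if all((sub(s), sub(p), sub(o)) in graph_triples
               for (s, p, o) in patch_triples):
            return m
    return None

# Two chains of otherwise indistinguishable bnodes hanging off :x.
graph = {(":x", ":p", "_:a"), ("_:a", ":p", "_:b"),
         (":x", ":p", "_:c"), ("_:c", ":p", "_:d")}
patch = {(":x", ":p", "_:n1"), ("_:n1", ":p", "_:n2")}
m = find_bnode_mapping(patch, graph)
print(m)   # {'_:n1': '_:a', '_:n2': '_:b'}
```

For chains a few bnodes deep this terminates quickly, which is the "exponent is only 4" observation; the search space is what blows up when the bnode subgraph is deep and branching.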

I had a go at some a while ago:

http://seaborne.blogspot.co.uk/2011/03/updating-rdf-lists-with-sparql.html

None of this is pretty.
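For illustration, here is roughly what "insert an element at position n" means at the triple level (a pure-Python sketch over a set of 3-tuples; the helper names insert_at, one, and to_list are invented for this example): walk rdf:rest cell by cell, then rewire one cons cell.

```python
# Sketch: splicing a value into a triple-encoded rdf:List.
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def one(triples, s, p):
    """The single object of (s, p, ?o) in the triple set."""
    return next(o for (s2, p2, o) in triples if s2 == s and p2 == p)

def insert_at(triples, head, n, value, new_cell="_:new"):
    """Destructively splice value in as element n of the list at head."""
    prev, cell = None, head
    for _ in range(n):                           # walk n rdf:rest links
        prev, cell = cell, one(triples, cell, RDF_REST)
    if prev is None:
        raise ValueError("inserting at the head changes the list node")
    triples.discard((prev, RDF_REST, cell))      # unhook the old tail
    triples.add((prev, RDF_REST, new_cell))      # hook in the new cell
    triples.add((new_cell, RDF_FIRST, value))
    triples.add((new_cell, RDF_REST, cell))

def to_list(triples, head):
    """Read the triple-encoded list back as a Python list."""
    out = []
    while head != RDF_NIL:
        out.append(one(triples, head, RDF_FIRST))
        head = one(triples, head, RDF_REST)
    return out

# Triple encoding of ( :a :b :c )
g = {("_:c0", RDF_FIRST, ":a"), ("_:c0", RDF_REST, "_:c1"),
     ("_:c1", RDF_FIRST, ":b"), ("_:c1", RDF_REST, "_:c2"),
     ("_:c2", RDF_FIRST, ":c"), ("_:c2", RDF_REST, RDF_NIL)}
insert_at(g, "_:c0", 1, ":x")
print(to_list(g, "_:c0"))   # [':a', ':x', ':b', ':c']
```

The same cons-cell rewiring is what any SPARQL formulation has to express, which is why none of it comes out pretty.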

RDF needs first-class lists, sets, and data values, not triple-encoded 
structures.  The encoding eventually shows through - if nothing else, 
in { ?s ?p ?o }.

Failing that, operators in SPARQL to work on well-formed lists and not 
mess up on ill-formed lists.

(ARQ has a few property functions to help, e.g. get the i'th element of 
a list, but not list manipulation functions.)

Failing even that, real bnode labels.

> So there really would have to be n variables.    Do you know how typical
> SPARQL systems will do with thousands of variables like that?

I have not seen a list of 20,000 items outside of a test suite.

An array of 20,000 items in an SQL table is a challenge as well.
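The point below about running down the list can be sketched in a few lines (pure Python; the dict stands in for a (subject, rdf:rest) index and is purely illustrative): finding the last cell of a 20,000-element chain is a linear-time loop with one variable of state, not a pattern with thousands of variables.

```python
# Sketch: walking a long rdf:rest chain with constant extra state.
N = 20_000
rest = {f"_:c{i}": f"_:c{i+1}" for i in range(N - 1)}
rest[f"_:c{N-1}"] = "rdf:nil"      # last cell points at rdf:nil

cell = "_:c0"
while rest[cell] != "rdf:nil":     # one loop variable, no growing state
    cell = rest[cell]
print(cell)   # _:c19999, the cell to repoint for an append
```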

>>> where the chain is 20 or 200 or 2000 or even 20,000 elements long and
>>> have the system handle it comfortably?
>>
>> It has to run down the list if all you give it is the head, but it's a
>> loop, not storing and growing internal state as it goes.
>>
>>> If the answers to the above questions are not good,
>>
>> I think it is also hard to understand.  The app developer has been
>> dropped into the triple representation of a list when the data may
>> have been written as ( ... ).  It's data assembler.
>>
>
> Indeed -- if this is part of a system, it would have to be buried where
> most developers wouldn't know about it.
>
>>> then we might want a
>>> patch format that sidesteps the whole problem, for instance by having a
>>> limited way to refer to blank nodes in the protocol. For example,
>>> patches are often written relative to a particular resource
>>> representation, pointed to by an ETag.   I don't see any technical
>>> reason the patch couldn't say "I'm using the same blank-node-names as
>>> that representation does", and if the server cares enough to retain that
>>> mapping, it can do patch processing in linear time.
>>
>> I agree that dropping down to the label is a good idea and necessary
>> if we don't restrict the styles of RDF that can be effectively managed
>> with a patch.
>>
>
> *nod*
>
> Perhaps we should strawpoll on these four possible requirements. I
> believe they're all easy to do if we use labels and 2, 3, and 4 are
> impossible to do if we do not.

There is also a dimension of whether this is for LDP or for something 
more general.  LDP is about specific resources.

> 1. It must be possible to write a patch to go from any graph G1 to any
> other graph G2

This is actually very important.

This list's name is prejudging that.  The per-resource update is not 
general, but it's a natural outcome of targeting LDP, as David's 
description shows.

> 2. It must be possible to write a tractable (PTIME and PSPACE) patch to
> go from any graph G1 to any other graph G2
> 3. Processing all patches must be tractable.   (That is, not only must
> it be possible to write a tractable patch, but it must be impossible to
> write an intractable one.)
> 4. Creating a patch to go from any G1 to any G2 must itself be a
> tractable problem.
>
>
>> It has to be the system's choice of internal label:
>>
>
> But which system - the patch creator or the patch receiver?

Receiver.  The two different patch creators should not need to know about 
one another.  Think of an app that has a template of that RDF in it, 
that two people run separately, and maybe it propagates via some async 
route.  Eventual consistency.

The server is the common point and adjudicates; RDF defines bnode 
labels in syntax to be file-scoped.
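For what it's worth, Sandro's "reuse the representation's blank-node names" idea reduces patch application to set arithmetic (a sketch; the (deletes, inserts) patch shape and the apply_patch name are invented here, not from any proposed format): if the patch is written against the labels the receiver itself handed out, applying it is linear in the size of the patch.

```python
# Sketch: linear-time patch application when bnode labels are taken
# literally, i.e. the receiver's own labels are reused in the patch.
def apply_patch(graph, deletes, inserts):
    """graph: set of triples.  No bnode matching: labels are literal."""
    missing = deletes - graph
    if missing:
        raise ValueError(f"patch does not match graph: {missing}")
    return (graph - deletes) | inserts

# Append a cell to a one-element chain, addressing _:b1 by label.
g = {(":list", "rdf:rest", "_:b1"), ("_:b1", "rdf:rest", "rdf:nil")}
g2 = apply_patch(g,
                 deletes={("_:b1", "rdf:rest", "rdf:nil")},
                 inserts={("_:b1", "rdf:rest", "_:b2"),
                          ("_:b2", "rdf:rest", "rdf:nil")})
print(len(g2))   # 3
```

The cost of the approach is exactly the point above: the receiver must keep its labels stable across the representation it served and the patch it accepts.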

	Andy

Received on Sunday, 8 September 2013 17:40:30 UTC