Re: [TF-PP] Moving forward from Andy Seaborne on 2010-06-15 (public-rdf-dawg@w3.org from April to June 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Tue, 15 Jun 2010 19:18:34 +0100
To: matthew.perry@oracle.com
CC: public-rdf-dawg@w3.org
Message-ID: <4C17C3FA.5010307@talis.com>
On 15/06/2010 1:46 PM, Matt Perry wrote:
> Andy Seaborne wrote:
>> Matt,
>>
>> When we had:
>> [[
>> PROPOSED: The cardinality of solutions to fixed-length paths
>> is the same as the cardinality of solutions to the path expanded into
>> triple patterns (with all variables projected); the cardinality of
>> solutions to variable-length paths is the cardinality of solutions
>> via paths that do not repeat nodes; the cardinality of solutions to
>> paths combining fixed and variable length (elt{n,} ) is a combination
>> of the fixed definition plus the variable definition for paths longer
>> than the fixed length.
>> ]]
>>
>> it looked reasonable at the time but I'm finding that it isn't do
>> clear cut a distinction of variable and fixed length paths.
>>
>> :p{n}
>> which is similar or the same as p{n,n}
>> which is similar or the same as :p/:p/..../:p
>> :p{n,m}
>> using the text above it is :p{n} / :p{0,(m-n)}
>> which is similar or the same as :p/:p/:p?/:p?
>> (n times :p and (m-n :p?)
>> :p+
>> which is also :p{1,}
>>
>> Birte suggests that
>>
>> :p{n,m} is
>> :p{n} UNION :p{n+1} ... UNION :p{m}
>>
>> although this is problematic because it goes round loops multiple
>> times if fixed length paths are the same as their triple pattern
>> expansion.
>
> You make a good point about allowing arbitrary repetition of loops when
> evaluating the triple pattern expansion of a fixed length property path.
> Because all of our proposals prevent repetition of loops when evaluating
> unbounded property paths, we will always have some awkwardness in corner
> cases. Take the following example:
>
> :a :p :c
> :c :p :c
> :c :p :z
>
> { :a :p{4} ?y } finds the path :a -:p-> :c -:p-> :c -:p-> :c -:p-> :z,
> but { :a :p{3,} ?y } misses this path.
>
>
>>
>> :p{1,} needs to have a relationship to :p{1,X} as X tends to infinity.
>> When X is larger than the longest path in the graph, it's going to odd
>> (but possible) to have :p+ and :p{1,X} returning different answers.
>>
>> I can see that :p{n,m} defined as a union for small n and large m is
>> going to an implementation burden for some systems using a pure
>> expansion.
>>
>> The closest consistent view point with the text above on cardinality
>> seems to me, currently, to be the form that counts all paths,
>> including once round loops, which is the triple matching and
>> backtracking. Intuitively, it's like trying to create triple
>> expansions but limiting loops to once round. It feels as if it is
>> consistent possible future returning path lengths.
>
> I would argue that node marking with backtracking also makes sense and
> is much easier to implement than edge marking with backtracking. Node
> marking is easier to implement because when we encounter two nodes with
> the same URI, we know that they are the same node, but when we encounter
> two edges with the same URI, we can't say anything about their equality.
> This fact makes cycle detection based on node repetition easier in my
> opinion, especially when using disk-based storage and joining triples
> tables.

Minor: edge equality is possible: if a triple has same subject predicate 
and object, it's the same triple because an RDF graph is a set of triples.

> As an example of node marking with backtracking, consider the following
> dataset and query.
>
> :a :p :b
> :b :p :z
> :b :p :c
> :a :p :c
> :c :p :c
> :c :p :z
>
> { :a :p+ ?y }
>
> With node marking with backtracking, we get:
> :a -:p-> :b
> :a -:p-> :b -:p-> :z
> :a -:p-> :b -:p-> :c
> :a -:p-> :b -:p-> :c -:p-> :z
> :a -:p-> :c
> :a -:p-> :c -:p-> :z
>
> This is exactly all simple paths that match the path expression and also
> what we would get by doing a recursive SQL query against an s,p,o
> triples table.

To make sure we understand each other: node marking with backtracking is 
determining a path such that it passes a node only once but the same 
node can be part of another path.

As the algorithm backtracks, it removes a record of nodes passed through 
from some working set (the visited set).  It never looks at a node if it 
finds it in the visited set.

A simple path.
http://en.wikipedia.org/wiki/Path_%28graph_theory%29

This is not finding distinct solutions.  Your example shows it finds 
simple paths :a to :z several times so the cardinality of :z would be 3.

(The distinct case is node marking and adding visited nodes to a set, 
not removing them from the set during traversal)

> With edge marking with backtracking, we get the following additional paths:
> :a -:p-> :b -:p-> :c -:p-> :c
> :a -:p-> :b -:p-> :c -:p-> :c -:p-> :z
> :a -:p-> :c -:p-> :c
> :a -:p-> :c -:p-> :c -:p-> :z

Determining cardinality by simple path should make extension to more 
complex path expressions easier although there is some checking to do 
here.  Does the entire segment 'path' of

:A (path)* :B
e.g.
:A (:p1|:p2)* :B

need to be repeated? I think the rule of simple path determination 
extends to this case more easily because the edges are not just fixed 
constant terms any more.

Birte - you suggested edge marking.  How does node marking with 
backtracking and unwinding, that is "simple paths", work for you?

 Andy
Received on Tuesday, 15 June 2010 18:24:37 UTC