W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > April to June 2010

Re: [TF-PP] Moving forward

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Tue, 15 Jun 2010 19:18:34 +0100
Message-ID: <4C17C3FA.5010307@talis.com>
To: matthew.perry@oracle.com
CC: public-rdf-dawg@w3.org

On 15/06/2010 1:46 PM, Matt Perry wrote:
> Andy Seaborne wrote:
>> Matt,
>> When we had:
>> [[
>> PROPOSED: The cardinality of solutions to fixed-length paths
>> is the same as the cardinality of solutions to the path expanded into
>> triple patterns (with all variables projected); the cardinality of
>> solutions to variable-length paths is the cardinality of solutions
>> via paths that do not repeat nodes; the cardinality of solutions to
>> paths combining fixed and variable length (elt{n,} ) is a combination
>> of the fixed definition plus the variable definition for paths longer
>> than the fixed length.
>> ]]
>> it looked reasonable at the time but I'm finding that it isn't do
>> clear cut a distinction of variable and fixed length paths.
>> :p{n}
>> which is similar or the same as p{n,n}
>> which is similar or the same as :p/:p/..../:p
>> :p{n,m}
>> using the text above it is :p{n} / :p{0,(m-n)}
>> which is similar or the same as :p/:p/:p?/:p?
>> (n times :p and (m-n :p?)
>> :p+
>> which is also :p{1,}
>> Birte suggests that
>> :p{n,m} is
>> :p{n} UNION :p{n+1} ... UNION :p{m}
>> although this is problematic because it goes round loops multiple
>> times if fixed length paths are the same as their triple pattern
>> expansion.
> You make a good point about allowing arbitrary repetition of loops when
> evaluating the triple pattern expansion of a fixed length property path.
> Because all of our proposals prevent repetition of loops when evaluating
> unbounded property paths, we will always have some awkwardness in corner
> cases. Take the following example:
> :a :p :c
> :c :p :c
> :c :p :z
> { :a :p{4} ?y } finds the path :a -:p-> :c -:p-> :c -:p-> :c -:p-> :z,
> but { :a :p{3,} ?y } misses this path.
>> :p{1,} needs to have a relationship to :p{1,X} as X tends to infinity.
>> When X is larger than the longest path in the graph, it's going to odd
>> (but possible) to have :p+ and :p{1,X} returning different answers.
>> I can see that :p{n,m} defined as a union for small n and large m is
>> going to an implementation burden for some systems using a pure
>> expansion.
>> The closest consistent view point with the text above on cardinality
>> seems to me, currently, to be the form that counts all paths,
>> including once round loops, which is the triple matching and
>> backtracking. Intuitively, it's like trying to create triple
>> expansions but limiting loops to once round. It feels as if it is
>> consistent possible future returning path lengths.
> I would argue that node marking with backtracking also makes sense and
> is much easier to implement than edge marking with backtracking. Node
> marking is easier to implement because when we encounter two nodes with
> the same URI, we know that they are the same node, but when we encounter
> two edges with the same URI, we can't say anything about their equality.
> This fact makes cycle detection based on node repetition easier in my
> opinion, especially when using disk-based storage and joining triples
> tables.

Minor: edge equality is possible: if a triple has same subject predicate 
and object, it's the same triple because an RDF graph is a set of triples.

> As an example of node marking with backtracking, consider the following
> dataset and query.
> :a :p :b
> :b :p :z
> :b :p :c
> :a :p :c
> :c :p :c
> :c :p :z
> { :a :p+ ?y }
> With node marking with backtracking, we get:
> :a -:p-> :b
> :a -:p-> :b -:p-> :z
> :a -:p-> :b -:p-> :c
> :a -:p-> :b -:p-> :c -:p-> :z
> :a -:p-> :c
> :a -:p-> :c -:p-> :z
> This is exactly all simple paths that match the path expression and also
> what we would get by doing a recursive SQL query against an s,p,o
> triples table.

To make sure we understand each other: node marking with backtracking is 
determining a path such that it passes a node only once but the same 
node can be part of another path.

As the algorithm backtracks, it removes a record of nodes passed through 
from some working set (the visited set).  It never looks at a node if it 
finds it in the visited set.

A simple path.

This is not finding distinct solutions.  Your example shows it finds 
simple paths :a to :z several times so the cardinality of :z would be 3.

(The distinct case is node marking and adding visited nodes to a set, 
not removing them from the set during traversal)

> With edge marking with backtracking, we get the following additional paths:
> :a -:p-> :b -:p-> :c -:p-> :c
> :a -:p-> :b -:p-> :c -:p-> :c -:p-> :z
> :a -:p-> :c -:p-> :c
> :a -:p-> :c -:p-> :c -:p-> :z

Determining cardinality by simple path should make extension to more 
complex path expressions easier although there is some checking to do 
here.  Does the entire segment 'path' of

:A (path)* :B
:A (:p1|:p2)* :B

need to be repeated? I think the rule of simple path determination 
extends to this case more easily because the edges are not just fixed 
constant terms any more.

Birte - you suggested edge marking.  How does node marking with 
backtracking and unwinding, that is "simple paths", work for you?

Received on Tuesday, 15 June 2010 18:24:37 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:01:00 UTC