Re: [TF-PP] Moving forward from Birte Glimm on 2010-06-23 (public-rdf-dawg@w3.org from April to June 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Wed, 23 Jun 2010 09:20:02 +0100
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: "matthew.perry@oracle.com" <matthew.perry@oracle.com>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
Message-Id: <ED968A61-8BD7-42F1-B3A2-EAE31B092C50@gmail.com>
On 15 Jun 2010, at 19:18, Andy Seaborne <andy.seaborne@talis.com> wrote:

> 
> 
> On 15/06/2010 1:46 PM, Matt Perry wrote:
>> Andy Seaborne wrote:
>>> Matt,
>>> 
>>> When we had:
>>> [[
>>> PROPOSED: The cardinality of solutions to fixed-length paths
>>> is the same as the cardinality of solutions to the path expanded into
>>> triple patterns (with all variables projected); the cardinality of
>>> solutions to variable-length paths is the cardinality of solutions
>>> via paths that do not repeat nodes; the cardinality of solutions to
>>> paths combining fixed and variable length (elt{n,} ) is a combination
>>> of the fixed definition plus the variable definition for paths longer
>>> than the fixed length.
>>> ]]
>>> 
>>> it looked reasonable at the time but I'm finding that it isn't do
>>> clear cut a distinction of variable and fixed length paths.
>>> 
>>> :p{n}
>>> which is similar or the same as p{n,n}
>>> which is similar or the same as :p/:p/..../:p
>>> :p{n,m}
>>> using the text above it is :p{n} / :p{0,(m-n)}
>>> which is similar or the same as :p/:p/:p?/:p?
>>> (n times :p and (m-n :p?)
>>> :p+
>>> which is also :p{1,}
>>> 
>>> Birte suggests that
>>> 
>>> :p{n,m} is
>>> :p{n} UNION :p{n+1} ... UNION :p{m}
>>> 
>>> although this is problematic because it goes round loops multiple
>>> times if fixed length paths are the same as their triple pattern
>>> expansion.
>> 
>> You make a good point about allowing arbitrary repetition of loops when
>> evaluating the triple pattern expansion of a fixed length property path.
>> Because all of our proposals prevent repetition of loops when evaluating
>> unbounded property paths, we will always have some awkwardness in corner
>> cases. Take the following example:
>> 
>> :a :p :c
>> :c :p :c
>> :c :p :z
>> 
>> { :a :p{4} ?y } finds the path :a -:p-> :c -:p-> :c -:p-> :c -:p-> :z,
>> but { :a :p{3,} ?y } misses this path.
>> 
>> 
>>> 
>>> :p{1,} needs to have a relationship to :p{1,X} as X tends to infinity.
>>> When X is larger than the longest path in the graph, it's going to odd
>>> (but possible) to have :p+ and :p{1,X} returning different answers.
>>> 
>>> I can see that :p{n,m} defined as a union for small n and large m is
>>> going to an implementation burden for some systems using a pure
>>> expansion.
>>> 
>>> The closest consistent view point with the text above on cardinality
>>> seems to me, currently, to be the form that counts all paths,
>>> including once round loops, which is the triple matching and
>>> backtracking. Intuitively, it's like trying to create triple
>>> expansions but limiting loops to once round. It feels as if it is
>>> consistent possible future returning path lengths.
>> 
>> I would argue that node marking with backtracking also makes sense and
>> is much easier to implement than edge marking with backtracking. Node
>> marking is easier to implement because when we encounter two nodes with
>> the same URI, we know that they are the same node, but when we encounter
>> two edges with the same URI, we can't say anything about their equality.
>> This fact makes cycle detection based on node repetition easier in my
>> opinion, especially when using disk-based storage and joining triples
>> tables.
> 
> Minor: edge equality is possible: if a triple has same subject predicate and object, it's the same triple because an RDF graph is a set of triples.
> 
>> As an example of node marking with backtracking, consider the following
>> dataset and query.
>> 
>> :a :p :b
>> :b :p :z
>> :b :p :c
>> :a :p :c
>> :c :p :c
>> :c :p :z
>> 
>> { :a :p+ ?y }
>> 
>> With node marking with backtracking, we get:
>> :a -:p-> :b
>> :a -:p-> :b -:p-> :z
>> :a -:p-> :b -:p-> :c
>> :a -:p-> :b -:p-> :c -:p-> :z
>> :a -:p-> :c
>> :a -:p-> :c -:p-> :z
>> 
>> This is exactly all simple paths that match the path expression and also
>> what we would get by doing a recursive SQL query against an s,p,o
>> triples table.
> 
> To make sure we understand each other: node marking with backtracking is determining a path such that it passes a node only once but the same node can be part of another path.
> 
> As the algorithm backtracks, it removes a record of nodes passed through from some working set (the visited set).  It never looks at a node if it finds it in the visited set.
> 
> A simple path.
> http://en.wikipedia.org/wiki/Path_%28graph_theory%29
> 
> This is not finding distinct solutions.  Your example shows it finds simple paths :a to :z several times so the cardinality of :z would be 3.
> 
> (The distinct case is node marking and adding visited nodes to a set, not removing them from the set during traversal)
> 
>> With edge marking with backtracking, we get the following additional paths:
>> :a -:p-> :b -:p-> :c -:p-> :c
>> :a -:p-> :b -:p-> :c -:p-> :c -:p-> :z
>> :a -:p-> :c -:p-> :c
>> :a -:p-> :c -:p-> :c -:p-> :z
> 
> Determining cardinality by simple path should make extension to more complex path expressions easier although there is some checking to do here.  Does the entire segment 'path' of
> 
> :A (path)* :B
> e.g.
> :A (:p1|:p2)* :B
> 
> need to be repeated? I think the rule of simple path determination extends to this case more easily because the edges are not just fixed constant terms any more.
> 
> Birte - you suggested edge marking.  How does node marking with backtracking and unwinding, that is "simple paths", work for you?
> 
I am not insisting on edge marking. If node marking fits the overall picture better, that's fine. 
Birte

>    Andy
> 
>
Received on Wednesday, 23 June 2010 08:21:03 UTC