- From: Andy Seaborne <andy.seaborne@talis.com>
- Date: Tue, 15 Jun 2010 19:18:34 +0100
- To: matthew.perry@oracle.com
- CC: public-rdf-dawg@w3.org
On 15/06/2010 1:46 PM, Matt Perry wrote: > Andy Seaborne wrote: >> Matt, >> >> When we had: >> [[ >> PROPOSED: The cardinality of solutions to fixed-length paths >> is the same as the cardinality of solutions to the path expanded into >> triple patterns (with all variables projected); the cardinality of >> solutions to variable-length paths is the cardinality of solutions >> via paths that do not repeat nodes; the cardinality of solutions to >> paths combining fixed and variable length (elt{n,} ) is a combination >> of the fixed definition plus the variable definition for paths longer >> than the fixed length. >> ]] >> >> it looked reasonable at the time but I'm finding that it isn't do >> clear cut a distinction of variable and fixed length paths. >> >> :p{n} >> which is similar or the same as p{n,n} >> which is similar or the same as :p/:p/..../:p >> :p{n,m} >> using the text above it is :p{n} / :p{0,(m-n)} >> which is similar or the same as :p/:p/:p?/:p? >> (n times :p and (m-n :p?) >> :p+ >> which is also :p{1,} >> >> Birte suggests that >> >> :p{n,m} is >> :p{n} UNION :p{n+1} ... UNION :p{m} >> >> although this is problematic because it goes round loops multiple >> times if fixed length paths are the same as their triple pattern >> expansion. > > You make a good point about allowing arbitrary repetition of loops when > evaluating the triple pattern expansion of a fixed length property path. > Because all of our proposals prevent repetition of loops when evaluating > unbounded property paths, we will always have some awkwardness in corner > cases. Take the following example: > > :a :p :c > :c :p :c > :c :p :z > > { :a :p{4} ?y } finds the path :a -:p-> :c -:p-> :c -:p-> :c -:p-> :z, > but { :a :p{3,} ?y } misses this path. > > >> >> :p{1,} needs to have a relationship to :p{1,X} as X tends to infinity. >> When X is larger than the longest path in the graph, it's going to odd >> (but possible) to have :p+ and :p{1,X} returning different answers. >> >> I can see that :p{n,m} defined as a union for small n and large m is >> going to an implementation burden for some systems using a pure >> expansion. >> >> The closest consistent view point with the text above on cardinality >> seems to me, currently, to be the form that counts all paths, >> including once round loops, which is the triple matching and >> backtracking. Intuitively, it's like trying to create triple >> expansions but limiting loops to once round. It feels as if it is >> consistent possible future returning path lengths. > > I would argue that node marking with backtracking also makes sense and > is much easier to implement than edge marking with backtracking. Node > marking is easier to implement because when we encounter two nodes with > the same URI, we know that they are the same node, but when we encounter > two edges with the same URI, we can't say anything about their equality. > This fact makes cycle detection based on node repetition easier in my > opinion, especially when using disk-based storage and joining triples > tables. Minor: edge equality is possible: if a triple has same subject predicate and object, it's the same triple because an RDF graph is a set of triples. > As an example of node marking with backtracking, consider the following > dataset and query. > > :a :p :b > :b :p :z > :b :p :c > :a :p :c > :c :p :c > :c :p :z > > { :a :p+ ?y } > > With node marking with backtracking, we get: > :a -:p-> :b > :a -:p-> :b -:p-> :z > :a -:p-> :b -:p-> :c > :a -:p-> :b -:p-> :c -:p-> :z > :a -:p-> :c > :a -:p-> :c -:p-> :z > > This is exactly all simple paths that match the path expression and also > what we would get by doing a recursive SQL query against an s,p,o > triples table. To make sure we understand each other: node marking with backtracking is determining a path such that it passes a node only once but the same node can be part of another path. As the algorithm backtracks, it removes a record of nodes passed through from some working set (the visited set). It never looks at a node if it finds it in the visited set. A simple path. http://en.wikipedia.org/wiki/Path_%28graph_theory%29 This is not finding distinct solutions. Your example shows it finds simple paths :a to :z several times so the cardinality of :z would be 3. (The distinct case is node marking and adding visited nodes to a set, not removing them from the set during traversal) > With edge marking with backtracking, we get the following additional paths: > :a -:p-> :b -:p-> :c -:p-> :c > :a -:p-> :b -:p-> :c -:p-> :c -:p-> :z > :a -:p-> :c -:p-> :c > :a -:p-> :c -:p-> :c -:p-> :z Determining cardinality by simple path should make extension to more complex path expressions easier although there is some checking to do here. Does the entire segment 'path' of :A (path)* :B e.g. :A (:p1|:p2)* :B need to be repeated? I think the rule of simple path determination extends to this case more easily because the edges are not just fixed constant terms any more. Birte - you suggested edge marking. How does node marking with backtracking and unwinding, that is "simple paths", work for you? Andy
Received on Tuesday, 15 June 2010 18:24:37 UTC