- From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
- Date: Wed, 23 Jun 2010 09:20:02 +0100
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: "matthew.perry@oracle.com" <matthew.perry@oracle.com>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
- Message-Id: <ED968A61-8BD7-42F1-B3A2-EAE31B092C50@gmail.com>

On 15 Jun 2010, at 19:18, Andy Seaborne <andy.seaborne@talis.com> wrote: > > > On 15/06/2010 1:46 PM, Matt Perry wrote: >> Andy Seaborne wrote: >>> Matt, >>> >>> When we had: >>> [[ >>> PROPOSED: The cardinality of solutions to fixed-length paths >>> is the same as the cardinality of solutions to the path expanded into >>> triple patterns (with all variables projected); the cardinality of >>> solutions to variable-length paths is the cardinality of solutions >>> via paths that do not repeat nodes; the cardinality of solutions to >>> paths combining fixed and variable length (elt{n,} ) is a combination >>> of the fixed definition plus the variable definition for paths longer >>> than the fixed length. >>> ]] >>> >>> it looked reasonable at the time but I'm finding that it isn't do >>> clear cut a distinction of variable and fixed length paths. >>> >>> :p{n} >>> which is similar or the same as p{n,n} >>> which is similar or the same as :p/:p/..../:p >>> :p{n,m} >>> using the text above it is :p{n} / :p{0,(m-n)} >>> which is similar or the same as :p/:p/:p?/:p? >>> (n times :p and (m-n :p?) >>> :p+ >>> which is also :p{1,} >>> >>> Birte suggests that >>> >>> :p{n,m} is >>> :p{n} UNION :p{n+1} ... UNION :p{m} >>> >>> although this is problematic because it goes round loops multiple >>> times if fixed length paths are the same as their triple pattern >>> expansion. >> >> You make a good point about allowing arbitrary repetition of loops when >> evaluating the triple pattern expansion of a fixed length property path. >> Because all of our proposals prevent repetition of loops when evaluating >> unbounded property paths, we will always have some awkwardness in corner >> cases. Take the following example: >> >> :a :p :c >> :c :p :c >> :c :p :z >> >> { :a :p{4} ?y } finds the path :a -:p-> :c -:p-> :c -:p-> :c -:p-> :z, >> but { :a :p{3,} ?y } misses this path. >> >> >>> >>> :p{1,} needs to have a relationship to :p{1,X} as X tends to infinity. >>> When X is larger than the longest path in the graph, it's going to odd >>> (but possible) to have :p+ and :p{1,X} returning different answers. >>> >>> I can see that :p{n,m} defined as a union for small n and large m is >>> going to an implementation burden for some systems using a pure >>> expansion. >>> >>> The closest consistent view point with the text above on cardinality >>> seems to me, currently, to be the form that counts all paths, >>> including once round loops, which is the triple matching and >>> backtracking. Intuitively, it's like trying to create triple >>> expansions but limiting loops to once round. It feels as if it is >>> consistent possible future returning path lengths. >> >> I would argue that node marking with backtracking also makes sense and >> is much easier to implement than edge marking with backtracking. Node >> marking is easier to implement because when we encounter two nodes with >> the same URI, we know that they are the same node, but when we encounter >> two edges with the same URI, we can't say anything about their equality. >> This fact makes cycle detection based on node repetition easier in my >> opinion, especially when using disk-based storage and joining triples >> tables. > > Minor: edge equality is possible: if a triple has same subject predicate and object, it's the same triple because an RDF graph is a set of triples. > >> As an example of node marking with backtracking, consider the following >> dataset and query. >> >> :a :p :b >> :b :p :z >> :b :p :c >> :a :p :c >> :c :p :c >> :c :p :z >> >> { :a :p+ ?y } >> >> With node marking with backtracking, we get: >> :a -:p-> :b >> :a -:p-> :b -:p-> :z >> :a -:p-> :b -:p-> :c >> :a -:p-> :b -:p-> :c -:p-> :z >> :a -:p-> :c >> :a -:p-> :c -:p-> :z >> >> This is exactly all simple paths that match the path expression and also >> what we would get by doing a recursive SQL query against an s,p,o >> triples table. > > To make sure we understand each other: node marking with backtracking is determining a path such that it passes a node only once but the same node can be part of another path. > > As the algorithm backtracks, it removes a record of nodes passed through from some working set (the visited set). It never looks at a node if it finds it in the visited set. > > A simple path. > http://en.wikipedia.org/wiki/Path_%28graph_theory%29 > > This is not finding distinct solutions. Your example shows it finds simple paths :a to :z several times so the cardinality of :z would be 3. > > (The distinct case is node marking and adding visited nodes to a set, not removing them from the set during traversal) > >> With edge marking with backtracking, we get the following additional paths: >> :a -:p-> :b -:p-> :c -:p-> :c >> :a -:p-> :b -:p-> :c -:p-> :c -:p-> :z >> :a -:p-> :c -:p-> :c >> :a -:p-> :c -:p-> :c -:p-> :z > > Determining cardinality by simple path should make extension to more complex path expressions easier although there is some checking to do here. Does the entire segment 'path' of > > :A (path)* :B > e.g. > :A (:p1|:p2)* :B > > need to be repeated? I think the rule of simple path determination extends to this case more easily because the edges are not just fixed constant terms any more. > > Birte - you suggested edge marking. How does node marking with backtracking and unwinding, that is "simple paths", work for you? > I am not insisting on edge marking. If node marking fits the overall picture better, that's fine. Birte > Andy > >

Received on Wednesday, 23 June 2010 08:21:03 UTC