From: Andy Seaborne <andy.seaborne@talis.com>

Date: Mon, 14 Jun 2010 16:16:14 +0100

Message-ID: <4C1647BE.8010302@talis.com>

To: Matthew Perry <matthew.perry@oracle.com>

CC: public-rdf-dawg@w3.org

Date: Mon, 14 Jun 2010 16:16:14 +0100

Message-ID: <4C1647BE.8010302@talis.com>

To: Matthew Perry <matthew.perry@oracle.com>

CC: public-rdf-dawg@w3.org

Matt, When we had: [[ PROPOSED: The cardinality of solutions to fixed-length paths is the same as the cardinality of solutions to the path expanded into triple patterns (with all variables projected); the cardinality of solutions to variable-length paths is the cardinality of solutions via paths that do not repeat nodes; the cardinality of solutions to paths combining fixed and variable length (elt{n,} ) is a combination of the fixed definition plus the variable definition for paths longer than the fixed length. ]] it looked reasonable at the time but I'm finding that it isn't do clear cut a distinction of variable and fixed length paths. :p{n} which is similar or the same as p{n,n} which is similar or the same as :p/:p/..../:p :p{n,m} using the text above it is :p{n} / :p{0,(m-n)} which is similar or the same as :p/:p/:p?/:p? (n times :p and (m-n :p?) :p+ which is also :p{1,} Birte suggests that :p{n,m} is :p{n} UNION :p{n+1} ... UNION :p{m} although this is problematic because it goes round loops multiple times if fixed length paths are the same as their triple pattern expansion. :p{1,} needs to have a relationship to :p{1,X} as X tends to infinity. When X is larger than the longest path in the graph, it's going to odd (but possible) to have :p+ and :p{1,X} returning different answers. I can see that :p{n,m} defined as a union for small n and large m is going to an implementation burden for some systems using a pure expansion. The closest consistent view point with the text above on cardinality seems to me, currently, to be the form that counts all paths, including once round loops, which is the triple matching and backtracking. Intuitively, it's like trying to create triple expansions but limiting loops to once round. It feels as if it is consistent possible future returning path lengths. The other consistent viewpoint, which is not in line with the text above, is to make path matching distinct only. That drops the proposal for fixed length paths being equivalent to pure triple patterns. Simply picking distinct everywhere seems restrictive. We also have the fact that while we have used :p as the single step path component, it can be a complex expression: ?x (:a/:b)+ ?y ?x (:a*/:b)+ ?y ?x (:a|:b)+ ?y ?x (:a/(:b|^:c))+ ?y Currently, I'm working on the idea of some fixed equivalences of transformations of paths down to a few operators. We at least get defined cardinalities this way, and an explanation although the treatment isn't going to be uniform as card(:p+) = card(:p{1,}) != card(:p{1,larger than graph}) There would be an operator path+, and the expansions always put the unbounded part last (that does not mean the unbounded part is last in t the path -- the path may be :p+/:q{5}). pathArbLengthMatch(path) but we could start with pathArbLengthMatch(path, min, max) making the break with triple pattern equivalences happen when any {} is introduced. "+" is {1,inf} and "*" is :p{0} union :p+ Operators like "/", "|", "^" are simply syntax for triple expansions. The unfortunate consequence is that {n} is not a simple triple pattern if it is {n,n}. How does that fit with recursive SQL? Andy On 09/06/2010 1:40 PM, Matthew Perry wrote: > Thanks for the summary Andy. > > I do not like the triple marking with backtracking/unwinding approach to determine the cardinality for arbitrary length paths. I think this will be too hard to implement efficiently on systems that store graphs on disk. Using "distinct" or "no node repetition" semantics, most property path queries can be evaluated efficiently in a DBMS using recursive SQL, but the triple marking approach is a different story and would most likely require a really convoluted representation on disk. Since we are only talking about differences in cardinality of results, I worry that the elegance of the formalization may not justify the additional burden on developers. > > Cheers, > Matt > > ----- Original Message ----- > From: andy.seaborne@talis.com > To: public-rdf-dawg@w3.org > Sent: Wednesday, June 9, 2010 8:15:06 AM GMT -05:00 US/Canada Eastern > Subject: [TF-PP] Moving forward > > To summarise where I think we are: > > 1/ Use the expansions approach that Birte proposed in: > http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0293.html > > 2/ Define operators for zero length paths and for arbitrary length paths > (i.e. +). > > 2a/ zero length paths include all subjects/objects/constant terms as > possible bindings (and check this makes sense). > > 2b/ arbitrary length paths based on triple marking with > backtracking/unwinding. Define the operator by algorithm (of course, > you can implement anyway you want but this defines the correct answers > and cardinality). > > 3/ Need to compare the cardinality for arbitrary length paths and {n,m} > forms since they should agree, and if not, we need to understand why > (this is why I think the triple marking with unwinding is the way to go) > It may make sense to define {n,m} in terms of the arbitrary length paths > algorithm. > > 4/ Examples, examples, examples. > > Andy >Received on Monday, 14 June 2010 15:16:44 UTC

*
This archive was generated by hypermail 2.3.1
: Wednesday, 7 January 2015 15:01:00 UTC
*