W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > April to June 2010

Re: [TF-PP] Moving forward

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Mon, 14 Jun 2010 16:16:14 +0100
Message-ID: <4C1647BE.8010302@talis.com>
To: Matthew Perry <matthew.perry@oracle.com>
CC: public-rdf-dawg@w3.org
Matt,

When we had:
[[
PROPOSED: The cardinality of solutions to fixed-length paths
is the same as the cardinality of solutions to the path expanded into
triple patterns (with all variables projected); the cardinality of
solutions to variable-length paths is the cardinality of solutions
via paths that do not repeat nodes; the cardinality of solutions to
paths combining fixed and variable length (elt{n,} ) is a combination
of the fixed definition plus the variable definition for paths longer
than the fixed length.
]]

it looked reasonable at the time but I'm finding that it isn't do clear 
cut a distinction of variable and fixed length paths.

:p{n}
   which is similar or the same as p{n,n}
   which is similar or the same as :p/:p/..../:p
:p{n,m}
   using the text above it is :p{n} / :p{0,(m-n)}
   which is similar or the same as :p/:p/:p?/:p?
      (n times :p and (m-n :p?)
:p+
   which is also :p{1,}

Birte suggests that

:p{n,m} is
    :p{n} UNION :p{n+1} ... UNION :p{m}

although this is problematic because it goes round loops multiple times 
if fixed length paths are the same as their triple pattern expansion.

:p{1,} needs to have a relationship to :p{1,X} as X tends to infinity. 
When X is larger than the longest path in the graph, it's going to odd 
(but possible) to have :p+ and :p{1,X} returning different answers.

I can see that :p{n,m} defined as a union for small n and large m is 
going to an implementation burden for some systems using a pure expansion.

The closest consistent view point with the text above on cardinality 
seems to me, currently, to be the form that counts all paths, including 
once round loops, which is the triple matching and backtracking. 
Intuitively, it's like trying to create triple expansions but limiting 
loops to once round. It feels as if it is consistent possible future 
returning path lengths.

The other consistent viewpoint, which is not in line with the text 
above, is to make path matching distinct only. That drops the proposal 
for fixed length paths being equivalent to pure triple patterns.

Simply picking distinct everywhere seems restrictive.

We also have the fact that while we have used :p as the single step path 
component, it can be a complex expression:

   ?x (:a/:b)+ ?y
   ?x (:a*/:b)+ ?y
   ?x (:a|:b)+ ?y
   ?x (:a/(:b|^:c))+ ?y

Currently, I'm working on the idea of some fixed equivalences of 
transformations of paths down to a few operators. We at least get 
defined cardinalities this way, and an explanation although the 
treatment isn't going to be uniform as

card(:p+) = card(:p{1,}) != card(:p{1,larger than graph})

There would be an operator path+, and the expansions always put the 
unbounded part last (that does not mean the unbounded part is last in t 
the path -- the path may be :p+/:q{5}).

pathArbLengthMatch(path)

but we could start with

pathArbLengthMatch(path, min, max)

making the break with triple pattern equivalences happen when any {} is 
introduced.  "+" is {1,inf} and "*" is :p{0} union :p+

Operators like "/", "|", "^" are simply syntax for triple expansions.

The unfortunate consequence is that
{n} is not a simple triple pattern if it is {n,n}.

How does that fit with recursive SQL?

	Andy


On 09/06/2010 1:40 PM, Matthew Perry wrote:
> Thanks for the summary Andy.
>
> I do not like the triple marking with backtracking/unwinding approach to determine the cardinality for arbitrary length paths. I think this will be too hard to implement efficiently on systems that store graphs on disk. Using "distinct" or "no node repetition" semantics, most property path queries can be evaluated efficiently in a DBMS using recursive SQL, but the triple marking approach is a different story and would most likely require a really convoluted representation on disk. Since we are only talking about differences in cardinality of results, I worry that the elegance of the formalization may not justify the additional burden on developers.
>
> Cheers,
> Matt
>
> ----- Original Message -----
> From: andy.seaborne@talis.com
> To: public-rdf-dawg@w3.org
> Sent: Wednesday, June 9, 2010 8:15:06 AM GMT -05:00 US/Canada Eastern
> Subject: [TF-PP] Moving forward
>
> To summarise where I think we are:
>
> 1/ Use the expansions approach that Birte proposed in:
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0293.html
>
> 2/ Define operators for zero length paths and for arbitrary length paths
> (i.e. +).
>
> 2a/ zero length paths include all subjects/objects/constant terms as
> possible bindings (and check this makes sense).
>
> 2b/ arbitrary length paths based on triple marking with
> backtracking/unwinding.  Define the operator by algorithm (of course,
> you can implement anyway you want but this defines the correct answers
> and cardinality).
>
> 3/ Need to compare the cardinality for arbitrary length paths and {n,m}
> forms since they should agree, and if not, we need to understand why
> (this is why I think the triple marking with unwinding is the way to go)
> It may make sense to define {n,m} in terms of the arbitrary length paths
> algorithm.
>
> 4/ Examples, examples, examples.
>
> 	Andy
>
Received on Monday, 14 June 2010 15:16:44 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:42 GMT