Definitions for option 6 (part 2 of 2) from Andy Seaborne on 2012-03-30 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 30 Mar 2012 10:32:04 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F757D94.8000203@epimorphics.com>
== Option 6

The path operators are:

6.A:   /, |, ! as there are in 2LC.
6.B:   *, +, ? are non-counting

No DISTINCT, No {} forms: {n}, {n,m}, {n,}, {,m}

== Definitions

Of all those, the critical one is *. Given that, the machinery used to 
define it and the existing transformation rules, the major item is the 
definition of * (ZeroOrMorePath/distinct) here.  The description below 
is not completely formal.

An arbitrary length path P = (X (path)* Y) is all solutions from
X to Y by repeated use of path.  It collects the set of nodes visited by 
repeated applying path.

ZeroOrMorePath includes X.
OneOrMorePath starts after one application of path.

First, we define a function that returns the set of end points reached 
by using a path expression N times.  Then we use that to define path* as 
the bindings caused by any number of applications of the path.

== Auxiliary definition: PathPoints

The "path points" of a path expression P and start point X (an RDF term) 
is the set of nodes reached by exactly N repeated applications of P.

These two definitions are equivalent:

Definition 1 (recursive):

PathPoints(X, P, 0) =
     { X }
PathPoints(X, P, N) =
     { y | N>0 & Exists z in PathPoints(X, P, N-1) such that z P y }

Definition 2 (fill in the dots):

PathPoints(X, P, 0) = { X }

PathPoints(X, P, N) =
    SELECT DISTINCT ?V
    WHERE { X P ?V1 .     # N patterns with path P in them.
            ?V1 P ?V2 .
            ....
            ?V(N-1) P ?V
          }

They are equivalent because they define sets.  If you directly 
implemented like this, definition 1 is likely faster as the size 
increases.  It reduces intermediate results to distinct as it goes along 
because it used the set answer to N-1.  Definition 2 leaves all that to 
end unless the optimizer is smart enough.

== Example

:x :p :y .
:y :q :z .

path:  :p/:q

gives:
PathPoints(:x, :p/:q, 1) = { :z }

and not :y because :p/:q does have :y as an endpoint.

== Main Definitions

** Definition: ZeroOrMorePath

This defines ZeroOrMorePath to be the set of nodes reachable by any 
number of applications of a path expression.

Let X be an RDF Node, ?V a variable and P a path expression.
Then

ZeroOrMorePath(X, P, ?V) =
    { ?V=y | y in PathPoints(X, P, N) for some N >= 0 }

Now the case of two variables for end points: consider all nodes in the 
graph:

Let V1 and V2 be variables:

ZeroOrMorePath(?V1, P, ?V2) =
    { (?V1=z,?V2=y) | z in nodes(G) and y in ZeroOrMorePath(z, P, ?V2) }

and for a path from a variable to a fixed node:

ZeroOrMorePath(V, P, X)
     = ZeroOrMorePath(X, reverse P, V)
     = { ?V=z | z in nodes(G) and exists y in ZeroOrMorePath(z, P, y) }

and for a path between two fixed ends:

ZeroOrMorePath(X, P, Y)
     = { {} } if ZeroOrMorePath(X, P, ?V) contains ?V=y
     = { }    if ZeroOrMorePath(X, P, ?V) does not contains ?V=y


Notes: More identities:

ZeroOrMorePath(x, P, V) =
    Projection of ZeroOrMorePath(V1, P, V2) of first of pair to a set
    = { a | (a,b) in ZeroOrMorePath(V1, P, V2) }

ZeroOrMorePath(V, P, z) =
    Projection of ZeroOrMorePath(V1, P, V2) of second of pair to a set
    = { b | (a,b) in ZeroOrMorePath(V1, P, V2) }

** Definition: OneOrMorePath

OneOrMorePath(X, P, ?V) =
    { ?V=y | y in PathPoints(X, P, N) for some N >= 1 }

etc etc

== Non-normative text to help implementers

We can also provide a (now informative) function to calculate the 
reachable points from x, some term.  This the result of PathPoints for 
any N.  It's the same as the function ALP in 2LC with the last line 
removed, and using the result set as the set of nodes visited.

# R is the overall result and the visited set

ALP_1(x:term, path, R:set of RDF terms) =
     if ( x in R ) return
     add x to R
     X = eval(x,path)
     For n:term in X
         ALP_1(n, path, R)
         End
The result is in R.

For comparison, here is function ALP from 2LC:

ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
     if ( x in V ) return
     add x to V
     add x to R
     X = eval(x,path)
     For n:term in X
         ALP(n, path, R, V)
         End
     remove x from V
Received on Friday, 30 March 2012 09:32:36 UTC