- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 30 Mar 2012 10:32:04 +0100
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
== Option 6 The path operators are: 6.A: /, |, ! as there are in 2LC. 6.B: *, +, ? are non-counting No DISTINCT, No {} forms: {n}, {n,m}, {n,}, {,m} == Definitions Of all those, the critical one is *. Given that, the machinery used to define it and the existing transformation rules, the major item is the definition of * (ZeroOrMorePath/distinct) here. The description below is not completely formal. An arbitrary length path P = (X (path)* Y) is all solutions from X to Y by repeated use of path. It collects the set of nodes visited by repeated applying path. ZeroOrMorePath includes X. OneOrMorePath starts after one application of path. First, we define a function that returns the set of end points reached by using a path expression N times. Then we use that to define path* as the bindings caused by any number of applications of the path. == Auxiliary definition: PathPoints The "path points" of a path expression P and start point X (an RDF term) is the set of nodes reached by exactly N repeated applications of P. These two definitions are equivalent: Definition 1 (recursive): PathPoints(X, P, 0) = { X } PathPoints(X, P, N) = { y | N>0 & Exists z in PathPoints(X, P, N-1) such that z P y } Definition 2 (fill in the dots): PathPoints(X, P, 0) = { X } PathPoints(X, P, N) = SELECT DISTINCT ?V WHERE { X P ?V1 . # N patterns with path P in them. ?V1 P ?V2 . .... ?V(N-1) P ?V } They are equivalent because they define sets. If you directly implemented like this, definition 1 is likely faster as the size increases. It reduces intermediate results to distinct as it goes along because it used the set answer to N-1. Definition 2 leaves all that to end unless the optimizer is smart enough. == Example :x :p :y . :y :q :z . path: :p/:q gives: PathPoints(:x, :p/:q, 1) = { :z } and not :y because :p/:q does have :y as an endpoint. == Main Definitions ** Definition: ZeroOrMorePath This defines ZeroOrMorePath to be the set of nodes reachable by any number of applications of a path expression. Let X be an RDF Node, ?V a variable and P a path expression. Then ZeroOrMorePath(X, P, ?V) = { ?V=y | y in PathPoints(X, P, N) for some N >= 0 } Now the case of two variables for end points: consider all nodes in the graph: Let V1 and V2 be variables: ZeroOrMorePath(?V1, P, ?V2) = { (?V1=z,?V2=y) | z in nodes(G) and y in ZeroOrMorePath(z, P, ?V2) } and for a path from a variable to a fixed node: ZeroOrMorePath(V, P, X) = ZeroOrMorePath(X, reverse P, V) = { ?V=z | z in nodes(G) and exists y in ZeroOrMorePath(z, P, y) } and for a path between two fixed ends: ZeroOrMorePath(X, P, Y) = { {} } if ZeroOrMorePath(X, P, ?V) contains ?V=y = { } if ZeroOrMorePath(X, P, ?V) does not contains ?V=y Notes: More identities: ZeroOrMorePath(x, P, V) = Projection of ZeroOrMorePath(V1, P, V2) of first of pair to a set = { a | (a,b) in ZeroOrMorePath(V1, P, V2) } ZeroOrMorePath(V, P, z) = Projection of ZeroOrMorePath(V1, P, V2) of second of pair to a set = { b | (a,b) in ZeroOrMorePath(V1, P, V2) } ** Definition: OneOrMorePath OneOrMorePath(X, P, ?V) = { ?V=y | y in PathPoints(X, P, N) for some N >= 1 } etc etc == Non-normative text to help implementers We can also provide a (now informative) function to calculate the reachable points from x, some term. This the result of PathPoints for any N. It's the same as the function ALP in 2LC with the last line removed, and using the result set as the set of nodes visited. # R is the overall result and the visited set ALP_1(x:term, path, R:set of RDF terms) = if ( x in R ) return add x to R X = eval(x,path) For n:term in X ALP_1(n, path, R) End The result is in R. For comparison, here is function ALP from 2LC: ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) = if ( x in V ) return add x to V add x to R X = eval(x,path) For n:term in X ALP(n, path, R, V) End remove x from V
Received on Friday, 30 March 2012 09:32:36 UTC