Defining property paths from Andy Seaborne on 2012-03-02 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 02 Mar 2012 13:59:42 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F50D24E.1030206@epimorphics.com>

This message gives the general outline of changes for property paths - 
it's not detail-complete.

The next step should be tests for nasty corner cases.

== Defining *, +, {*} and {+}

The algebra operator will be:

ZeroOrMorePath1
OneOrMorePath1
ZeroOrMorePathN
OneOrMorePathN

The current * and + become {*} and {+} which use
  1  ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
  2     if ( x in V ) return
  3     add x to V
  4     add x to R
  5     X = eval(x,path)
  6     For n:term in X
  7         ALP(n, path, R, V)
  8         End
  9     remove x from V

V is the set of nodes considered visited.
A node is added to V on entry and removed on exit.

So V is the set of nodes in the current path, not the set of nodes 
visited through the recursive application of ALP.

Simply removing line 9 changes the function to track all nodes visited.
You don't need R and V then, so remove V and line 3:

  1  ALP1(x:term, path, R:set of RDF terms) =
  2     if ( x in R ) return
  3     add x to R
  4     X = eval(x,path)
  5     For n:term in X
  6         ALP1(n, path, R)
  7         End

The entry step to ALP1 for OneOrMorePath1 needs aligning.

The nodes recorded are just those the subpath ends on.  So if the 
subpath is :p/:q then the overall path can loop back through the gap 
between a :p and :q.  This makes sense to me - its working over a 
(non-RDF) graph induced by only considering :p/:q as links.

== Spec'ing DISTINCT(path)

The spec'ing issue is whether defining DISTINCT(path) in terms of a 
SELECT DISTINCT sub-query is sufficient or should we explain what 
distinct'ness means for each of the graph operators and provide an 
algorithm for each.

Some operators are naturally distinct anyway (e.g. a reverse path 
something distinct is distinct).

The non-fundamental path operators are:

X ^path Y
X path1 / path2 Y
X path1 | path2 Y
X path{n} Y
X path{n,m} Y
X path{n,} Y
X path{0,} Y
X path{,n} Y

The way to define as a re-write is e.g.
   ?x :p/DISTINCT(path)/:q ?y
==>
   ?x :p ?_v1 .
   SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
   ?_v2 :q ?y .

This could even be done for * and + based on {*} and {+} but I think 
spelling out an algorithm is much more helpful.

The alternative is a table defining DISTINCT(path) for path operator.

 Andy

Received on Friday, 2 March 2012 14:00:13 UTC