From: Andy Seaborne <andy.seaborne@epimorphics.com>

Date: Thu, 08 Mar 2012 12:28:30 +0000

Message-ID: <4F58A5EE.4030307@epimorphics.com>

To: public-rdf-dawg@w3.org

Date: Thu, 08 Mar 2012 12:28:30 +0000

Message-ID: <4F58A5EE.4030307@epimorphics.com>

To: public-rdf-dawg@w3.org

Going through the details in detail: The "?" path operator can also generate a duplicate. If it's defined as the union of {0} and {1} and the path loops back to the start exactly. Data: :x :p :x . Query: { ?v :p ?v } or { :x :p ?v. } has solutions ?x=:x twice. ------ | v | ====== | :x | | :x | ------ The systematic answer is to have ? non-counting and {?} counting. ? is a bit different from * and + because, with just {0} it can be written out. * and + need the arbitrary length-ness. Currently ? is the same as {0,1} So I am thinking of making ? non-counting always, not adding {?}, which would be written {0,1} or {,1} if necessary. If people want a systmatic treatment, we can add {?} but I'd am not inclined to leave {?var} open for the future for lengths. "?" is a less used feature. Changes are much-less likely to affect people. Andy On 02/03/12 13:59, Andy Seaborne wrote: > This message gives the general outline of changes for property paths - > it's not detail-complete. > > The next step should be tests for nasty corner cases. > > == Defining *, +, {*} and {+} > > The algebra operator will be: > > ZeroOrMorePath1 > OneOrMorePath1 > ZeroOrMorePathN > OneOrMorePathN > > The current * and + become {*} and {+} which use > 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) = > 2 if ( x in V ) return > 3 add x to V > 4 add x to R > 5 X = eval(x,path) > 6 For n:term in X > 7 ALP(n, path, R, V) > 8 End > 9 remove x from V > > V is the set of nodes considered visited. > A node is added to V on entry and removed on exit. > > So V is the set of nodes in the current path, not the set of nodes > visited through the recursive application of ALP. > > Simply removing line 9 changes the function to track all nodes visited. > You don't need R and V then, so remove V and line 3: > > 1 ALP1(x:term, path, R:set of RDF terms) = > 2 if ( x in R ) return > 3 add x to R > 4 X = eval(x,path) > 5 For n:term in X > 6 ALP1(n, path, R) > 7 End > > The entry step to ALP1 for OneOrMorePath1 needs aligning. > > The nodes recorded are just those the subpath ends on. So if the subpath > is :p/:q then the overall path can loop back through the gap between a > :p and :q. This makes sense to me - its working over a (non-RDF) graph > induced by only considering :p/:q as links. > > == Spec'ing DISTINCT(path) > > The spec'ing issue is whether defining DISTINCT(path) in terms of a > SELECT DISTINCT sub-query is sufficient or should we explain what > distinct'ness means for each of the graph operators and provide an > algorithm for each. > > Some operators are naturally distinct anyway (e.g. a reverse path > something distinct is distinct). > > The non-fundamental path operators are: > > X ^path Y > X path1 / path2 Y > X path1 | path2 Y > X path{n} Y > X path{n,m} Y > X path{n,} Y > X path{0,} Y > X path{,n} Y > > The way to define as a re-write is e.g. > ?x :p/DISTINCT(path)/:q ?y > ==> > ?x :p ?_v1 . > SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 } > ?_v2 :q ?y . > > This could even be done for * and + based on {*} and {+} but I think > spelling out an algorithm is much more helpful. > > The alternative is a table defining DISTINCT(path) for path operator. > > Andy > >Received on Thursday, 8 March 2012 12:28:59 GMT

*
This archive was generated by hypermail 2.3.1
: Tuesday, 26 March 2013 16:15:47 GMT
*