Re: Defining property paths from Andy Seaborne on 2012-03-08 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 08 Mar 2012 12:28:30 +0000
To: public-rdf-dawg@w3.org
Message-ID: <4F58A5EE.4030307@epimorphics.com>
Going through the details in detail:

The "?" path operator can also generate a duplicate.  If it's defined as 
the union of {0} and {1} and the path loops back to the start exactly.

Data:
:x :p :x .

Query:
{ ?v :p ?v } or { :x :p ?v. }

has solutions ?x=:x twice.

------
| v  |
======
| :x |
| :x |
------

The systematic answer is to have ? non-counting and {?} counting.

? is a bit different from * and + because, with just {0} it can be 
written out.  * and + need the arbitrary length-ness.

Currently ? is the same as {0,1}

So I am thinking of making ? non-counting always, not adding {?}, which 
would be written {0,1} or {,1} if necessary.  If people want a systmatic 
treatment, we can add {?} but I'd am not inclined to leave {?var} open 
for the future for lengths.

"?" is a less used feature.  Changes are much-less likely to affect people.

 Andy


On 02/03/12 13:59, Andy Seaborne wrote:
> This message gives the general outline of changes for property paths -
> it's not detail-complete.
>
> The next step should be tests for nasty corner cases.
>
> == Defining *, +, {*} and {+}
>
> The algebra operator will be:
>
> ZeroOrMorePath1
> OneOrMorePath1
> ZeroOrMorePathN
> OneOrMorePathN
>
> The current * and + become {*} and {+} which use
> 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
> 2 if ( x in V ) return
> 3 add x to V
> 4 add x to R
> 5 X = eval(x,path)
> 6 For n:term in X
> 7 ALP(n, path, R, V)
> 8 End
> 9 remove x from V
>
> V is the set of nodes considered visited.
> A node is added to V on entry and removed on exit.
>
> So V is the set of nodes in the current path, not the set of nodes
> visited through the recursive application of ALP.
>
> Simply removing line 9 changes the function to track all nodes visited.
> You don't need R and V then, so remove V and line 3:
>
> 1 ALP1(x:term, path, R:set of RDF terms) =
> 2 if ( x in R ) return
> 3 add x to R
> 4 X = eval(x,path)
> 5 For n:term in X
> 6 ALP1(n, path, R)
> 7 End
>
> The entry step to ALP1 for OneOrMorePath1 needs aligning.
>
> The nodes recorded are just those the subpath ends on. So if the subpath
> is :p/:q then the overall path can loop back through the gap between a
> :p and :q. This makes sense to me - its working over a (non-RDF) graph
> induced by only considering :p/:q as links.
>
> == Spec'ing DISTINCT(path)
>
> The spec'ing issue is whether defining DISTINCT(path) in terms of a
> SELECT DISTINCT sub-query is sufficient or should we explain what
> distinct'ness means for each of the graph operators and provide an
> algorithm for each.
>
> Some operators are naturally distinct anyway (e.g. a reverse path
> something distinct is distinct).
>
> The non-fundamental path operators are:
>
> X ^path Y
> X path1 / path2 Y
> X path1 | path2 Y
> X path{n} Y
> X path{n,m} Y
> X path{n,} Y
> X path{0,} Y
> X path{,n} Y
>
> The way to define as a re-write is e.g.
> ?x :p/DISTINCT(path)/:q ?y
> ==>
> ?x :p ?_v1 .
> SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
> ?_v2 :q ?y .
>
> This could even be done for * and + based on {*} and {+} but I think
> spelling out an algorithm is much more helpful.
>
> The alternative is a table defining DISTINCT(path) for path operator.
>
> Andy
>
>
Received on Thursday, 8 March 2012 12:28:59 UTC