Re: Property path summary and ways forward from Andy Seaborne on 2012-02-27 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 27 Feb 2012 22:51:19 +0000
To: Axel Polleres <axel.polleres@deri.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F4C08E7.9080809@epimorphics.com>

On 27/02/12 22:16, Axel Polleres wrote:
> Hi Andy,
>
> Two questions for clarification:
>
>> A minimal change is a pair of new operators {*} and {+} which are the
>> analogous to * and +.
>>
>> {*} and {+} are counting
>> * and + are not (a change to 2LC).
>
>
> 1) So, would e.g. :p{*}/:q* mix non-counting and counting?

Yes.
(as would :p*/DISTINCT(:q*)

and what is, for all proposals, DISTINCT(path) can be still lead to 
apparent duplicates because of projection elsewhere in the graph pattern.

> 2) Is this meant on top or alternatively to DISTINCT() ?

An alternative to DISTINCT(path)

(but doing both at possible.)

A practical issue in just DISTINCT() is need define evaluation for each 
operator, even ones that the transformation would otherwise have 
handled.  I am no convinced simply saying "make distinct" is helpful as 
people seem to rely on the algorithms in the spec when they really only 
define correct behaviour.

	Andy

> best,
> Axel
>
> On 27 Feb 2012, at 22:49, Andy Seaborne wrote:
>
>> 1/ There are two use cases for arbitrary length paths
>> UC1: Finding or testing for connectivity of nodes: e.g. rdfs:subClassOf*
>> UC2: Coping with variable structures and aggregation
>>
>>
>> 2/ Property paths can be expensive, notably in UC2.
>>
>> 3/ We want to leave open the possibility of path lengths
>>
>> (foaf:knows* is not a clear cut example - your LinkedIn graph is not
>> simple connectivity)
>>
>> 4/ PP transforms are used to define property path evaluation
>> leaving only ZeroLengthPath, ZeroOrMorePath, OneOrMorePath, and
>> NegatedPropertySet to be defined in the evaluation semantics.
>>
>> -----
>>
>> DISTINCT is one way forward.
>>
>> ?x DISTINCT(path) ?y .
>>
>> is the pairs of (?x,?y) connected by path.
>>
>> ?x DISTINCT(path*) ?y is connectivity.
>> ?x path* ?y is counting as per current spec.
>>
>> -----
>>
>> A minimal change is a pair of new operators {*} and {+} which are the
>> analogous to * and +.
>>
>> {*} and {+} are counting
>> * and + are not (a change to 2LC).
>>
>> (This is based on guessing {?length} is a future path length syntax -
>> see also discussion of {N,M}
>>
>> This is less than ?x DISTINCT(path) ?y
>> It can be "as well as".
>>
>> Roughly: the distinct * and + are achieved by changing ALP to remove the
>> last line "remove x from V".  We need to check the edge cases still.
>>
>> It's this state management operation that controls whether the test for
>> termination is along the current path (counting) or whether the function
>> has visited the node at all.
>>
>> -----
>> For {*}:
>>
>> {N,M} is untouched.
>> {0,*} is defined to be equivalent to {*} (i.e. still counting)
>>
>> {N,M} can be very expensive (think {0,N} in a clique of N nodes) but it
>> is syntax for ways to write the pattern as it would be in SPARQL 1.0
>>
>> ?X:p{1,M} ?Y is
>>
>>     ?X :p ?Y .
>> UNION
>>     ?x :p ?a1 . ?a1 :p Y
>> UNION
>>     ?x :p ?a1 . ?a1 :p ?a2 . ?a2 :p ?Y
>> ...
>>     ?x :p ?a1 . ?a1 :p ?a2 . ?a(M-1) :p ?Y
>>
>>          Andy
>>
>>
>

Received on Monday, 27 February 2012 22:51:44 UTC