Re: Action 369 -- Look at property path tests from Andy Seaborne on 2011-02-01 (public-rdf-dawg@w3.org from January to March 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 01 Feb 2011 12:00:32 +0000
To: Matthew Perry <matthew.perry@oracle.com>
CC: W3C SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4D47F5E0.7090400@epimorphics.com>
On 31/01/11 15:50, Matthew Perry wrote:
> Hi Andy,
>
> I follow your examples and agree with the results. Here is an excerpt
> from pp16 that I don't understand.
>
> Data:
> :d foaf:knows :e .
> :e foaf:knows :f .
> :f foaf:knows :e .
> :f foaf:name "test" .
>
> query pattern:
> { ?x foaf:knows+ ?y }
>
> Result Fragment:
> ?x=:d, ?y=:d
> ?x=:d, ?y=:e
> *?x=:d, ?y=:e*
> ?x=:d, ?y=:f
>
> I don't understand where the second ?x=:d, ?y=:e comes from. I assume
> it's from the path
> :d -- foaf:knows -> :e -- foaf:knows -> :f -- foaf:knows -> :e
> but it seems like this path should be excluded because of a cycle.




As it is currently defined, we get the "?x=:d, ?y=:e" because it records 
triples traversed (as per Bite's suggestion in the telecon, we don't 
have good minutes though).  So, yes, it gets "?x=:d, ?y=:e" from 
:d--foaf:knows-->:e then one from the cycle.

This is related to the two-loop example I gave, which I think does need 
duplicates - here it's no cycles and one cycle.

I'm not sure whether this will work but if the definitional algorithm 
records points-along-a-path then it can check for duplicates such as 
this one, but not exclude the 2-cycle example.  Needs checking - it 
depends on whether the path starts inside the cycle or enters it (as in 
your example) as part of processing.

Also, there are some potential intuitive invariants we may wish to 
consider in refining the spec:

We already have:

{ ?x foaf:knows* ?y }
==
{ ?x foaf:knows{0} ?y } UNION {?x foaf:knows+ ?y }

but what about:

{ ?x foaf:knows+ ?y }
=?=
{ ?x foaf:knows/foaf:knows* ?y }

	Andy

>
> Thanks,
> Matt
>
> On 1/29/2011 11:59 AM, Andy Seaborne wrote:
>>
>>
>> On 25/01/11 17:56, Matthew Perry wrote:
>>> Hi Andy,
>>>
>>> Thanks for the explanation. I completely understand and agree with the
>>> cycles in fixed-length property paths.
>>>
>>> My concern is with the arbitrary-length paths. I would strongly prefer
>>> to have a purely node-marking algorithm instead of triple marking. The
>>> pure node marking seems to better match the resolution we passed back in
>>> June: http://www.w3.org/2009/sparql/meeting/2010-06-01#resolution_5.
>>> (The two solutions with cycles both repeat nodes).
>>>
>>> PROPOSED: The cardinality of solutions to fixed-length paths
>>> is the same as the cardinality of solutions to the path expanded into
>>> triple patterns (with all variables projected);*the cardinality of
>>> solutions to variable-length paths is the cardinality of solutions
>>> via paths that do not repeat nodes;* the cardinality of solutions to
>>> paths combining fixed and variable length (elt{n,} ) is a combination
>>> of the fixed definition plus the variable definition for paths longer
>>> than the fixed length.
>>>
>>> Thanks,
>>> Matt
>>
>> Matt,
>>
>> :p* is defined to be :p{0} UNION :p+
>>
>> The duplicate comes from the two parts
>>
>> { :a :p* ?x }
>> =
>> { :a :p{0} ?x } UNION { :a :p+ ?x }
>>
>> Data:
>> :a :p :b .
>> :b :p :c .
>> :c :p :a .
>>
>>
>> So the composition of patterns means that
>>
>> { :a :p{0} ?x } => ?x=:a
>> { :a :p+ ?x } => ?x = :b, :c, :a
>>
>> ?x=:a occurs because of the two parts. I don't think it directly
>> violates the part you quote. The algorithm is to put the endpoint of
>> triple in the results, as the edge is traversed.
>>
>> This could be changed in the spec: define an operator specifically for
>> :p*, not making it the multi-set-union of other forms.
>>
>> An effect of this is that same results come from:
>>
>> :a :p :b .
>> :b :p :c .
>>
>> i.e. cycle and a non-cycle with removal of a triple for the last hop,
>> are the same.
>>
>> I have no strong opinion either way - I'd like to see some more use
>> cases.
>>
>> - - - - - - - -
>>
>> A separate issue is when there are multiple paths/loops in the graph.
>> If there are two loops, then :p+ leads to duplicates.
>>
>> { :a :p+ ?X }
>>
>> :a :p :b .
>> :b :p :c .
>> :c :p :a .
>>
>> :a :p :x .
>> :x :p :y .
>> :y :p :a .
>>
>> ------
>> | X |
>> ======
>> | :x |
>> | :y |
>> | :a |
>> | :b |
>> | :c |
>> | :a |
>> ------
>>
>> As with the comment from Jorge, we have to be careful about use with
>> aggregates and literals. This is not illustrated by this example
>> unless literals can be subjects. Ideally, I'd like to not design-out
>> literals-as-subjects as RDF-MT already allows it, even if the syntaxes
>> do not.
>>
>> Andy
>>
>>
>
Received on Tuesday, 1 February 2011 12:01:12 UTC