Re: [TF-PP] Worked examples of cardinality for :p{2} and :p+ from Matt Perry on 2010-06-02 (public-rdf-dawg@w3.org from April to June 2010)

From: Matt Perry <matthew.perry@oracle.com>
Date: Wed, 02 Jun 2010 15:56:40 -0400
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: Andy Seaborne <andy.seaborne@talis.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C06B778.40305@oracle.com>
I'm confused too. From the proposal we discussed at this week's TC, I 
would expect the answer to

{ :a :p+ ?z }

to be

?z=:b
?z=:z
?z=:c
?z=:z

because there are 2 distinct, simple paths of length 2 that connect :a to :z. 

To me, 

?z=:b
?z=:c
?z=:z

seems like the answer one would expect when using the previous "distinct" semantics.

The edge marking approach makes sense, but I have concerns that this would be more costly to implement in practice.

Cheers,
Matt

Birte Glimm wrote:
> On 2 June 2010 14:31, Andy Seaborne <andy.seaborne@talis.com> wrote:
>   
>> This message is to make sure we are all aware about cardinality that :p{2}
>> and :p+ can encounter.
>>
>> Test cases attached (what are we doing about test cases?)
>>
>> The diamond shape of diagram 1 [*] is the triples:
>>
>> :a :p :b .
>> :b :p :z .
>> :a :p :c .
>> :c :p :z .
>>
>> Pattern: { :a :p{2} ?z }
>>
>> has solutions
>> ?z=:z
>> ?z=:z
>>
>> and it's the same as
>>
>> { :a :p _:v . _:v :p ?z }
>>
>> or SELECT ?z { :a :p ?v . ?y :p ?z }
>>     
>
> I take ?y to be ?v.
>
> I should have noticed earlier, but it just occurred to me that SPARQL
> really has no notion of blank nodes whatsoever. Each blank node in the
> query can equally be replaced by a variable and you get exactly the
> same answers and blank nodes in the active graph are just Skolem
> constants. Thus, the only thing special about them is that they might
> get renamed at some point.
>
> I had the misconception that the _:v would really act as an
> existential variable, i.e., :z is a solution if there is a mapping for
> _:v such that replacing ?z with :z and _:v with the mapping for bnodes
> (could be either to :b or to :c) results in a sub-graph of the active
> graph. I thought it does not matter how many such RDF instance
> mappings there are, but it seems I am wrong.
>
> Now this might have a consequence for the entailment regimes because
> there something is a solution if there is any RDF instance mapping
> (plus further restrictions for axiomatic triples etc), but under the
> current RDF(S) regime, you would get just one answer :z. This might
> not be what we want in that you can then get less answers than you
> would get with simple entailment/standard SPARQL semantics. It is not
> hard to fix, but it really makes me wonder why we allow for bnodes in
> the query at all.
>
>   
>> In terms of cardinality for 2-step paths, it's not the same as:
>>
>> Patterns: { :a :p+ ?z }
>> which has solutions
>>
>> ?z=:b
>> ?z=:c
>> ?z=:z
>>
>> the point being you get ?z=:z only once in the { :a :p+ ?z } case.
>>     
>
> Now I am confused (even more than before). I thought we don't want
> loop, which means you mark edges, not nodes when computing the answers
> in my understanding. E.g., I start at a go one step to b via the
> p-edge and mark the p-edge. That satisfies p+, so b is an answer.
> Since it is p+, I keep going and mode to z and mark the p-edge between
> b and z. Again, z is an answer. Now there is nowhere to go via a
> p-edge, so I backtrack. No alternatives either at b, so I go further
> back to a, where I still have the alternative to go via p to c, I mark
> the edge and put c to the answers. Now I still haven't seen the p-edge
> between b and z, so I can go there, find z to be an answer and mark
> the edge. Now I have marked all edges and I am done. That very well
> prevents loops to blow up paths, because I can use each edge only once
> when computing the solutions.
>
>   
>> but is the same as:
>> { :a :p/:p+ ?z }
>>
>> ?z=:z
>> ?z=:z
>>
>> by combination of the first ":p" (two places: :b, and :c) then steps of :p+
>>
>> This is related to loop avoidance in :p+.  If it's a breadth first search,
>> it avoids visiting a node if it's already seen it.
>>     
>
> Again, why not mark edges? Breadth first or depth first shouldn't
> matter either.
>
>   
>> This makes it consistent with:
>>
>> :a :p :b .
>> :b :p :z .
>> :a :p :c .
>> :c :p :z .
>> :z :p :X .
>>
>> which is the diamond with an extra arc from :z,
>>
>> Pattern: { :a :p+ ?z }
>>
>> ?z=:c
>> ?z=:z
>> ?z=:X
>> ?z=:b
>>
>> only one ?z=:X, despite there being 2 routes there
>>     
>
> with edge markings, you also get just one x, but twice z, telling you
> that there are two ways from a to z while loops are still not blowing
> up the result.
>
>   
>> while { :a :p{3} ?z } gets:
>>
>> ?z=:X
>> ?z=:X
>>
>> because it is the same as the BGP expansion of the path expression.
>>
>> This makes sense with loops as well: diagram 2 adds a loop to :c
>>
>> :a :p :b .
>> :b :p :z .
>> :a :p :c .
>> :c :p :z .
>> :c :p :c .
>>
>> Pattern: { :a :p+ ?z }
>> ?z=:b
>> ?z=:c
>> ?z=:z
>>     
>
> with edge marking:
> b, c, c, z, z
>
>   
>> whereas on the diamond+loop:
>> Pattern: { :a :p{2} ?z }
>> ?z=:c
>> ?z=:z
>> ?z=:z
>>
>> It seems much better to make :p+ treat DAGs (directed acyclic graphs) and
>> cycles in graphs the same and get the cardinalities above. Implementing "+"
>> only requires keeping a track of the nodes visited, not the path used to get
>> to the node (and I'm not convinced it would work out either because of pathe
>> with acyclic paths and cycles in together).
>>     
>
> That's all very well and to some extend it is just a matter of which
> compromise we want, but edge markings (I am also not talking of
> keepings whole paths) is another option. It gives slightly more
> answers, which tell you a bit more about the graph. There is no
> totally satisfying option in that you can always find boolean queries
> that say true for a fixed path ({n} notion) and false for a variable
> path (+ or *). I can live with both, although I do have a preference
> for edge markings since this is all about edge expressions and to me
> it seems more natural.
>
> Birte
>
>   
>>        Andy
>>
>> [*] http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0275.html
>>
>>
>>
>>
>>     
>
>
>
>
Received on Wednesday, 2 June 2010 19:57:27 UTC