Re: [TF-PP] Worked examples of cardinality for :p{2} and :p+ from Birte Glimm on 2010-06-02 (public-rdf-dawg@w3.org from April to June 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Wed, 2 Jun 2010 18:36:26 +0100
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTilgCcbVv42XMJQ_-LswWngxK2pJzxE6MjkNEFcl@mail.gmail.com>
On 2 June 2010 14:31, Andy Seaborne <andy.seaborne@talis.com> wrote:
> This message is to make sure we are all aware about cardinality that :p{2}
> and :p+ can encounter.
>
> Test cases attached (what are we doing about test cases?)
>
> The diamond shape of diagram 1 [*] is the triples:
>
> :a :p :b .
> :b :p :z .
> :a :p :c .
> :c :p :z .
>
> Pattern: { :a :p{2} ?z }
>
> has solutions
> ?z=:z
> ?z=:z
>
> and it's the same as
>
> { :a :p _:v . _:v :p ?z }
>
> or SELECT ?z { :a :p ?v . ?y :p ?z }

I take ?y to be ?v.

I should have noticed earlier, but it just occurred to me that SPARQL
really has no notion of blank nodes whatsoever. Each blank node in the
query can equally be replaced by a variable and you get exactly the
same answers and blank nodes in the active graph are just Skolem
constants. Thus, the only thing special about them is that they might
get renamed at some point.

I had the misconception that the _:v would really act as an
existential variable, i.e., :z is a solution if there is a mapping for
_:v such that replacing ?z with :z and _:v with the mapping for bnodes
(could be either to :b or to :c) results in a sub-graph of the active
graph. I thought it does not matter how many such RDF instance
mappings there are, but it seems I am wrong.

Now this might have a consequence for the entailment regimes because
there something is a solution if there is any RDF instance mapping
(plus further restrictions for axiomatic triples etc), but under the
current RDF(S) regime, you would get just one answer :z. This might
not be what we want in that you can then get less answers than you
would get with simple entailment/standard SPARQL semantics. It is not
hard to fix, but it really makes me wonder why we allow for bnodes in
the query at all.

>
> In terms of cardinality for 2-step paths, it's not the same as:
>
> Patterns: { :a :p+ ?z }
> which has solutions
>
> ?z=:b
> ?z=:c
> ?z=:z
>
> the point being you get ?z=:z only once in the { :a :p+ ?z } case.

Now I am confused (even more than before). I thought we don't want
loop, which means you mark edges, not nodes when computing the answers
in my understanding. E.g., I start at a go one step to b via the
p-edge and mark the p-edge. That satisfies p+, so b is an answer.
Since it is p+, I keep going and mode to z and mark the p-edge between
b and z. Again, z is an answer. Now there is nowhere to go via a
p-edge, so I backtrack. No alternatives either at b, so I go further
back to a, where I still have the alternative to go via p to c, I mark
the edge and put c to the answers. Now I still haven't seen the p-edge
between b and z, so I can go there, find z to be an answer and mark
the edge. Now I have marked all edges and I am done. That very well
prevents loops to blow up paths, because I can use each edge only once
when computing the solutions.

> but is the same as:
> { :a :p/:p+ ?z }
>
> ?z=:z
> ?z=:z
>
> by combination of the first ":p" (two places: :b, and :c) then steps of :p+
>
> This is related to loop avoidance in :p+.  If it's a breadth first search,
> it avoids visiting a node if it's already seen it.

Again, why not mark edges? Breadth first or depth first shouldn't
matter either.

> This makes it consistent with:
>
> :a :p :b .
> :b :p :z .
> :a :p :c .
> :c :p :z .
> :z :p :X .
>
> which is the diamond with an extra arc from :z,
>
> Pattern: { :a :p+ ?z }
>
> ?z=:c
> ?z=:z
> ?z=:X
> ?z=:b
>
> only one ?z=:X, despite there being 2 routes there

with edge markings, you also get just one x, but twice z, telling you
that there are two ways from a to z while loops are still not blowing
up the result.

> while { :a :p{3} ?z } gets:
>
> ?z=:X
> ?z=:X
>
> because it is the same as the BGP expansion of the path expression.
>
> This makes sense with loops as well: diagram 2 adds a loop to :c
>
> :a :p :b .
> :b :p :z .
> :a :p :c .
> :c :p :z .
> :c :p :c .
>
> Pattern: { :a :p+ ?z }
> ?z=:b
> ?z=:c
> ?z=:z

with edge marking:
b, c, c, z, z

> whereas on the diamond+loop:
> Pattern: { :a :p{2} ?z }
> ?z=:c
> ?z=:z
> ?z=:z
>
> It seems much better to make :p+ treat DAGs (directed acyclic graphs) and
> cycles in graphs the same and get the cardinalities above. Implementing "+"
> only requires keeping a track of the nodes visited, not the path used to get
> to the node (and I'm not convinced it would work out either because of pathe
> with acyclic paths and cycles in together).

That's all very well and to some extend it is just a matter of which
compromise we want, but edge markings (I am also not talking of
keepings whole paths) is another option. It gives slightly more
answers, which tell you a bit more about the graph. There is no
totally satisfying option in that you can always find boolean queries
that say true for a fixed path ({n} notion) and false for a variable
path (+ or *). I can live with both, although I do have a preference
for edge markings since this is all about edge expressions and to me
it seems more natural.

Birte

>
>        Andy
>
> [*] http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0275.html
>
>
>
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Wednesday, 2 June 2010 17:37:00 UTC