Re: [TF-PP] Worked examples of cardinality for :p{2} and :p+ from Birte Glimm on 2010-06-03 (public-rdf-dawg@w3.org from April to June 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Thu, 3 Jun 2010 12:20:39 +0100
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTilpa4aT2pabAKnUouGq9I9bElyZk6Pycj9whSYt@mail.gmail.com>
[snip]
>>> In terms of cardinality for 2-step paths, it's not the same as:
>>>
>>> Patterns: { :a :p+ ?z }
>>> which has solutions
>>>
>>> ?z=:b
>>> ?z=:c
>>> ?z=:z
>>>
>>> the point being you get ?z=:z only once in the { :a :p+ ?z } case.
>>
>> Now I am confused (even more than before). I thought we don't want
>> loop, which means you mark edges, not nodes when computing the answers
>> in my understanding. E.g., I start at a go one step to b via the
>> p-edge and mark the p-edge. That satisfies p+, so b is an answer.
>> Since it is p+, I keep going and mode to z and mark the p-edge between
>> b and z. Again, z is an answer. Now there is nowhere to go via a
>> p-edge, so I backtrack. No alternatives either at b, so I go further
>> back to a, where I still have the alternative to go via p to c, I mark
>> the edge and put c to the answers. Now I still haven't seen the p-edge
>> between b and z,
>
> c and z?
sorry, yes.
>
>> so I can go there, find z to be an answer and mark
>> the edge. Now I have marked all edges and I am done. That very well
>> prevents loops to blow up paths, because I can use each edge only once
>> when computing the solutions.
>
> and the answers are
>
> ?z=:b
> ?z=:z
> ?z=:c
> ?z=:z

yes.


> Marking edges comes up with some strange (to me) cardinalities depnding on
> the exact definition - see below.
>
> Node marking finds reachable nodes (consistently, once).

I meant it just as you mark nodes, i.e., without unwinding, they keep
being marked when you explore a different choice. The cardinalities
can be strange in either method I guess, in particular of you compare
the results for fixed length path patterns and the ones for variable
length ones.

>>> This makes it consistent with:
>>>
>>> :a :p :b .
>>> :b :p :z .
>>> :a :p :c .
>>> :c :p :z .
>>> :z :p :X .
>>>
>>> which is the diamond with an extra arc from :z,
>>>
>>> Pattern: { :a :p+ ?z }
>>>
>>> ?z=:c
>>> ?z=:z
>>> ?z=:X
>>> ?z=:b
>>>
>>> only one ?z=:X, despite there being 2 routes there
>>
>> with edge markings, you also get just one x, but twice z, telling you
>> that there are two ways from a to z while loops are still not blowing
>> up the result.
>
> Maybe I don't understand exactly what you mean by edge marking - does or
> doesn't backtracking unwind the marked edge state?

No unwind. With unwinding, you would probably have even another
mechanism for cycle detection.

> With unwinding, I think you get two from :a to :X which I can understand.

You get :X and :z twice with unwinding and only one x but two z
without unwinding.

>  It's the same as count of simple paths in this case and matches on ?z=:X
> for:
>
> { :a :p+ ?v . ?v :p ?z }

Here I would expect with with edge marking (unwind and no unwind):
?v   ?z
b     z
z     x
c     z
z     x

I would expect with node marking:
?v   ?z
b     z
z     x
c     z

> If there are loops, edge marking with unwinding yields all simple paths
> (like node marking) and also one extra cardinality for each loop.
>
> Node marking is about node reachability.
>
> Edge marking with no unwinding seems to provides to distinguish some edges
> over others. If the last step is shared between 2 simple paths you get one
> answer, so edge marking isn't counting simple paths and may be less than a
> simple path count (the :X example) - why should the last edge special?  A
> shared earlier edge gives the cardinality of the later part but not the
> earlier part.
>
> Let's look at:
>
> :a :p :b .
> :b :p :z .
> :a :p :c .
> :c :p :z .
> :b :p :c .  # New - directed cross link :b to :c
>
> { :a :p+ ?Y }
>
> ?Y=:b
> ?Y=:z
> ?Y=:c
> ?Y=:z
> ?Y=:c
> with no unwinding - and with unwinding adds:
> ?Y=:z

agreed.

node marking gives even just three answers: b, z, and c.

> There are 3 simple paths from :a to :z.
Sure, but why do you want to go to :z? It is a :p+, so you want to go
from :a to anywhere with at least one step and there are 6 different
possibilities. I know we want to cut out some because otherwise cycles
will cause infinitely many answers, but there are different ways of
cutting.

[snip]

>>> :a :p :b .
>>> :b :p :z .
>>> :a :p :c .
>>> :c :p :z .
>>> :c :p :c .
>>>
>>> Pattern: { :a :p+ ?z }
>>> ?z=:b
>>> ?z=:c
>>> ?z=:z
>>
>> with edge marking:
>> b, c, c, z, z
disagree: b, c, z, z, z
c is always just once, but z occurs three times with edge marking no
unwind, and four times with unwind. Without unwind it tells you
something about the graph, i.e., there are three edges going to :z.
Unwinding makes you go round the loop twice, once for each way you
reached :z.

[snip]

> To summarise: the options are:
>
>  node marking
>  edge marking with no unwinding
>  edge marking with unwinding
>  path remembering.

I favour unwinding less that node and edge marking. I just wanted to
make the point that there are alternatives that are worth considering.
If there is consensus for node marking that's fine too. Edge marking
tells you more about the graph in a sense, but maybe we are more
interested in cutting the answers, for which node marking is better.
Birte

>        Andy
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Thursday, 3 June 2010 11:21:13 UTC