Re: [TF-PP] Worked examples of cardinality for :p{2} and :p+ from Andy Seaborne on 2010-06-02 (public-rdf-dawg@w3.org from April to June 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Wed, 02 Jun 2010 22:56:45 +0100
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C06D39D.2080701@talis.com>
On 02/06/2010 6:36 PM, Birte Glimm wrote:
...
>
> Now this might have a consequence for the entailment regimes because
> there something is a solution if there is any RDF instance mapping
> (plus further restrictions for axiomatic triples etc), but under the
> current RDF(S) regime, you would get just one answer :z. This might
> not be what we want in that you can then get less answers than you
> would get with simple entailment/standard SPARQL semantics. It is not
> hard to fix, but it really makes me wonder why we allow for bnodes in
> the query at all.

Non-distiguished variables for disjunction in OWL?  That was the reason 
in DAWG.

There is some point in the syntax as well.

>
>>
>> In terms of cardinality for 2-step paths, it's not the same as:
>>
>> Patterns: { :a :p+ ?z }
>> which has solutions
>>
>> ?z=:b
>> ?z=:c
>> ?z=:z
>>
>> the point being you get ?z=:z only once in the { :a :p+ ?z } case.
>
> Now I am confused (even more than before). I thought we don't want
> loop, which means you mark edges, not nodes when computing the answers
> in my understanding. E.g., I start at a go one step to b via the
> p-edge and mark the p-edge. That satisfies p+, so b is an answer.
> Since it is p+, I keep going and mode to z and mark the p-edge between
> b and z. Again, z is an answer. Now there is nowhere to go via a
> p-edge, so I backtrack. No alternatives either at b, so I go further
> back to a, where I still have the alternative to go via p to c, I mark
> the edge and put c to the answers. Now I still haven't seen the p-edge
> between b and z,

c and z?

> so I can go there, find z to be an answer and mark
> the edge. Now I have marked all edges and I am done. That very well
> prevents loops to blow up paths, because I can use each edge only once
> when computing the solutions.

and the answers are

?z=:b
?z=:z
?z=:c
?z=:z

>
>> but is the same as:
>> { :a :p/:p+ ?z }
>>
>> ?z=:z
>> ?z=:z
>>
>> by combination of the first ":p" (two places: :b, and :c) then steps of :p+
>>
>> This is related to loop avoidance in :p+.  If it's a breadth first search,
>> it avoids visiting a node if it's already seen it.
>
> Again, why not mark edges? Breadth first or depth first shouldn't
> matter either.

I don't think it does in either case.

Marking edges comes up with some strange (to me) cardinalities depnding 
on the exact definition - see below.

Node marking finds reachable nodes (consistently, once).

>
>> This makes it consistent with:
>>
>> :a :p :b .
>> :b :p :z .
>> :a :p :c .
>> :c :p :z .
>> :z :p :X .
>>
>> which is the diamond with an extra arc from :z,
>>
>> Pattern: { :a :p+ ?z }
>>
>> ?z=:c
>> ?z=:z
>> ?z=:X
>> ?z=:b
>>
>> only one ?z=:X, despite there being 2 routes there
>
> with edge markings, you also get just one x, but twice z, telling you
> that there are two ways from a to z while loops are still not blowing
> up the result.

Maybe I don't understand exactly what you mean by edge marking - does or 
doesn't backtracking unwind the marked edge state?

With unwinding, I think you get two from :a to :X which I can 
understand.  It's the same as count of simple paths in this case and 
matches on ?z=:X for:

{ :a :p+ ?v . ?v :p ?z }

If there are loops, edge marking with unwinding yields all simple paths 
(like node marking) and also one extra cardinality for each loop.

Node marking is about node reachability.

Edge marking with no unwinding seems to provides to distinguish some 
edges over others. If the last step is shared between 2 simple paths you 
get one answer, so edge marking isn't counting simple paths and may be 
less than a simple path count (the :X example) - why should the last 
edge special?  A shared earlier edge gives the cardinality of the later 
part but not the earlier part.

Let's look at:

:a :p :b .
:b :p :z .
:a :p :c .
:c :p :z .
:b :p :c .  # New - directed cross link :b to :c

{ :a :p+ ?Y }

?Y=:b
?Y=:z
?Y=:c
?Y=:z
?Y=:c
with no unwinding - and with unwinding adds:
?Y=:z

There are 3 simple paths from :a to :z.

>> while { :a :p{3} ?z } gets:
>>
>> ?z=:X
>> ?z=:X
>>
>> because it is the same as the BGP expansion of the path expression.
>>
>> This makes sense with loops as well: diagram 2 adds a loop to :c
>>
>> :a :p :b .
>> :b :p :z .
>> :a :p :c .
>> :c :p :z .
>> :c :p :c .
>>
>> Pattern: { :a :p+ ?z }
>> ?z=:b
>> ?z=:c
>> ?z=:z
>
> with edge marking:
> b, c, c, z, z
 >
>> Pattern: { :a :p{2} ?z }
>> ?z=:c
>> ?z=:z
>> ?z=:z
>>
>> It seems much better to make :p+ treat DAGs (directed acyclic graphs) and
>> cycles in graphs the same and get the cardinalities above. Implementing "+"
>> only requires keeping a track of the nodes visited, not the path used to get
>> to the node (and I'm not convinced it would work out either because of pathe
>> with acyclic paths and cycles in together).
>
> That's all very well and to some extend it is just a matter of which
> compromise we want, but edge markings (I am also not talking of
> keepings whole paths) is another option.

Edge marking with unwinding is keeping the current path
Edge marking with no unwinding is keeping current path and fragements of 
other paths.

> It gives slightly more
> answers, which tell you a bit more about the graph. There is no
> totally satisfying option in that you can always find boolean queries
> that say true for a fixed path ({n} notion) and false for a variable
> path (+ or *). I can live with both, although I do have a preference
> for edge markings since this is all about edge expressions and to me
> it seems more natural.

To summarise: the options are:

  node marking
  edge marking with no unwinding
  edge marking with unwinding
  path remembering.

 Andy
Received on Wednesday, 2 June 2010 21:56:56 UTC