- From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
- Date: Wed, 2 Jun 2010 18:36:26 +0100
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>

On 2 June 2010 14:31, Andy Seaborne <andy.seaborne@talis.com> wrote: > This message is to make sure we are all aware about cardinality that :p{2} > and :p+ can encounter. > > Test cases attached (what are we doing about test cases?) > > The diamond shape of diagram 1 [*] is the triples: > > :a :p :b . > :b :p :z . > :a :p :c . > :c :p :z . > > Pattern: { :a :p{2} ?z } > > has solutions > ?z=:z > ?z=:z > > and it's the same as > > { :a :p _:v . _:v :p ?z } > > or SELECT ?z { :a :p ?v . ?y :p ?z } I take ?y to be ?v. I should have noticed earlier, but it just occurred to me that SPARQL really has no notion of blank nodes whatsoever. Each blank node in the query can equally be replaced by a variable and you get exactly the same answers and blank nodes in the active graph are just Skolem constants. Thus, the only thing special about them is that they might get renamed at some point. I had the misconception that the _:v would really act as an existential variable, i.e., :z is a solution if there is a mapping for _:v such that replacing ?z with :z and _:v with the mapping for bnodes (could be either to :b or to :c) results in a sub-graph of the active graph. I thought it does not matter how many such RDF instance mappings there are, but it seems I am wrong. Now this might have a consequence for the entailment regimes because there something is a solution if there is any RDF instance mapping (plus further restrictions for axiomatic triples etc), but under the current RDF(S) regime, you would get just one answer :z. This might not be what we want in that you can then get less answers than you would get with simple entailment/standard SPARQL semantics. It is not hard to fix, but it really makes me wonder why we allow for bnodes in the query at all. > > In terms of cardinality for 2-step paths, it's not the same as: > > Patterns: { :a :p+ ?z } > which has solutions > > ?z=:b > ?z=:c > ?z=:z > > the point being you get ?z=:z only once in the { :a :p+ ?z } case. Now I am confused (even more than before). I thought we don't want loop, which means you mark edges, not nodes when computing the answers in my understanding. E.g., I start at a go one step to b via the p-edge and mark the p-edge. That satisfies p+, so b is an answer. Since it is p+, I keep going and mode to z and mark the p-edge between b and z. Again, z is an answer. Now there is nowhere to go via a p-edge, so I backtrack. No alternatives either at b, so I go further back to a, where I still have the alternative to go via p to c, I mark the edge and put c to the answers. Now I still haven't seen the p-edge between b and z, so I can go there, find z to be an answer and mark the edge. Now I have marked all edges and I am done. That very well prevents loops to blow up paths, because I can use each edge only once when computing the solutions. > but is the same as: > { :a :p/:p+ ?z } > > ?z=:z > ?z=:z > > by combination of the first ":p" (two places: :b, and :c) then steps of :p+ > > This is related to loop avoidance in :p+. If it's a breadth first search, > it avoids visiting a node if it's already seen it. Again, why not mark edges? Breadth first or depth first shouldn't matter either. > This makes it consistent with: > > :a :p :b . > :b :p :z . > :a :p :c . > :c :p :z . > :z :p :X . > > which is the diamond with an extra arc from :z, > > Pattern: { :a :p+ ?z } > > ?z=:c > ?z=:z > ?z=:X > ?z=:b > > only one ?z=:X, despite there being 2 routes there with edge markings, you also get just one x, but twice z, telling you that there are two ways from a to z while loops are still not blowing up the result. > while { :a :p{3} ?z } gets: > > ?z=:X > ?z=:X > > because it is the same as the BGP expansion of the path expression. > > This makes sense with loops as well: diagram 2 adds a loop to :c > > :a :p :b . > :b :p :z . > :a :p :c . > :c :p :z . > :c :p :c . > > Pattern: { :a :p+ ?z } > ?z=:b > ?z=:c > ?z=:z with edge marking: b, c, c, z, z > whereas on the diamond+loop: > Pattern: { :a :p{2} ?z } > ?z=:c > ?z=:z > ?z=:z > > It seems much better to make :p+ treat DAGs (directed acyclic graphs) and > cycles in graphs the same and get the cardinalities above. Implementing "+" > only requires keeping a track of the nodes visited, not the path used to get > to the node (and I'm not convinced it would work out either because of pathe > with acyclic paths and cycles in together). That's all very well and to some extend it is just a matter of which compromise we want, but edge markings (I am also not talking of keepings whole paths) is another option. It gives slightly more answers, which tell you a bit more about the graph. There is no totally satisfying option in that you can always find boolean queries that say true for a fixed path ({n} notion) and false for a variable path (+ or *). I can live with both, although I do have a preference for edge markings since this is all about edge expressions and to me it seems more natural. Birte > > Andy > > [*] http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0275.html > > > > -- Dr. Birte Glimm, Room 306 Computing Laboratory Parks Road Oxford OX1 3QD United Kingdom +44 (0)1865 283529

Received on Wednesday, 2 June 2010 17:37:00 UTC