- From: Birte Glimm <birte.glimm@uni-ulm.de>
- Date: Mon, 12 Mar 2012 22:09:18 +0100
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Hi all, I got actioned to review Andy's proposal and I mostly comment inline. All in all, I am not 100% whether we really want to do DISTINCT(path) and the counting and non-counting operator versions. All that together seems like a lot of work to me. On 2 March 2012 14:59, Andy Seaborne <andy.seaborne@epimorphics.com> wrote: > This message gives the general outline of changes for property paths - it's > not detail-complete. > > The next step should be tests for nasty corner cases. > > == Defining *, +, {*} and {+} > > The algebra operator will be: > > ZeroOrMorePath1 > OneOrMorePath1 > ZeroOrMorePathN > OneOrMorePathN Is that already a final name? Otherwise I would suggest something more speaking, e.g., ZeroOrMorePathCounting and ZeroOrMorePathNonCounting or at least ZeroOrMorePathC and ZeroOrMorePathNC, although 1 and N of course also has some intuition of counting and non-counting. > The current * and + become {*} and {+} Reading the follow-up messages I get the intuition behind {*} (Axel's viewing {*} and {+} as shortcuts for {0,+inf} and {1,+inf}, which I also find intuitive). For spec or LC comment purposes I would give this intuition straight away. > which use > 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) = > 2 if ( x in V ) return > 3 add x to V > 4 add x to R > 5 X = eval(x,path) > 6 For n:term in X > 7 ALP(n, path, R, V) > 8 End > 9 remove x from V > > V is the set of nodes considered visited. > A node is added to V on entry and removed on exit. > > So V is the set of nodes in the current path, not the set of nodes visited > through the recursive application of ALP. I didn't get this just from the email, so looked at the spec (editors' draft). Note for that: "we write x:t for term*s* or variables" and I would put this comment after the def. of "Node set of a graph" since it is only used afterwards. "matches based distinct nodes" <- "matches based *on* distinct nodes" > Simply removing line 9 changes the function to track all nodes visited. > You don't need R and V then, so remove V and line 3: > > 1 ALP1(x:term, path, R:set of RDF terms) = > 2 if ( x in R ) return > 3 add x to R > 4 X = eval(x,path) > 5 For n:term in X > 6 ALP1(n, path, R) > 7 End Certainly not the most efficient evaluation for all situations, but as it's written in the spec one is free to implement it differently as long as the result is the same. > The entry step to ALP1 for OneOrMorePath1 needs aligning. Yes. > The nodes recorded are just those the subpath ends on. So if the subpath is > :p/:q then the overall path can loop back through the gap between a :p and > :q. This makes sense to me - its working over a (non-RDF) graph induced by > only considering :p/:q as links. I am not quite sure I get this. > == Spec'ing DISTINCT(path) > > The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT > DISTINCT sub-query is sufficient or should we explain what distinct'ness > means for each of the graph operators and provide an algorithm for each. > > Some operators are naturally distinct anyway (e.g. a reverse path something > distinct is distinct). > > The non-fundamental path operators are: > > X ^path Y > X path1 / path2 Y > X path1 | path2 Y > X path{n} Y > X path{n,m} Y > X path{n,} Y > X path{0,} Y > X path{,n} Y > > The way to define as a re-write is e.g. > ?x :p/DISTINCT(path)/:q ?y > ==> > ?x :p ?_v1 . > SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 } > ?_v2 :q ?y . Here's again Axel's problematic case: Graph: :x :p :y . :x :p :z . Pattern: { ?x DISTINCT(:p{+}) [] . } evaluates to \mu : ?x->:x with cardinality 2 (2 sigma instance mappings) Simple DISTINCT rewrite: {SELECT DISTINCT ?x {?x :p{+} [] . } } evaluates to \mu : ?x->:x with cardinality 1 (DISTINCT filters) Thus, I agree with Axel that it is not as simple as one might hope :-( > This could even be done for * and + based on {*} and {+} but I think > spelling out an algorithm is much more helpful. > > The alternative is a table defining DISTINCT(path) for path operator. I tend to prefer that, given we can find a working rewriting. Birte > Andy > > -- Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 Inst. of Artificial Intelligence Secr: +49 731 50 24258 University of Ulm Fax: +49 731 50 24188 D-89069 Ulm birte.glimm@uni-ulm.de Germany
Received on Monday, 12 March 2012 21:09:49 UTC