- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Mon, 12 Mar 2012 22:50:12 +0000
- To: public-rdf-dawg@w3.org
On 12/03/12 21:09, Birte Glimm wrote: > Hi all, > I got actioned to review Andy's proposal and I mostly comment inline. > All in all, I am not 100% whether we really want to do DISTINCT(path) > and the counting and non-counting operator versions. All that together > seems like a lot of work to me. > > On 2 March 2012 14:59, Andy Seaborne<andy.seaborne@epimorphics.com> wrote: >> This message gives the general outline of changes for property paths - it's >> not detail-complete. >> >> The next step should be tests for nasty corner cases. >> >> == Defining *, +, {*} and {+} >> >> The algebra operator will be: >> >> ZeroOrMorePath1 >> OneOrMorePath1 >> ZeroOrMorePathN >> OneOrMorePathN > > Is that already a final name? Otherwise I would suggest something more > speaking, e.g., ZeroOrMorePathCounting and ZeroOrMorePathNonCounting > or at least ZeroOrMorePathC and ZeroOrMorePathNC, although 1 and N of > course also has some intuition of counting and non-counting. Actually, I thought referring to "counting" would be a bit opaque. We in the WG have had discussions talking about counting of paths but the definitions are going to relate to distinctness so I went for 1 and N. Possibly _1 and _N is better. > >> The current * and + become {*} and {+} > > Reading the follow-up messages I get the intuition behind {*} (Axel's > viewing {*} and {+} as shortcuts for > {0,+inf} and {1,+inf}, which I also find intuitive). For spec or LC > comment purposes I would give this intuition straight away. Will work that in somehow. old * was {0,} by definition in the transformations, and so it followed that old + was {1,} >> which use >> 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) = >> 2 if ( x in V ) return >> 3 add x to V >> 4 add x to R >> 5 X = eval(x,path) >> 6 For n:term in X >> 7 ALP(n, path, R, V) >> 8 End >> 9 remove x from V >> >> V is the set of nodes considered visited. >> A node is added to V on entry and removed on exit. >> >> So V is the set of nodes in the current path, not the set of nodes visited >> through the recursive application of ALP. > > I didn't get this just from the email, so looked at the spec (editors' > draft). Warning! Not finished! > Note for that: > "we write x:t for term*s* or variables" and I would put this comment > after the def. of "Node set of a graph" since it is only used > afterwards. > "matches based distinct nodes"<- "matches based *on* distinct nodes" > >> Simply removing line 9 changes the function to track all nodes visited. >> You don't need R and V then, so remove V and line 3: >> >> 1 ALP1(x:term, path, R:set of RDF terms) = >> 2 if ( x in R ) return >> 3 add x to R >> 4 X = eval(x,path) >> 5 For n:term in X >> 6 ALP1(n, path, R) >> 7 End > > Certainly not the most efficient evaluation for all situations, but as > it's written in the spec one is free to implement it differently as > long as the result is the same. > >> The entry step to ALP1 for OneOrMorePath1 needs aligning. > Yes. > >> The nodes recorded are just those the subpath ends on. So if the subpath is >> :p/:q then the overall path can loop back through the gap between a :p and >> :q. This makes sense to me - its working over a (non-RDF) graph induced by >> only considering :p/:q as links. > > I am not quite sure I get this. > > >> == Spec'ing DISTINCT(path) >> >> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT >> DISTINCT sub-query is sufficient or should we explain what distinct'ness >> means for each of the graph operators and provide an algorithm for each. >> >> Some operators are naturally distinct anyway (e.g. a reverse path something >> distinct is distinct). >> >> The non-fundamental path operators are: >> >> X ^path Y >> X path1 / path2 Y >> X path1 | path2 Y >> X path{n} Y >> X path{n,m} Y >> X path{n,} Y >> X path{0,} Y >> X path{,n} Y >> >> The way to define as a re-write is e.g. >> ?x :p/DISTINCT(path)/:q ?y >> ==> >> ?x :p ?_v1 . >> SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 } >> ?_v2 :q ?y . > > Here's again Axel's problematic case: I have seen Axel's email. > Graph: > :x :p :y . > :x :p :z . > Pattern: > { ?x DISTINCT(:p{+}) [] . } > evaluates to \mu : ?x->:x with cardinality 2 (2 sigma instance mappings) > Simple DISTINCT rewrite: > {SELECT DISTINCT ?x {?x :p{+} [] . } } > evaluates to \mu : ?x->:x with cardinality 1 (DISTINCT filters) > > Thus, I agree with Axel that it is not as simple as one might hope :-( > >> This could even be done for * and + based on {*} and {+} but I think >> spelling out an algorithm is much more helpful. >> >> The alternative is a table defining DISTINCT(path) for path operator. > > I tend to prefer that, given we can find a working rewriting. Hmm - there is a certain amount of contradiction here with the earlier comment: "All that together seems like a lot of work to me." :-) Andy
Received on Monday, 12 March 2012 22:50:47 UTC