- From: Polleres, Axel <axel.polleres@siemens.com>
- Date: Sun, 11 Mar 2012 22:48:55 +0100
- To: Andy Seaborne <andy.seaborne@epimorphics.com>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
Some brief feedback on the current design proposals inline. > -----Original Message----- > From: Andy Seaborne [mailto:andy.seaborne@epimorphics.com] > Sent: 08 March 2012 13:28 > To: public-rdf-dawg@w3.org > Subject: Re: Defining property paths > > Going through the details in detail: > > The "?" path operator can also generate a duplicate. If it's > defined as the union of {0} and {1} and the path loops back > to the start exactly. > > Data: > :x :p :x . > > Query: > { ?v :p ?v } or { :x :p ?v. } I assume you meant { ?v :p? ?v } or { :x :p? ?v. } > has solutions ?x=:x twice. > > ------ > | v | > ====== > | :x | > | :x | > ------ > > The systematic answer is to have ? non-counting and {?} counting. > > ? is a bit different from * and + because, with just {0} it > can be written out. * and + need the arbitrary length-ness. > > Currently ? is the same as {0,1} > > So I am thinking of making ? non-counting always, not adding > {?}, which would be written {0,1} or {,1} if necessary. If > people want a systmatic treatment, we can add {?} but I'd am > not inclined to leave {?var} open for the future for lengths. > > "?" is a less used feature. Changes are much-less likely to > affect people. I agree with having ? Non-counting, and would slightly prefer the systematic approach, i.e. having {?} as a shortcut for {0,1} since thinline with viewing {*} and {+} as shortcuts for {0,+inf} and {1,+inf} which I find quite intuitive to explain the new approach, cf. my previous mail > > On 02/03/12 13:59, Andy Seaborne wrote: > > This message gives the general outline of changes for > property paths - > > it's not detail-complete. > > > > The next step should be tests for nasty corner cases. > > > > == Defining *, +, {*} and {+} > > > > The algebra operator will be: > > > > ZeroOrMorePath1 > > OneOrMorePath1 > > ZeroOrMorePathN > > OneOrMorePathN > > > > The current * and + become {*} and {+} which use > > 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) = > > 2 if ( x in V ) return > > 3 add x to V > > 4 add x to R > > 5 X = eval(x,path) > > 6 For n:term in X > > 7 ALP(n, path, R, V) > > 8 End > > 9 remove x from V > > > > V is the set of nodes considered visited. > > A node is added to V on entry and removed on exit. > > > > So V is the set of nodes in the current path, not the set of nodes > > visited through the recursive application of ALP. > > > > Simply removing line 9 changes the function to track all > nodes visited. > > You don't need R and V then, so remove V and line 3: > > > > 1 ALP1(x:term, path, R:set of RDF terms) = > > 2 if ( x in R ) return > > 3 add x to R > > 4 X = eval(x,path) > > 5 For n:term in X > > 6 ALP1(n, path, R) > > 7 End > > > > The entry step to ALP1 for OneOrMorePath1 needs aligning. > > > > The nodes recorded are just those the subpath ends on. So if the > > subpath is :p/:q then the overall path can loop back > through the gap > > between a :p and :q. This makes sense to me - its working over a > > (non-RDF) graph induced by only considering :p/:q as links. > > > > == Spec'ing DISTINCT(path) > > > > The spec'ing issue is whether defining DISTINCT(path) in terms of a > > SELECT DISTINCT sub-query is sufficient or should we explain what > > distinct'ness means for each of the graph operators and provide an > > algorithm for each. > > > > Some operators are naturally distinct anyway (e.g. a reverse path > > something distinct is distinct). > > > > The non-fundamental path operators are: > > > > X ^path Y > > X path1 / path2 Y > > X path1 | path2 Y > > X path{n} Y > > X path{n,m} Y > > X path{n,} Y > > X path{0,} Y > > X path{,n} Y > > > > The way to define as a re-write is e.g. > > ?x :p/DISTINCT(path)/:q ?y > > ==> > > ?x :p ?_v1 . > > SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 } > > ?_v2 :q ?y . I am not sure how this would work with a simple rewriting because of blank nodes. Let's take Graph :x :p :y . :x :p :z . Pattern: { ?x DISTINCT(:p{+}) [] . } if you simply rewrite this to a DISTINCT query, i.e. {SELECT DISTINCT ?x {?x :p{+} [] . } } it would project away the duplicates stemming the bnode, wouldn't it? Or did I miss something? If bnodes are the only issue, then *maybe* they could be just circumvented by replacing them with special fresh variables in the reqeiting algorithm, but this seems slightly like a hack, so ... > > This could even be done for * and + based on {*} and {+} > > but I think spelling out an algorithm is much more helpful. > > > > The alternative is a table defining DISTINCT(path) for path > operator. ... if that covers the blank-node-corner-case above, I'd prefer this. HTH, best, Axel > > > > Andy > > > > > >
Received on Sunday, 11 March 2012 21:49:28 UTC