- From: Polleres, Axel <axel.polleres@siemens.com>
- Date: Thu, 8 Mar 2012 13:48:18 +0100
- To: Andy Seaborne <andy.seaborne@epimorphics.com>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>

Dear Andy, all, > The "?" path operator can also generate a duplicate. If it's > defined as the union of {0} and {1} and the path loops back > to the start exactly. As we discussed already in an earlier thread [1], also "/" can cause duplicates... and likewise can {n,m}, etc. So, we'd end up having lots of more operators, right? Taking one step back, I am asking myself whether we should really produce counting and non-counting *operator* versions of all operators, and personally come more and more to the conclusion that it would probably be easier to resort - in this spec round of SPARQL - to either the DISTINCT() or ALL-PATHS() proposal (depending on the default semantics we choose). I am worried that the combination of * and + defaulting to non-counting in combination with other operators that can cause duplicates (i.e. counting) is more confusing than a clear-cut switch *around* paths. Do we actually have use-cases for mixing counting and non-counting semantics in the same path expression? It seems that - correct me if I am wrong - the use cases discussed so far were either in the one or the other category, but mixing both semantics wasn't really a use case, was it? If you agree, then I'd suggest to conentrate on the path "Spec'ing DISTINCT(path)" in your original mail. BTW, one alternative I'd like to propose would be to default overall to non-counting semantics and, instead of the (in my taste ugly) keyword ALL-PATHS() allow {...} around whole path expressions only to denote counting semantics, That'd be somewhat along the lines of your proposed {*} and {+} proposal, but keeping the switch on the entire path level (and leave refinde versions With having { } around parts of paths for future specs. Opinions? Best, Axel 1. http://lists.w3.org/Archives/Public/public-rdf-dawg/2012JanMar/0144.html -- Dr. Axel Polleres Siemens AG Österreich Corporate Technology Central Eastern Europe Research & Technologies CT T CEE Tel.: +43 (0) 51707-36983 Mobile: +43 (0) 664 88550859 Fax: +43 (0) 51707-56682 mailto:axel.polleres@siemens.com > -----Original Message----- > From: Andy Seaborne [mailto:andy.seaborne@epimorphics.com] > Sent: 08 March 2012 13:29 > To: public-rdf-dawg@w3.org > Subject: Re: Defining property paths > > Going through the details in detail: > > The "?" path operator can also generate a duplicate. If it's > defined as the union of {0} and {1} and the path loops back > to the start exactly. > > Data: > :x :p :x . > > Query: > { ?v :p ?v } or { :x :p ?v. } > > has solutions ?x=:x twice. > > ------ > | v | > ====== > | :x | > | :x | > ------ > > The systematic answer is to have ? non-counting and {?} counting. > > ? is a bit different from * and + because, with just {0} it > can be written out. * and + need the arbitrary length-ness. > > Currently ? is the same as {0,1} > > So I am thinking of making ? non-counting always, not adding > {?}, which would be written {0,1} or {,1} if necessary. If > people want a systmatic treatment, we can add {?} but I'd am > not inclined to leave {?var} open for the future for lengths. > > "?" is a less used feature. Changes are much-less likely to > affect people. > > Andy > > > On 02/03/12 13:59, Andy Seaborne wrote: > > This message gives the general outline of changes for > property paths - > > it's not detail-complete. > > > > The next step should be tests for nasty corner cases. > > > > == Defining *, +, {*} and {+} > > > > The algebra operator will be: > > > > ZeroOrMorePath1 > > OneOrMorePath1 > > ZeroOrMorePathN > > OneOrMorePathN > > > > The current * and + become {*} and {+} which use > > 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) = > > 2 if ( x in V ) return > > 3 add x to V > > 4 add x to R > > 5 X = eval(x,path) > > 6 For n:term in X > > 7 ALP(n, path, R, V) > > 8 End > > 9 remove x from V > > > > V is the set of nodes considered visited. > > A node is added to V on entry and removed on exit. > > > > So V is the set of nodes in the current path, not the set of nodes > > visited through the recursive application of ALP. > > > > Simply removing line 9 changes the function to track all > nodes visited. > > You don't need R and V then, so remove V and line 3: > > > > 1 ALP1(x:term, path, R:set of RDF terms) = > > 2 if ( x in R ) return > > 3 add x to R > > 4 X = eval(x,path) > > 5 For n:term in X > > 6 ALP1(n, path, R) > > 7 End > > > > The entry step to ALP1 for OneOrMorePath1 needs aligning. > > > > The nodes recorded are just those the subpath ends on. So if the > > subpath is :p/:q then the overall path can loop back > through the gap > > between a :p and :q. This makes sense to me - its working over a > > (non-RDF) graph induced by only considering :p/:q as links. > > > > == Spec'ing DISTINCT(path) > > > > The spec'ing issue is whether defining DISTINCT(path) in terms of a > > SELECT DISTINCT sub-query is sufficient or should we explain what > > distinct'ness means for each of the graph operators and provide an > > algorithm for each. > > > > Some operators are naturally distinct anyway (e.g. a reverse path > > something distinct is distinct). > > > > The non-fundamental path operators are: > > > > X ^path Y > > X path1 / path2 Y > > X path1 | path2 Y > > X path{n} Y > > X path{n,m} Y > > X path{n,} Y > > X path{0,} Y > > X path{,n} Y > > > > The way to define as a re-write is e.g. > > ?x :p/DISTINCT(path)/:q ?y > > ==> > > ?x :p ?_v1 . > > SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 } > > ?_v2 :q ?y . > > > > This could even be done for * and + based on {*} and {+} > but I think > > spelling out an algorithm is much more helpful. > > > > The alternative is a table defining DISTINCT(path) for path > operator. > > > > Andy > > > > > >

Received on Thursday, 8 March 2012 12:52:12 UTC