RE: Defining property paths

Some brief feedback on the current design proposals inline.


> -----Original Message-----
> From: Andy Seaborne [mailto:andy.seaborne@epimorphics.com] 
> Sent: 08 March 2012 13:28
> To: public-rdf-dawg@w3.org
> Subject: Re: Defining property paths
> 
> Going through the details in detail:
> 
> The "?" path operator can also generate a duplicate.  If it's 
> defined as the union of {0} and {1} and the path loops back 
> to the start exactly.
> 
> Data:
> :x :p :x .
> 
> Query:
> { ?v :p ?v } or { :x :p ?v. }

I assume you meant 

 { ?v :p? ?v } or { :x :p? ?v. }
 
> has solutions ?x=:x twice.
> 
> ------
> | v  |
> ======
> | :x |
> | :x |
> ------
> 
> The systematic answer is to have ? non-counting and {?} counting.
> 
> ? is a bit different from * and + because, with just {0} it 
> can be written out.  * and + need the arbitrary length-ness.
> 
> Currently ? is the same as {0,1}
> 
> So I am thinking of making ? non-counting always, not adding 
> {?}, which would be written {0,1} or {,1} if necessary.  If 
> people want a systmatic treatment, we can add {?} but I'd am 
> not inclined to leave {?var} open for the future for lengths.
> 
> "?" is a less used feature.  Changes are much-less likely to 
> affect people.

I agree with having ? Non-counting, and would slightly 
prefer the systematic approach, i.e. having
 {?} as a shortcut for {0,1}
since thinline with viewing {*} and {+} as shortcuts for 
 {0,+inf} and {1,+inf} 
which I find quite intuitive to explain the new approach, cf. my previous mail
 
> 
> On 02/03/12 13:59, Andy Seaborne wrote:
> > This message gives the general outline of changes for 
> property paths - 
> > it's not detail-complete.
> >
> > The next step should be tests for nasty corner cases.
> >
> > == Defining *, +, {*} and {+}
> >
> > The algebra operator will be:
> >
> > ZeroOrMorePath1
> > OneOrMorePath1
> > ZeroOrMorePathN
> > OneOrMorePathN
> >
> > The current * and + become {*} and {+} which use
> > 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
> > 2 if ( x in V ) return
> > 3 add x to V
> > 4 add x to R
> > 5 X = eval(x,path)
> > 6 For n:term in X
> > 7 ALP(n, path, R, V)
> > 8 End
> > 9 remove x from V
> >
> > V is the set of nodes considered visited.
> > A node is added to V on entry and removed on exit.
> >
> > So V is the set of nodes in the current path, not the set of nodes 
> > visited through the recursive application of ALP.
> >
> > Simply removing line 9 changes the function to track all 
> nodes visited.
> > You don't need R and V then, so remove V and line 3:
> >
> > 1 ALP1(x:term, path, R:set of RDF terms) =
> > 2 if ( x in R ) return
> > 3 add x to R
> > 4 X = eval(x,path)
> > 5 For n:term in X
> > 6 ALP1(n, path, R)
> > 7 End
> >
> > The entry step to ALP1 for OneOrMorePath1 needs aligning.
> >
> > The nodes recorded are just those the subpath ends on. So if the 
> > subpath is :p/:q then the overall path can loop back 
> through the gap 
> > between a :p and :q. This makes sense to me - its working over a 
> > (non-RDF) graph induced by only considering :p/:q as links.
> >
> > == Spec'ing DISTINCT(path)
> >
> > The spec'ing issue is whether defining DISTINCT(path) in terms of a 
> > SELECT DISTINCT sub-query is sufficient or should we explain what 
> > distinct'ness means for each of the graph operators and provide an 
> > algorithm for each.
> >
> > Some operators are naturally distinct anyway (e.g. a reverse path 
> > something distinct is distinct).
> >
> > The non-fundamental path operators are:
> >
> > X ^path Y
> > X path1 / path2 Y
> > X path1 | path2 Y
> > X path{n} Y
> > X path{n,m} Y
> > X path{n,} Y
> > X path{0,} Y
> > X path{,n} Y
> >
> > The way to define as a re-write is e.g.
> > ?x :p/DISTINCT(path)/:q ?y
> > ==>
> > ?x :p ?_v1 .
> > SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
> > ?_v2 :q ?y .

I am not sure how this would work with a simple rewriting because of blank nodes.
Let's take  

Graph
   :x :p :y .
   :x :p :z .

Pattern:

 { ?x DISTINCT(:p{+}) [] . }

if you simply rewrite this to a DISTINCT query, i.e.

 {SELECT DISTINCT ?x {?x :p{+} [] . } }

it would project away the duplicates stemming the bnode, wouldn't it?
Or did I miss something? If bnodes are the only issue, then *maybe* they 
could be just circumvented by replacing them with special fresh variables 
in the  reqeiting algorithm, but this seems slightly like a hack, so ...

> > This could even be done for * and + based on {*} and {+} 
> > but I think  spelling out an algorithm is much more helpful.
> >
> > The alternative is a table defining DISTINCT(path) for path 
> operator.

... if that covers the blank-node-corner-case above, I'd prefer this.

HTH, best,
Axel

> >
> > Andy
> >
> >
> 
> 

Received on Sunday, 11 March 2012 21:49:28 UTC