- From: Polleres, Axel <axel.polleres@siemens.com>
- Date: Sun, 11 Mar 2012 22:48:55 +0100
- To: Andy Seaborne <andy.seaborne@epimorphics.com>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
Some brief feedback on the current design proposals inline.
> -----Original Message-----
> From: Andy Seaborne [mailto:andy.seaborne@epimorphics.com]
> Sent: 08 March 2012 13:28
> To: public-rdf-dawg@w3.org
> Subject: Re: Defining property paths
>
> Going through the details in detail:
>
> The "?" path operator can also generate a duplicate. If it's
> defined as the union of {0} and {1} and the path loops back
> to the start exactly.
>
> Data:
> :x :p :x .
>
> Query:
> { ?v :p ?v } or { :x :p ?v. }
I assume you meant
{ ?v :p? ?v } or { :x :p? ?v. }
> has solutions ?x=:x twice.
>
> ------
> | v |
> ======
> | :x |
> | :x |
> ------
>
> The systematic answer is to have ? non-counting and {?} counting.
>
> ? is a bit different from * and + because, with just {0} it
> can be written out. * and + need the arbitrary length-ness.
>
> Currently ? is the same as {0,1}
>
> So I am thinking of making ? non-counting always, not adding
> {?}, which would be written {0,1} or {,1} if necessary. If
> people want a systmatic treatment, we can add {?} but I'd am
> not inclined to leave {?var} open for the future for lengths.
>
> "?" is a less used feature. Changes are much-less likely to
> affect people.
I agree with having ? Non-counting, and would slightly
prefer the systematic approach, i.e. having
{?} as a shortcut for {0,1}
since thinline with viewing {*} and {+} as shortcuts for
{0,+inf} and {1,+inf}
which I find quite intuitive to explain the new approach, cf. my previous mail
>
> On 02/03/12 13:59, Andy Seaborne wrote:
> > This message gives the general outline of changes for
> property paths -
> > it's not detail-complete.
> >
> > The next step should be tests for nasty corner cases.
> >
> > == Defining *, +, {*} and {+}
> >
> > The algebra operator will be:
> >
> > ZeroOrMorePath1
> > OneOrMorePath1
> > ZeroOrMorePathN
> > OneOrMorePathN
> >
> > The current * and + become {*} and {+} which use
> > 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
> > 2 if ( x in V ) return
> > 3 add x to V
> > 4 add x to R
> > 5 X = eval(x,path)
> > 6 For n:term in X
> > 7 ALP(n, path, R, V)
> > 8 End
> > 9 remove x from V
> >
> > V is the set of nodes considered visited.
> > A node is added to V on entry and removed on exit.
> >
> > So V is the set of nodes in the current path, not the set of nodes
> > visited through the recursive application of ALP.
> >
> > Simply removing line 9 changes the function to track all
> nodes visited.
> > You don't need R and V then, so remove V and line 3:
> >
> > 1 ALP1(x:term, path, R:set of RDF terms) =
> > 2 if ( x in R ) return
> > 3 add x to R
> > 4 X = eval(x,path)
> > 5 For n:term in X
> > 6 ALP1(n, path, R)
> > 7 End
> >
> > The entry step to ALP1 for OneOrMorePath1 needs aligning.
> >
> > The nodes recorded are just those the subpath ends on. So if the
> > subpath is :p/:q then the overall path can loop back
> through the gap
> > between a :p and :q. This makes sense to me - its working over a
> > (non-RDF) graph induced by only considering :p/:q as links.
> >
> > == Spec'ing DISTINCT(path)
> >
> > The spec'ing issue is whether defining DISTINCT(path) in terms of a
> > SELECT DISTINCT sub-query is sufficient or should we explain what
> > distinct'ness means for each of the graph operators and provide an
> > algorithm for each.
> >
> > Some operators are naturally distinct anyway (e.g. a reverse path
> > something distinct is distinct).
> >
> > The non-fundamental path operators are:
> >
> > X ^path Y
> > X path1 / path2 Y
> > X path1 | path2 Y
> > X path{n} Y
> > X path{n,m} Y
> > X path{n,} Y
> > X path{0,} Y
> > X path{,n} Y
> >
> > The way to define as a re-write is e.g.
> > ?x :p/DISTINCT(path)/:q ?y
> > ==>
> > ?x :p ?_v1 .
> > SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
> > ?_v2 :q ?y .
I am not sure how this would work with a simple rewriting because of blank nodes.
Let's take
Graph
:x :p :y .
:x :p :z .
Pattern:
{ ?x DISTINCT(:p{+}) [] . }
if you simply rewrite this to a DISTINCT query, i.e.
{SELECT DISTINCT ?x {?x :p{+} [] . } }
it would project away the duplicates stemming the bnode, wouldn't it?
Or did I miss something? If bnodes are the only issue, then *maybe* they
could be just circumvented by replacing them with special fresh variables
in the reqeiting algorithm, but this seems slightly like a hack, so ...
> > This could even be done for * and + based on {*} and {+}
> > but I think spelling out an algorithm is much more helpful.
> >
> > The alternative is a table defining DISTINCT(path) for path
> operator.
... if that covers the blank-node-corner-case above, I'd prefer this.
HTH, best,
Axel
> >
> > Andy
> >
> >
>
>
Received on Sunday, 11 March 2012 21:49:28 UTC