- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Mon, 12 Mar 2012 22:50:12 +0000
- To: public-rdf-dawg@w3.org
On 12/03/12 21:09, Birte Glimm wrote:
> Hi all,
> I got actioned to review Andy's proposal and I mostly comment inline.
> All in all, I am not 100% whether we really want to do DISTINCT(path)
> and the counting and non-counting operator versions. All that together
> seems like a lot of work to me.
>
> On 2 March 2012 14:59, Andy Seaborne<andy.seaborne@epimorphics.com> wrote:
>> This message gives the general outline of changes for property paths - it's
>> not detail-complete.
>>
>> The next step should be tests for nasty corner cases.
>>
>> == Defining *, +, {*} and {+}
>>
>> The algebra operator will be:
>>
>> ZeroOrMorePath1
>> OneOrMorePath1
>> ZeroOrMorePathN
>> OneOrMorePathN
>
> Is that already a final name? Otherwise I would suggest something more
> speaking, e.g., ZeroOrMorePathCounting and ZeroOrMorePathNonCounting
> or at least ZeroOrMorePathC and ZeroOrMorePathNC, although 1 and N of
> course also has some intuition of counting and non-counting.
Actually, I thought referring to "counting" would be a bit opaque. We
in the WG have had discussions talking about counting of paths but the
definitions are going to relate to distinctness so I went for 1 and N.
Possibly _1 and _N is better.
>
>> The current * and + become {*} and {+}
>
> Reading the follow-up messages I get the intuition behind {*} (Axel's
> viewing {*} and {+} as shortcuts for
> {0,+inf} and {1,+inf}, which I also find intuitive). For spec or LC
> comment purposes I would give this intuition straight away.
Will work that in somehow.
old * was {0,} by definition in the transformations, and so it followed
that old + was {1,}
>> which use
>> 1 ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
>> 2 if ( x in V ) return
>> 3 add x to V
>> 4 add x to R
>> 5 X = eval(x,path)
>> 6 For n:term in X
>> 7 ALP(n, path, R, V)
>> 8 End
>> 9 remove x from V
>>
>> V is the set of nodes considered visited.
>> A node is added to V on entry and removed on exit.
>>
>> So V is the set of nodes in the current path, not the set of nodes visited
>> through the recursive application of ALP.
>
> I didn't get this just from the email, so looked at the spec (editors'
> draft).
Warning! Not finished!
> Note for that:
> "we write x:t for term*s* or variables" and I would put this comment
> after the def. of "Node set of a graph" since it is only used
> afterwards.
> "matches based distinct nodes"<- "matches based *on* distinct nodes"
>
>> Simply removing line 9 changes the function to track all nodes visited.
>> You don't need R and V then, so remove V and line 3:
>>
>> 1 ALP1(x:term, path, R:set of RDF terms) =
>> 2 if ( x in R ) return
>> 3 add x to R
>> 4 X = eval(x,path)
>> 5 For n:term in X
>> 6 ALP1(n, path, R)
>> 7 End
>
> Certainly not the most efficient evaluation for all situations, but as
> it's written in the spec one is free to implement it differently as
> long as the result is the same.
>
>> The entry step to ALP1 for OneOrMorePath1 needs aligning.
> Yes.
>
>> The nodes recorded are just those the subpath ends on. So if the subpath is
>> :p/:q then the overall path can loop back through the gap between a :p and
>> :q. This makes sense to me - its working over a (non-RDF) graph induced by
>> only considering :p/:q as links.
>
> I am not quite sure I get this.
>
>
>> == Spec'ing DISTINCT(path)
>>
>> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT
>> DISTINCT sub-query is sufficient or should we explain what distinct'ness
>> means for each of the graph operators and provide an algorithm for each.
>>
>> Some operators are naturally distinct anyway (e.g. a reverse path something
>> distinct is distinct).
>>
>> The non-fundamental path operators are:
>>
>> X ^path Y
>> X path1 / path2 Y
>> X path1 | path2 Y
>> X path{n} Y
>> X path{n,m} Y
>> X path{n,} Y
>> X path{0,} Y
>> X path{,n} Y
>>
>> The way to define as a re-write is e.g.
>> ?x :p/DISTINCT(path)/:q ?y
>> ==>
>> ?x :p ?_v1 .
>> SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
>> ?_v2 :q ?y .
>
> Here's again Axel's problematic case:
I have seen Axel's email.
> Graph:
> :x :p :y .
> :x :p :z .
> Pattern:
> { ?x DISTINCT(:p{+}) [] . }
> evaluates to \mu : ?x->:x with cardinality 2 (2 sigma instance mappings)
> Simple DISTINCT rewrite:
> {SELECT DISTINCT ?x {?x :p{+} [] . } }
> evaluates to \mu : ?x->:x with cardinality 1 (DISTINCT filters)
>
> Thus, I agree with Axel that it is not as simple as one might hope :-(
>
>> This could even be done for * and + based on {*} and {+} but I think
>> spelling out an algorithm is much more helpful.
>>
>> The alternative is a table defining DISTINCT(path) for path operator.
>
> I tend to prefer that, given we can find a working rewriting.
Hmm - there is a certain amount of contradiction here with the earlier
comment:
"All that together seems like a lot of work to me."
:-)
Andy
Received on Monday, 12 March 2012 22:50:47 UTC