Re: Defining property paths from Andy Seaborne on 2012-03-12 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 12 Mar 2012 22:50:12 +0000
To: public-rdf-dawg@w3.org
Message-ID: <4F5E7DA4.2020904@epimorphics.com>
On 12/03/12 21:09, Birte Glimm wrote:
> Hi all,
> I got actioned to review Andy's proposal and I mostly comment inline.
> All in all, I am not 100% whether we really want to do DISTINCT(path)
> and the counting and non-counting operator versions. All that together
> seems like a lot of work to me.
>
> On 2 March 2012 14:59, Andy Seaborne<andy.seaborne@epimorphics.com>  wrote:
>> This message gives the general outline of changes for property paths - it's
>> not detail-complete.
>>
>> The next step should be tests for nasty corner cases.
>>
>> == Defining *, +, {*} and {+}
>>
>> The algebra operator will be:
>>
>> ZeroOrMorePath1
>> OneOrMorePath1
>> ZeroOrMorePathN
>> OneOrMorePathN
>
> Is that already a final name? Otherwise I would suggest something more
> speaking, e.g., ZeroOrMorePathCounting and ZeroOrMorePathNonCounting
> or at least ZeroOrMorePathC and ZeroOrMorePathNC, although 1 and N of
> course also has some intuition of counting and non-counting.

Actually, I thought referring to "counting" would be a bit opaque.  We 
in the WG have had discussions talking about counting of paths but the 
definitions are going to relate to distinctness so I went for 1 and N.

Possibly _1 and _N is better.

>
>> The current * and + become {*} and {+}
>
> Reading the follow-up messages I get the intuition behind {*} (Axel's
> viewing {*} and {+} as shortcuts for
>   {0,+inf} and {1,+inf}, which I also find intuitive). For spec or LC
> comment purposes I would give this intuition straight away.

Will work that in somehow.

old * was {0,} by definition in the transformations, and so it followed 
that old + was {1,}

>> which use
>>   1  ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
>>   2     if ( x in V ) return
>>   3     add x to V
>>   4     add x to R
>>   5     X = eval(x,path)
>>   6     For n:term in X
>>   7         ALP(n, path, R, V)
>>   8         End
>>   9     remove x from V
>>
>> V is the set of nodes considered visited.
>> A node is added to V on entry and removed on exit.
>>
>> So V is the set of nodes in the current path, not the set of nodes visited
>> through the recursive application of ALP.
>
> I didn't get this just from the email, so looked at the spec (editors'
> draft).

Warning! Not finished!

> Note for that:
> "we write x:t for term*s* or variables" and I would put this comment
> after the def. of "Node set of a graph" since it is only used
> afterwards.
> "matches based distinct nodes"<- "matches based *on* distinct nodes"
>
>> Simply removing line 9 changes the function to track all nodes visited.
>> You don't need R and V then, so remove V and line 3:
>>
>>   1  ALP1(x:term, path, R:set of RDF terms) =
>>   2     if ( x in R ) return
>>   3     add x to R
>>   4     X = eval(x,path)
>>   5     For n:term in X
>>   6         ALP1(n, path, R)
>>   7         End
>
> Certainly not the most efficient evaluation for all situations, but as
> it's written in the spec one is free to implement it differently as
> long as the result is the same.
>
>> The entry step to ALP1 for OneOrMorePath1 needs aligning.
> Yes.
>
>> The nodes recorded are just those the subpath ends on.  So if the subpath is
>> :p/:q then the overall path can loop back through the gap between a :p and
>> :q.  This makes sense to me - its working over a (non-RDF) graph induced by
>> only considering :p/:q as links.
>
> I am not quite sure I get this.
>
>
>> == Spec'ing DISTINCT(path)
>>
>> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT
>> DISTINCT sub-query is sufficient or should we explain what distinct'ness
>> means for each of the graph operators and provide an algorithm for each.
>>
>> Some operators are naturally distinct anyway (e.g. a reverse path something
>> distinct is distinct).
>>
>> The non-fundamental path operators are:
>>
>> X ^path Y
>> X path1 / path2 Y
>> X path1 | path2 Y
>> X path{n} Y
>> X path{n,m} Y
>> X path{n,} Y
>> X path{0,} Y
>> X path{,n} Y
>>
>> The way to define as a re-write is e.g.
>>   ?x :p/DISTINCT(path)/:q ?y
>> ==>
>>   ?x :p ?_v1 .
>>   SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
>>   ?_v2 :q ?y .
>
> Here's again Axel's problematic case:

I have seen Axel's email.

> Graph:
>     :x :p :y .
>     :x :p :z .
> Pattern:
>     { ?x DISTINCT(:p{+}) [] . }
>     evaluates to \mu : ?x->:x with cardinality 2 (2 sigma instance mappings)
> Simple DISTINCT rewrite:
>     {SELECT DISTINCT ?x {?x :p{+} [] . } }
>     evaluates to \mu : ?x->:x with cardinality 1 (DISTINCT filters)
>
> Thus, I agree with Axel that it is not as simple as one might hope :-(
>
>> This could even be done for * and + based on {*} and {+} but I think
>> spelling out an algorithm is much more helpful.
>>
>> The alternative is a table defining DISTINCT(path) for path operator.
>
> I tend to prefer that, given we can find a working rewriting.

Hmm - there is a certain amount of contradiction here with the earlier 
comment:

"All that together seems like a lot of work to me."

:-)

 Andy
Received on Monday, 12 March 2012 22:50:47 UTC