Re: Defining property paths from Birte Glimm on 2012-03-12 (public-rdf-dawg@w3.org from January to March 2012)

From: Birte Glimm <birte.glimm@uni-ulm.de>
Date: Mon, 12 Mar 2012 22:09:18 +0100
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABt65OeSnW4Re_ROUkjrWL2sEQJ_jns0TLb=yB+54=Opb1RaAw@mail.gmail.com>
Hi all,
I got actioned to review Andy's proposal and I mostly comment inline.
All in all, I am not 100% whether we really want to do DISTINCT(path)
and the counting and non-counting operator versions. All that together
seems like a lot of work to me.

On 2 March 2012 14:59, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
> This message gives the general outline of changes for property paths - it's
> not detail-complete.
>
> The next step should be tests for nasty corner cases.
>
> == Defining *, +, {*} and {+}
>
> The algebra operator will be:
>
> ZeroOrMorePath1
> OneOrMorePath1
> ZeroOrMorePathN
> OneOrMorePathN

Is that already a final name? Otherwise I would suggest something more
speaking, e.g., ZeroOrMorePathCounting and ZeroOrMorePathNonCounting
or at least ZeroOrMorePathC and ZeroOrMorePathNC, although 1 and N of
course also has some intuition of counting and non-counting.

> The current * and + become {*} and {+}

Reading the follow-up messages I get the intuition behind {*} (Axel's
viewing {*} and {+} as shortcuts for
 {0,+inf} and {1,+inf}, which I also find intuitive). For spec or LC
comment purposes I would give this intuition straight away.

> which use
>  1  ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
>  2     if ( x in V ) return
>  3     add x to V
>  4     add x to R
>  5     X = eval(x,path)
>  6     For n:term in X
>  7         ALP(n, path, R, V)
>  8         End
>  9     remove x from V
>
> V is the set of nodes considered visited.
> A node is added to V on entry and removed on exit.
>
> So V is the set of nodes in the current path, not the set of nodes visited
> through the recursive application of ALP.

I didn't get this just from the email, so looked at the spec (editors'
draft). Note for that:
"we write x:t for term*s* or variables" and I would put this comment
after the def. of "Node set of a graph" since it is only used
afterwards.
"matches based distinct nodes" <- "matches based *on* distinct nodes"

> Simply removing line 9 changes the function to track all nodes visited.
> You don't need R and V then, so remove V and line 3:
>
>  1  ALP1(x:term, path, R:set of RDF terms) =
>  2     if ( x in R ) return
>  3     add x to R
>  4     X = eval(x,path)
>  5     For n:term in X
>  6         ALP1(n, path, R)
>  7         End

Certainly not the most efficient evaluation for all situations, but as
it's written in the spec one is free to implement it differently as
long as the result is the same.

> The entry step to ALP1 for OneOrMorePath1 needs aligning.
Yes.

> The nodes recorded are just those the subpath ends on.  So if the subpath is
> :p/:q then the overall path can loop back through the gap between a :p and
> :q.  This makes sense to me - its working over a (non-RDF) graph induced by
> only considering :p/:q as links.

I am not quite sure I get this.


> == Spec'ing DISTINCT(path)
>
> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT
> DISTINCT sub-query is sufficient or should we explain what distinct'ness
> means for each of the graph operators and provide an algorithm for each.
>
> Some operators are naturally distinct anyway (e.g. a reverse path something
> distinct is distinct).
>
> The non-fundamental path operators are:
>
> X ^path Y
> X path1 / path2 Y
> X path1 | path2 Y
> X path{n} Y
> X path{n,m} Y
> X path{n,} Y
> X path{0,} Y
> X path{,n} Y
>
> The way to define as a re-write is e.g.
>  ?x :p/DISTINCT(path)/:q ?y
> ==>
>  ?x :p ?_v1 .
>  SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
>  ?_v2 :q ?y .

Here's again Axel's problematic case:
Graph:
   :x :p :y .
   :x :p :z .
Pattern:
   { ?x DISTINCT(:p{+}) [] . }
   evaluates to \mu : ?x->:x with cardinality 2 (2 sigma instance mappings)
Simple DISTINCT rewrite:
   {SELECT DISTINCT ?x {?x :p{+} [] . } }
   evaluates to \mu : ?x->:x with cardinality 1 (DISTINCT filters)

Thus, I agree with Axel that it is not as simple as one might hope :-(

> This could even be done for * and + based on {*} and {+} but I think
> spelling out an algorithm is much more helpful.
>
> The alternative is a table defining DISTINCT(path) for path operator.

I tend to prefer that, given we can find a working rewriting.

Birte

>        Andy
>
>



-- 
Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
University of Ulm                         Fax:   +49 731 50 24188
D-89069 Ulm                               birte.glimm@uni-ulm.de
Germany
Received on Monday, 12 March 2012 21:09:49 UTC