Re: Defining property paths from Steve Harris on 2012-03-02 (public-rdf-dawg@w3.org from January to March 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 2 Mar 2012 14:56:25 +0000
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <F3C9C7FE-D517-4C61-A72F-B1E04D09AAD3@garlik.com>

I have a concern that this approach will increase the barrier to implementation by too much.

It's already hard enough that a couple of systems (4store and Redland for e.g.) are unlikely to implement PPs, even in the medium term. This may mean even more systems never end up with a complete SPARQL 1.1 implementation.

Not necessarily a critical problem, but worth noting.

- Steve

On 2012-03-02, at 13:59, Andy Seaborne wrote:

> This message gives the general outline of changes for property paths - it's not detail-complete.
> 
> The next step should be tests for nasty corner cases.
> 
> == Defining *, +, {*} and {+}
> 
> The algebra operator will be:
> 
> ZeroOrMorePath1
> OneOrMorePath1
> ZeroOrMorePathN
> OneOrMorePathN
> 
> The current * and + become {*} and {+} which use
> 1  ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
> 2     if ( x in V ) return
> 3     add x to V
> 4     add x to R
> 5     X = eval(x,path)
> 6     For n:term in X
> 7         ALP(n, path, R, V)
> 8         End
> 9     remove x from V
> 
> V is the set of nodes considered visited.
> A node is added to V on entry and removed on exit.
> 
> So V is the set of nodes in the current path, not the set of nodes visited through the recursive application of ALP.
> 
> Simply removing line 9 changes the function to track all nodes visited.
> You don't need R and V then, so remove V and line 3:
> 
> 1  ALP1(x:term, path, R:set of RDF terms) =
> 2     if ( x in R ) return
> 3     add x to R
> 4     X = eval(x,path)
> 5     For n:term in X
> 6         ALP1(n, path, R)
> 7         End
> 
> The entry step to ALP1 for OneOrMorePath1 needs aligning.
> 
> The nodes recorded are just those the subpath ends on.  So if the subpath is :p/:q then the overall path can loop back through the gap between a :p and :q.  This makes sense to me - its working over a (non-RDF) graph induced by only considering :p/:q as links.
> 
> == Spec'ing DISTINCT(path)
> 
> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT DISTINCT sub-query is sufficient or should we explain what distinct'ness means for each of the graph operators and provide an algorithm for each.
> 
> Some operators are naturally distinct anyway (e.g. a reverse path something distinct is distinct).
> 
> The non-fundamental path operators are:
> 
> X ^path Y
> X path1 / path2 Y
> X path1 | path2 Y
> X path{n} Y
> X path{n,m} Y
> X path{n,} Y
> X path{0,} Y
> X path{,n} Y
> 
> The way to define as a re-write is e.g.
>  ?x :p/DISTINCT(path)/:q ?y
> ==>
>  ?x :p ?_v1 .
>  SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
>  ?_v2 :q ?y .
> 
> This could even be done for * and + based on {*} and {+} but I think spelling out an algorithm is much more helpful.
> 
> The alternative is a table defining DISTINCT(path) for path operator.
> 
> 	Andy
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 0535 7233 VAT # 849 0517 11
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ

Received on Friday, 2 March 2012 14:57:14 UTC