Re: Defining property paths from Andy Seaborne on 2012-03-02 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 02 Mar 2012 15:21:03 +0000
To: Steve Harris <steve.harris@garlik.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F50E55F.6000601@epimorphics.com>
On 02/03/12 14:56, Steve Harris wrote:
> I have a concern that this approach will increase the barrier to implementation by too much.
>
> It's already hard enough that a couple of systems (4store and Redland for e.g.) are unlikely to implement PPs, even in the medium term. This may mean even more systems never end up with a complete SPARQL 1.1 implementation.
>
> Not necessarily a critical problem, but worth noting.
>
> - Steve

Yes - there is that risk.

Another risk is that it is hard to implement at large scale because it 
does not draw on traditional DB technology, leaving open expensive 
queries if done naively. We have recognized that what is right/possible 
for one system is not for another but avoided profiles.

Can the cost be mitigated? Which aspect of the approach do you see as 
most costly?

One of the reason I prefer to put in a possible algorithm for both ALP 
and ALP1 is to help implementation.  We could go further down that path 
(pun intended, post hoc).  It's also why I was highlighting the 
similarity of ALP and ALP1 and the spec could have similar text.

(That said, for ARQ, the generic evaluation in the default engine is one 
of the smaller amounts of work of the new 1.1 features. YMMV.)

	Andy

>
> On 2012-03-02, at 13:59, Andy Seaborne wrote:
>
>> This message gives the general outline of changes for property paths - it's not detail-complete.
>>
>> The next step should be tests for nasty corner cases.
>>
>> == Defining *, +, {*} and {+}
>>
>> The algebra operator will be:
>>
>> ZeroOrMorePath1
>> OneOrMorePath1
>> ZeroOrMorePathN
>> OneOrMorePathN
>>
>> The current * and + become {*} and {+} which use
>> 1  ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
>> 2     if ( x in V ) return
>> 3     add x to V
>> 4     add x to R
>> 5     X = eval(x,path)
>> 6     For n:term in X
>> 7         ALP(n, path, R, V)
>> 8         End
>> 9     remove x from V
>>
>> V is the set of nodes considered visited.
>> A node is added to V on entry and removed on exit.
>>
>> So V is the set of nodes in the current path, not the set of nodes visited through the recursive application of ALP.
>>
>> Simply removing line 9 changes the function to track all nodes visited.
>> You don't need R and V then, so remove V and line 3:
>>
>> 1  ALP1(x:term, path, R:set of RDF terms) =
>> 2     if ( x in R ) return
>> 3     add x to R
>> 4     X = eval(x,path)
>> 5     For n:term in X
>> 6         ALP1(n, path, R)
>> 7         End
>>
>> The entry step to ALP1 for OneOrMorePath1 needs aligning.
>>
>> The nodes recorded are just those the subpath ends on.  So if the subpath is :p/:q then the overall path can loop back through the gap between a :p and :q.  This makes sense to me - its working over a (non-RDF) graph induced by only considering :p/:q as links.
>>
>> == Spec'ing DISTINCT(path)
>>
>> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT DISTINCT sub-query is sufficient or should we explain what distinct'ness means for each of the graph operators and provide an algorithm for each.
>>
>> Some operators are naturally distinct anyway (e.g. a reverse path something distinct is distinct).
>>
>> The non-fundamental path operators are:
>>
>> X ^path Y
>> X path1 / path2 Y
>> X path1 | path2 Y
>> X path{n} Y
>> X path{n,m} Y
>> X path{n,} Y
>> X path{0,} Y
>> X path{,n} Y
>>
>> The way to define as a re-write is e.g.
>>   ?x :p/DISTINCT(path)/:q ?y
>> ==>
>>   ?x :p ?_v1 .
>>   SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
>>   ?_v2 :q ?y .
>>
>> This could even be done for * and + based on {*} and {+} but I think spelling out an algorithm is much more helpful.
>>
>> The alternative is a table defining DISTINCT(path) for path operator.
>>
>> 	Andy
>>
>>
>
Received on Friday, 2 March 2012 15:21:35 UTC