Re: Defining property paths

On 2012-03-02, at 15:21, Andy Seaborne wrote:
> 
> On 02/03/12 14:56, Steve Harris wrote:
>> I have a concern that this approach will increase the barrier to implementation by too much.
>> 
>> It's already hard enough that a couple of systems (4store and Redland for e.g.) are unlikely to implement PPs, even in the medium term. This may mean even more systems never end up with a complete SPARQL 1.1 implementation.
>> 
>> Not necessarily a critical problem, but worth noting.
>> 
>> - Steve
> 
> Yes - there is that risk.
> 
> Another risk is that it is hard to implement at large scale because it does not draw on traditional DB technology, leaving open expensive queries if done naively. We have recognized that what is right/possible for one system is not for another but avoided profiles.
> 
> Can the cost be mitigated? Which aspect of the approach do you see as most costly?

The biggest cost I see is of having to implement both the "distinct" and "non-distinct" behaviour. However, if users want both then it has to be done I guess.

> One of the reason I prefer to put in a possible algorithm for both ALP and ALP1 is to help implementation.  We could go further down that path (pun intended, post hoc).  It's also why I was highlighting the similarity of ALP and ALP1 and the spec could have similar text.
> 
> (That said, for ARQ, the generic evaluation in the default engine is one of the smaller amounts of work of the new 1.1 features. YMMV.)

I don't see an awful lot of point in an unoptimised implementation - making the parser, and code to handle the parse structure is such a lot of work that to justify it you have to make the implementation useful.

- Steve

>> On 2012-03-02, at 13:59, Andy Seaborne wrote:
>> 
>>> This message gives the general outline of changes for property paths - it's not detail-complete.
>>> 
>>> The next step should be tests for nasty corner cases.
>>> 
>>> == Defining *, +, {*} and {+}
>>> 
>>> The algebra operator will be:
>>> 
>>> ZeroOrMorePath1
>>> OneOrMorePath1
>>> ZeroOrMorePathN
>>> OneOrMorePathN
>>> 
>>> The current * and + become {*} and {+} which use
>>> 1  ALP(x:term, path, R:multiset of RDF terms , V:set of RDF terms) =
>>> 2     if ( x in V ) return
>>> 3     add x to V
>>> 4     add x to R
>>> 5     X = eval(x,path)
>>> 6     For n:term in X
>>> 7         ALP(n, path, R, V)
>>> 8         End
>>> 9     remove x from V
>>> 
>>> V is the set of nodes considered visited.
>>> A node is added to V on entry and removed on exit.
>>> 
>>> So V is the set of nodes in the current path, not the set of nodes visited through the recursive application of ALP.
>>> 
>>> Simply removing line 9 changes the function to track all nodes visited.
>>> You don't need R and V then, so remove V and line 3:
>>> 
>>> 1  ALP1(x:term, path, R:set of RDF terms) =
>>> 2     if ( x in R ) return
>>> 3     add x to R
>>> 4     X = eval(x,path)
>>> 5     For n:term in X
>>> 6         ALP1(n, path, R)
>>> 7         End
>>> 
>>> The entry step to ALP1 for OneOrMorePath1 needs aligning.
>>> 
>>> The nodes recorded are just those the subpath ends on.  So if the subpath is :p/:q then the overall path can loop back through the gap between a :p and :q.  This makes sense to me - its working over a (non-RDF) graph induced by only considering :p/:q as links.
>>> 
>>> == Spec'ing DISTINCT(path)
>>> 
>>> The spec'ing issue is whether defining DISTINCT(path) in terms of a SELECT DISTINCT sub-query is sufficient or should we explain what distinct'ness means for each of the graph operators and provide an algorithm for each.
>>> 
>>> Some operators are naturally distinct anyway (e.g. a reverse path something distinct is distinct).
>>> 
>>> The non-fundamental path operators are:
>>> 
>>> X ^path Y
>>> X path1 / path2 Y
>>> X path1 | path2 Y
>>> X path{n} Y
>>> X path{n,m} Y
>>> X path{n,} Y
>>> X path{0,} Y
>>> X path{,n} Y
>>> 
>>> The way to define as a re-write is e.g.
>>>  ?x :p/DISTINCT(path)/:q ?y
>>> ==>
>>>  ?x :p ?_v1 .
>>>  SELECT DISTINCT ?_v1 ?_v2 { ?_v1 path ?_v2 }
>>>  ?_v2 :q ?y .
>>> 
>>> This could even be done for * and + based on {*} and {+} but I think spelling out an algorithm is much more helpful.
>>> 
>>> The alternative is a table defining DISTINCT(path) for path operator.
>>> 
>>> 	Andy
>>> 
>>> 
>> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 0535 7233 VAT # 849 0517 11
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ

Received on Friday, 2 March 2012 16:02:19 UTC