RE: [AccessingRdfLists] relationship with property paths?

 
>From Andy:




ARQ does happen to return items from a list in order and that order gets
preserved usually but it's fortuitous - people write queries in a certain
way that causes it to happen.  The actual property function does find the
list members and generate the bindings in order. But it also relies on
another aspect of ARQ - most subsequent operations when used with an
in-memory graph do not reorder the intermediate results.   It also happens
to be true with a part of the form above because the path is linear.

But consider:

SELECT DISTINCT ?member 
{ 
  <#me> :someList ?list .
  ?list rdf:rest*/rdf:first ?member 
  ?member foaf:name "FRED"
}

In this case, a processor might well decide to look up ?member foaf:name
"FRED", do the first triple pattern and  then do the list part as checking
membership.  It might not come out in list order .  This does not currently
happen in ARQ but only because the optimization across both paths and triple
patterns is not aggressively reordering - yet.

If it's written: 

SELECT DISTINCT ?member 
{ 
  ?member rdfs:label "Talented"
  <#me> :someList ?list .
  ?list rdf:rest*/rdf:first ?member 
}

It might well be unordered on ?member.

Such ordering isn't specified in SPARQL/2008 - the only way to guarantee it
is to have an ORDER BY clause at the level of the output and to include
something to sort by in the output.

An idea: 
   Replace * with {?len} and capture the length

(lots of details - which length? Only shortest possible? What about two
routes, same length through different intermediate graph nodes?) 

SELECT DISTINCT ?member 
{ ?list rdf:rest{?length}/rdf:first ?member }
ORDER BY ?length

Operations that combine or extend results would not need to be defined to be
order preserving. A slick system might notice it's query plan was already
ordered and not actually do the ORDER BY as it's unnecessarily.

	Andy


Andy, Lee

Generally, it seems to me that RDF lists should be treated as an application
of property paths with transitive steps.
The query language should not specify ordering of results by anything except
order by.  It is enough if the order were deterministic across consecutive
execution on the same data for purposes of slicing.  Transitive operations
may run in any order, depending on what the optimizer prefers.  We notice
that with parallel queries, stating  an order by sometimes actually
increases speed because the result set before the order by can be generated
in a nondeterministic order.  It would even be an idea to add a query option
stating that a non-repeatable result order is acceptable. We do not however
propose this so as not to introduce too many topics.

If the application requires a transitive operation to produce results in
order of depth of recursion, then the transitive construct should produce
step numbers by which one could sort in an order by.  And while the
direction in which the transitive 
 operation is performed is chosen by the optimizer, the numbering still
ought to be left to right:  The binding one step from the left side of the
path expression  would be 1,, the next farther   2 and so on.  We discuss
step numbering and differentiating between multiple paths in the Virtuoso
piece on transitive subqueries, link posted in a previous mail just before
last telco.  

The downside is that the full transitive subquery syntax is quite long
though.

To get elements in  left to right order in terms of list traversal,
regardless of evaluation order:

select ?elt  where 
{ ?start foaf:name "Bill" . 
    ?start  rdfs:next ?member option (transitive, t_step ('step_no') as ?no,
t_min (0), t_shortest_only) . 
  ?member rdf:first ?elt . 
  ?elt foaf:name "Joe". } 
order by ?no

Given a list of elements with foaf:name's, this lists the ones starting with
Bill up to and including Joe, in this order.  The order is independent of
execution plan due to the convention about step numbers.  Many paths may be
returned if they are of equal length.  One could select a path id to
differentiate between the paths.  If Bill and Joe are not in the same list,
the result set is emppty.  We note 
that when both ends are given, the optimizer will often decide to expand
from both ends concurrently, minimizing the 
search space when there are many successors/predecessors to explore.

This is fairly complete but is somewhat tricky to implement and specify.  We
do not insist on standardizing the  whole range of our capabilities, since
the other proposals are shorter and can anyway be macroexpanded into ours in
a straightforward manner and agreement on them is likely easier.



Orri

Received on Thursday, 19 March 2009 22:13:36 UTC