- From: Orri Erling <erling@xs4all.nl>
- Date: Thu, 19 Mar 2009 23:12:04 +0100
- To: "'Seaborne, Andy'" <andy.seaborne@hp.com>, "'Lee Feigenbaum'" <lee@thefigtrees.net>
- Cc: "'SPARQL Working Group'" <public-rdf-dawg@w3.org>
>From Andy: ARQ does happen to return items from a list in order and that order gets preserved usually but it's fortuitous - people write queries in a certain way that causes it to happen. The actual property function does find the list members and generate the bindings in order. But it also relies on another aspect of ARQ - most subsequent operations when used with an in-memory graph do not reorder the intermediate results. It also happens to be true with a part of the form above because the path is linear. But consider: SELECT DISTINCT ?member { <#me> :someList ?list . ?list rdf:rest*/rdf:first ?member ?member foaf:name "FRED" } In this case, a processor might well decide to look up ?member foaf:name "FRED", do the first triple pattern and then do the list part as checking membership. It might not come out in list order . This does not currently happen in ARQ but only because the optimization across both paths and triple patterns is not aggressively reordering - yet. If it's written: SELECT DISTINCT ?member { ?member rdfs:label "Talented" <#me> :someList ?list . ?list rdf:rest*/rdf:first ?member } It might well be unordered on ?member. Such ordering isn't specified in SPARQL/2008 - the only way to guarantee it is to have an ORDER BY clause at the level of the output and to include something to sort by in the output. An idea: Replace * with {?len} and capture the length (lots of details - which length? Only shortest possible? What about two routes, same length through different intermediate graph nodes?) SELECT DISTINCT ?member { ?list rdf:rest{?length}/rdf:first ?member } ORDER BY ?length Operations that combine or extend results would not need to be defined to be order preserving. A slick system might notice it's query plan was already ordered and not actually do the ORDER BY as it's unnecessarily. Andy Andy, Lee Generally, it seems to me that RDF lists should be treated as an application of property paths with transitive steps. The query language should not specify ordering of results by anything except order by. It is enough if the order were deterministic across consecutive execution on the same data for purposes of slicing. Transitive operations may run in any order, depending on what the optimizer prefers. We notice that with parallel queries, stating an order by sometimes actually increases speed because the result set before the order by can be generated in a nondeterministic order. It would even be an idea to add a query option stating that a non-repeatable result order is acceptable. We do not however propose this so as not to introduce too many topics. If the application requires a transitive operation to produce results in order of depth of recursion, then the transitive construct should produce step numbers by which one could sort in an order by. And while the direction in which the transitive operation is performed is chosen by the optimizer, the numbering still ought to be left to right: The binding one step from the left side of the path expression would be 1,, the next farther 2 and so on. We discuss step numbering and differentiating between multiple paths in the Virtuoso piece on transitive subqueries, link posted in a previous mail just before last telco. The downside is that the full transitive subquery syntax is quite long though. To get elements in left to right order in terms of list traversal, regardless of evaluation order: select ?elt where { ?start foaf:name "Bill" . ?start rdfs:next ?member option (transitive, t_step ('step_no') as ?no, t_min (0), t_shortest_only) . ?member rdf:first ?elt . ?elt foaf:name "Joe". } order by ?no Given a list of elements with foaf:name's, this lists the ones starting with Bill up to and including Joe, in this order. The order is independent of execution plan due to the convention about step numbers. Many paths may be returned if they are of equal length. One could select a path id to differentiate between the paths. If Bill and Joe are not in the same list, the result set is emppty. We note that when both ends are given, the optimizer will often decide to expand from both ends concurrently, minimizing the search space when there are many successors/predecessors to explore. This is fairly complete but is somewhat tricky to implement and specify. We do not insist on standardizing the whole range of our capabilities, since the other proposals are shorter and can anyway be macroexpanded into ours in a straightforward manner and agreement on them is likely easier. Orri
Received on Thursday, 19 March 2009 22:13:36 UTC