Update on ORDERing solutions from Seaborne, Andy on 2005-03-18 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 18 Mar 2005 16:45:49 +0000
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <423B05BD.8070202@hp.com>
An update on a possible ORDER BY clause for SPARQL.


1/ Process model.

First, we have a processing model of three stages:

A - A query has a pattern that generates a sequence of solutions.

B - Modifiers are applied to this sequence, in this order:
    Projection, ORDER BY, DISTINCT, LIMIT, OFFSET.

C - Process the modified sequence of solutions by the result forms.


It does make some sense for CONSTRUCT and DESCRIBE to have ORDER BY because of 
slicing the results with LIMIT/OFFSET.

DISTINCT applies only to SELECT; it is a no-op for the other result forms, where 
the grammar currently does not allow it.  Its position next to SELECT is merely 
for familiarity from SQL - we could put the word after the pattern in the query 
like the other modifiers.

"OFFSET 0" is the no-op.  "OFFSET n" skips the first n solutions; the results 
start with the solution after those.
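
To make stage B concrete, here is a rough Python sketch of the modifier 
pipeline.  It is only an illustration: apply_modifiers and its parameters are 
names made up for this note, solutions are plain dicts, ORDER BY handles a single 
ascending variable, and LIMIT/OFFSET are read together as one slice (skip OFFSET 
solutions, then keep at most LIMIT), which is an assumption about how the two 
combine.

```python
from typing import Dict, List, Optional

Solution = Dict[str, object]  # variable name -> bound value


def apply_modifiers(
    solutions: List[Solution],
    project: List[str],
    order_by: Optional[str] = None,
    distinct: bool = False,
    limit: Optional[int] = None,
    offset: int = 0,
) -> List[Solution]:
    # Projection: keep only the requested variables.
    seq = [{v: s[v] for v in project if v in s} for s in solutions]
    # ORDER BY: a single variable, ascending (simplification for this sketch).
    if order_by is not None:
        seq = sorted(seq, key=lambda s: s[order_by])
    # DISTINCT: drop duplicate solutions, keeping the first occurrence.
    if distinct:
        seen = set()
        unique = []
        for s in seq:
            key = tuple(sorted(s.items()))
            if key not in seen:
                seen.add(key)
                unique.append(s)
        seq = unique
    # LIMIT/OFFSET as one slice: skip `offset` solutions, keep at most `limit`.
    seq = seq[offset:]
    if limit is not None:
        seq = seq[:limit]
    return seq


rows = [{"v": 3, "w": "c"}, {"v": 1, "w": "a"}, {"v": 2, "w": "b"}, {"v": 1, "w": "a"}]
page = apply_modifiers(rows, ["v"], order_by="v", distinct=True, limit=2, offset=1)
print(page)
```

With offset=0 and no limit the slice step leaves the sequence untouched, which 
is the "OFFSET 0" no-op case.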


2/ Ordering

Ordering is by a list of criteria, applied in the order given in the query.  A 
criterion is a function (and a simple case is just a variable) together with a 
modifier for ascending or descending ordering.

The ordering criteria may not completely order the solution sequence.

e.g.
SELECT ?a ?b
...
ORDER BY ?a

or even
SELECT ?a
...
ORDER BY ?a

because "03"^^xsd:integer and "3"^^xsd:integer are different terms with the 
same value.
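
A small Python sketch of that second case, assuming terms are modelled as 
(lexical form, datatype) pairs and the criterion compares integer values; the 
helper name value is made up for the illustration.  Two runs over the same 
terms, fed in a different order, both satisfy the criterion yet disagree on 
where "03" and "3" go.

```python
# Each term is (lexical form, datatype); the criterion compares integer values.
terms = [("03", "xsd:integer"), ("3", "xsd:integer"), ("1", "xsd:integer")]


def value(term):
    lexical, _datatype = term
    return int(lexical)


# Same terms, different arrival order from the pattern matcher.
by_value = sorted(terms, key=value)
by_value_reversed_input = sorted(reversed(terms), key=value)

print(by_value)
print(by_value_reversed_input)
```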


There is a requirement that there is always a consistent order applied (not 
different each time) so that LIMIT/OFFSET work as slices.

One way is a default set of ordering rules that can always be applied to any 
solutions.  This would be based on further arbitrary ordering rules.  See rq23 
for some notes.

The other is just to leave it at arbitrary-but-consistent.

The only case I can see for the arbitrary/consistent approach would be 
significant implementation gains but they would have to be proven first. 
Therefore, I suggest going with the completely specified order and asking for LC 
(or WG) feedback.
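
As a sketch of the completely specified approach: append default tie-breaking 
rules after the query's own criteria so that every pair of distinct terms is 
ordered.  The particular fallback used here (datatype, then lexical form) is a 
placeholder chosen for the example, not the rule set from rq23, and total_key 
and page are hypothetical names.

```python
terms = [
    ("03", "xsd:integer"),
    ("3", "xsd:integer"),
    ("1", "xsd:integer"),
    ("2", "xsd:integer"),
]


def total_key(term):
    lexical, datatype = term
    # Query criterion first (integer value), then the assumed default rules.
    return (int(lexical), datatype, lexical)


def page(seq, limit, offset):
    # Order completely, then slice: skip `offset`, keep `limit`.
    return sorted(seq, key=total_key)[offset:offset + limit]


reversed_terms = list(reversed(terms))
print(page(terms, 2, 2))
print(page(reversed_terms, 2, 2))
```

Because the key never ties for distinct terms, the same LIMIT/OFFSET slice comes 
back whatever order the pattern stage produced the solutions in.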


3/ Syntax

SQL has an "ORDER BY" clause, with the modifiers "ASC" and "DESC".

XQuery also has an "order by" clause; it spells the modifiers out in full: 
"ascending" and "descending".  Each system can take an expression or a column 
name (in SQL's case also a column number).


Proposed syntax (examples):

SELECT *
WHERE { :x :p ?v . :x :q ?w }
ORDER BY ?v ?w
LIMIT  10
OFFSET 10


Confusion point: SPARQL does not have commas so I omitted them here too but then

ORDER BY ?v DESCENDING ?w

is confusing (it means descending-in-?v, ascending-in-?w but is very easy to 
misread).

Alternative syntax: break from SQL, XQuery and have
   DESC(?v) ?w
or some other clear association of modifier with expression.

I prefer the (non-SQL) DESC(?x), ASC(xsd:integer(?x)) style for clarity.
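
To illustrate how a criteria list in that style would be applied, here is a 
Python sketch.  The helpers asc, desc and order_by are invented for the example 
and solutions are plain dicts; the point is only that each expression carries 
its own direction and that earlier criteria take precedence.

```python
from typing import Callable, Dict, List, Tuple

Solution = Dict[str, int]
Criterion = Tuple[Callable[[Solution], int], bool]  # (expression, descending?)


def asc(expr: Callable[[Solution], int]) -> Criterion:
    return (expr, False)


def desc(expr: Callable[[Solution], int]) -> Criterion:
    return (expr, True)


def order_by(solutions: List[Solution], criteria: List[Criterion]) -> List[Solution]:
    ordered = list(solutions)
    # Python's sort is stable, so sorting by the last criterion first and the
    # first criterion last makes the earlier criteria take precedence.
    for expr, descending in reversed(criteria):
        ordered.sort(key=expr, reverse=descending)
    return ordered


rows = [{"v": 1, "w": 2}, {"v": 2, "w": 2}, {"v": 2, "w": 1}, {"v": 1, "w": 1}]

# Counterpart of: ORDER BY DESC(?v) ?w
result = order_by(rows, [desc(lambda s: s["v"]), asc(lambda s: s["w"])])
print(result)
```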


4/ Ordering Expressions

We have a requirement (3.3) for extensible value testing.  Therefore, I put in 
expressions for ordering (like XQuery and SQL), which allow types unknown to the 
core language to be ordered.  This also allows casting (useful for dates in 
non-xsd:dateTime format or older RDF without datatypes).

ORDER BY xsd:integer(?v)

ORDER BY app:cordOrder(?x, ?y)

Such an ordering function must not cause an evaluation failure.  If it does, it 
is not determined whether no results, some results or all the results in some 
junk order are returned.
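
A Python sketch of a cast used as an ordering expression, and of what a failure 
looks like.  Here xsd_integer is a stand-in written for this example, not a real 
library function, and try_order picks one possible outcome on failure (no 
results); the proposal itself leaves that outcome open.

```python
def xsd_integer(lexical: str) -> int:
    """Stand-in for an xsd:integer cast over an untyped literal."""
    return int(lexical.strip())


plain = ["10", "9", "100"]

as_strings = sorted(plain)                    # plain string comparison
as_integers = sorted(plain, key=xsd_integer)  # ORDER BY xsd:integer(?v)


def try_order(values):
    try:
        return sorted(values, key=xsd_integer)
    except ValueError:
        # The cast failed for some value; this sketch returns nothing at all.
        return None


print(as_strings)
print(as_integers)
print(try_order(["10", "abc"]))
```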


5/ Misc

XQuery also has "empty greatest" and "empty least" and collation "name".

For us there are more cases than just the empty one (no value, bNodes, URIs and 
strings) so I propose picking a fixed relationship between them.
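
One way to picture a fixed relationship, as a Python sketch: give each kind of 
term a rank and compare within a kind only when the ranks tie.  The specific 
ranking below (no value, then bNodes, then URIs, then strings) is an assumption 
made for the example, and KIND_RANK and fixed_key are hypothetical names.

```python
from typing import Optional, Tuple

Term = Optional[Tuple[str, str]]  # (kind, text), or None when there is no value

# Assumed ranking for illustration only; "no value" ranks below all of these.
KIND_RANK = {"bnode": 1, "uri": 2, "string": 3}


def fixed_key(term: Term):
    if term is None:
        return (0, "")
    kind, text = term
    return (KIND_RANK[kind], text)


terms = [
    ("string", "apple"),
    None,
    ("uri", "http://example.org/a"),
    ("bnode", "b1"),
]
ordered = sorted(terms, key=fixed_key)
print(ordered)
```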

Collation is covered by what we do elsewhere.

 Andy
Received on Friday, 18 March 2005 16:46:19 UTC