Re: hacked defn of ORDER BY

Eric Prud'hommeaux wrote:
> Tell me if you think this specification based on order functions and
> order expressions and large orders of fries is useful:... 

It's useful and better than the current document.

Overall points:

1/ Slicing can't be stable across different implementations and is only ever 
going to be a "best effort" in the case of blank nodes being more significant 
than some other part of the order clause.  Slicing is not to be relied on 
across implementations, and will depend on a particular implementation when 
used across queries in all the corner cases.  See below for a subset that can 
be consistent.

2/ As value extension will change the sort order, it is a balance between 
portable sort order and extension to value-based sorts.  On that balance, I 
advocate allowing ordering by value with extensions over portability of 
ordering across implementations.  We can identify a subset of value types 
where sorting is consistent.

3/ I'd like to see text that allows language tags and xsd:date sorting.

This agrees with Eric's proposed change to 11.3.1 in
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2007Feb/0015.html


Sorting by language tags can't be done at the moment without extension of some 
kind because it's by lowercase language tag so needing, say, fn:lower-case 
which isn't guaranteed:

ORDER BY fn:lower-case(?lang) ?lang

> 
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25#modOrderBy
> ------------------------------------------------------------------------
> 9.1 ORDER BY
> 
> The ORDER BY clause establishes the order of a solution sequence.
> 
> Following the ORDER BY clause is a sequence of order functions, comprised of
> an expression and an optional order modifier (either ASC() or DESC()). Each
> ordering function is either ASCENDING (indicated by the ASC() modifier, or no
> modifier) or DESCENDING (indicated by the DESC() modifier).

I don't completely understand 'order function' - what is the signature of the 
function?  It seems to be a function over a solution sequence taking an 
expression as an (unevaluated) argument as the comparator (so it's a special 
form or higher-order function).  Would it be clearer to talk about the 
comparator (which comes in later) and a sorting algorithm.

> 
> PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> 
> SELECT ?name
> WHERE { ?x foaf:name ?name }
> ORDER BY ?name
> 
> PREFIX     :    <http://example.org/ns#>
> PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
> 
> SELECT ?name
> WHERE { ?x foaf:name ?name ; :empId ?emp }
> ORDER BY DESC(?emp)
> 
> PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> 
> SELECT ?name
> WHERE { ?x foaf:name ?name ; :empId ?emp }
> ORDER BY ?name DESC(?emp)
> 
> The "<" operator (see the Operator Mapping) defines the relative order of
> pairs of numerics, simple literals , xsd:strings , xsd:booleans and
> xsd:dateTimes. IRIs are ordered by comparing their codepoint representations
> with the "<" operator.

As 2+3 above - I propose to make it the extended "<" (which agrees with Eric's 
comment about adding text to 11.3.1)

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2007Feb/0015.html

> 
> SPARQL also defines a fixed, arbitrary order between some kinds of RDF terms
> that would not otherwise be ordered. This arbitrary order is necessary to
> provide consistent slicing of query solutions using LIMIT and OFFSET.

Drop the last sentence - we can't have value extensibility and consistent 
slicing across all implementations.  Actually, we can't have consistent 
slicing across all implementations in the general case.

Example: ORDER BY ?someBlankNode ?someString

where the blank node is higher precedence.

We can have consistent ordering for:

Simple literals , xsd:strings , xsd:booleans, xsd:dateTimes and IRIs.

Even XSD numerics have some interesting points:

"0001"^^xsd:integer and "0001"xsd:int

"1"^^xsd:integer and "01"xsd:integer

(Solution here: normalized XSD numerics only.)

It might be better to just say that ordering is not completely defined across 
implementations.

> 
>  1. (Lowest) no value assigned to the variable or expression in this solution.
>  2. Blank nodes
>  3. IRIs
>  4. RDF literals
>  5. A plain literal is lower than an RDF literal with type xsd:string of the
>     same lexical form.
> 
> The ASCENDING order of two solutions with respect to an ordering function is
> established by substituting the solution bindings into the expressions and
> comparing them with the "<" operator. The DESCENDING order is the reverse of
> the ASCENDING order.

I see what it means but "substituting" should be "evaluation of the expression"

> 
> The relative order of two solutions is the relative order of two solutions
> with respect to the first ordering function in the sequence. For solutions
> where the substitutions of the solution bindings produce the same RDF term,
> the order is the relative order of two solutions with respect to the next
> ordering function. The relative order of two solutions is undefined if no
> order expression evaluated for the two solutions produces a distinct RDF term.

Ordering does not produce RDF terms if I understood the sorting function 
concept.  The comparator just gives the relative order.

"""
The relative order of two solutions is undefined if no
order expression evaluated for the two solutions produces a relative ordering.
"""

> 
> Ordering a sequence of solutions always results in a sequence with the same
> number of solutions in it.
> 
> Using ORDER BY on a solution sequence for a CONSTRUCT or DESCRIBE query has no
> direct effect because only SELECT returns a sequence of results. Used in
> combination with LIMIT and OFFSET, ORDER BY can be used to return results
> generated from a different slice of the solution sequence.
> 
> Grammar rules:@@
> 
> @@ Multiple ordering conditions : ordered, by first sort key that
> differentiates two solutions.

There is some added text for this - looking at a slightly older version of the 
text?

 Andy

Received on Tuesday, 27 February 2007 12:08:17 UTC