Review of Sort Design

Review of the "Order By" Issue (aka Sort Design)
(aka, the rules for ordering RDF terms are incomplete)

See: "Order By" in Section 10.1 of SparlSpec ("SPARQL
Query Language for RDF", comments here based on v1.264)

The issue is that there is no total order defined for RDF
literals (or RDF terms for that matter). This is a problem for
the "Order By" form of Select because it must return a
consistent ordering of result values. Some examples of
undefined comparisons are:
   "foo"@xx < "foo"@zz
   "03"^xsd:integer < "3"^xsd:integer
   "rst"^^<http:/mytype1> < "rst"^^<http://mytype2>
   "foo" < <http://www.example.org#someURI>

The SparqlSpec (in 10.1) proposes an arbitrary order for RDF
terms of: null < Bnodes < URIs < RDF literals. However, this
still leaves open the ordering for RDF literals (the first
three examples above).

For ordering RDF literals, the SparqlSpec (10.1) proposes we
choose between (1) a consistent ordering and (2) a full
specification of the ordering. A consistent ordering means the
server (i.e., the implementation) can choose the ordering so
long as it is consistent across all queries to that server. A
full specification means the SparqlSpec specifies the ordering
for all possible cases. The SparqlSpec now includes rules for
a full specification.


General Comments
1) UC&R - Use case 2.19 in the "RDF Data Access Use Cases and
   Requirements" document requests sorting by number, by date,
   by name.  Objective 4.11 states "the language should be able
   to express sort ordering on query results". There is no
   indication if the sorting/comparison should be lexical-based
   or value-based.

2) Collation - EricP noted (DAWG telecon 22Mar05) that the
   SparqlSpec (in Section 11) requires SPARQL to support the
   collation based on code point values when comparing strings.

3) Constraints - The SparqlSpec (Section 3.2) requires that
   constraints be evaluating based on the value of an expression
   rather than its lexical (syntactic) form, e.g.,
         "03^^xsd:int"<"3^^xsd:int"
   is true in the lexical space but false is the value space.

4) Distinct - it seems desirable (I assume) that the Select result
   forms of "Distinct" and "Order By" should use the same rules
   for comparing RDF terms. In particular, if "Order By" considers
   "03^^xsd:int" equal to "3^^xsd:int", then "Distinct" should
   eliminate one as a duplicate. SparqlSpec does not explicitly
   state that it uses the same rules as "Order By" (should it?).

5) Functions - The SparqlSpec allows ordering based on a function
   call, e.g., "... Order by f(?var)" is legal (see rule 10 in the
   Sparql grammar).


Questions
1) Lexical Space vs. Value Space -  I see no requirement that
   we order results in the value space rather than the lexical
   space. Ordering in the lexical space is well-defined and can
   be consistent with the collation. We could adopt lexical ordering
   for now and, in a subsequent revision, add options to specify
   comparison in the value space.

   This is akin to the Unix sort command which treats as fields as
   string by default but has options to compare fields as numeric,
   date, etc.

   AndyS pointed out (private comm.) that SparqlSpec allows this
   already "by forcing the value space operator"
        "Order By xsd:integer(?var)"
   This looks like a function call to me although it may be
   processed differently.

   AndyS also pointed out that pure lexical has two consequences:
   (a) "10" < "2" (b) "10"^^xsd:byte == "10^^xsd:int (the latter
   is a case where a value has two lexical forms).

   But if a primary purpose of "Order By" is just to ensure
   consistent result ordering (for use with Limit/Offset), then
   these consequences are fine. Also, we could handle case (b)
   above by requiring "Order By" to use the canonical lexical
   form when comparing values (there is only one canonical lexical
   form for each value).


2) Drop Function Calls - I see no requirement to support function
   calls. Can they be dropped? It would simplify the initial
   implementation.


3) Consistent Order vs. Full Specification - if we do value-based
   ordering, the question of a consistent order vs. a full
   specification remains (Section 10.1 of the SparqlSpec).

------------------------------------------------------------------

Received on Thursday, 24 March 2005 18:47:08 UTC