- From: Kevin Wilkinson <wilkinson@hpl.hp.com>
- Date: Thu, 24 Mar 2005 10:52:43 -0800
- To: DAWG Mailing List <public-rdf-dawg@w3.org>
Review of the "Order By" Issue (aka Sort Design)
(aka, the rules for ordering RDF terms are incomplete)
See: "Order By" in Section 10.1 of SparqlSpec ("SPARQL
Query Language for RDF", comments here based on v1.264)
The issue is that there is no total order defined for RDF
literals (or RDF terms for that matter). This is a problem for
the "Order By" form of Select because it must return a
consistent ordering of result values. Some examples of
undefined comparisons are:
"foo"@xx < "foo"@zz
"03"^^xsd:integer < "3"^^xsd:integer
"rst"^^<http://mytype1> < "rst"^^<http://mytype2>
"foo" < <http://www.example.org#someURI>
The SparqlSpec (in 10.1) proposes an arbitrary order for RDF
terms of: null < Bnodes < URIs < RDF literals. However, this
still leaves open the ordering for RDF literals (the first
three examples above).
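As a rough illustration of that term-class ordering (a sketch only, not
taken from the SparqlSpec), the following Python models each term as a
(kind, text) pair. The tuple representation and the names KIND_RANK and
term_class_key are assumptions made for the example. Note that the key
says nothing about how two literals compare with each other, which is
exactly the gap described above.

```python
# Sketch only: terms are modelled as (kind, text) tuples, a hypothetical
# representation that is not part of the SparqlSpec.
KIND_RANK = {"null": 0, "bnode": 1, "uri": 2, "literal": 3}


def term_class_key(term):
    """Rank a term by its class only: null < bnode < uri < literal."""
    kind, _text = term
    return KIND_RANK[kind]


terms = [
    ("literal", "foo"),
    ("uri", "http://www.example.org#someURI"),
    ("null", ""),
    ("bnode", "b1"),
    ("literal", "bar"),
]

# sorted() is stable, so the two literals keep their input order:
# the class ranking alone does not decide between them.
ordered = sorted(terms, key=term_class_key)
```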
For ordering RDF literals, the SparqlSpec (10.1) proposes we
choose between (1) a consistent ordering and (2) a full
specification of the ordering. A consistent ordering means the
server (i.e., the implementation) can choose the ordering so
long as it is consistent across all queries to that server. A
full specification means the SparqlSpec specifies the ordering
for all possible cases. The SparqlSpec now includes rules for
a full specification.
General Comments
1) UC&R - Use case 2.19 in the "RDF Data Access Use Cases and
Requirements" document requests sorting by number, by date,
by name. Objective 4.11 states "the language should be able
to express sort ordering on query results". There is no
indication whether the sorting/comparison should be
lexical-based or value-based.
2) Collation - EricP noted (DAWG telecon 22Mar05) that the
SparqlSpec (in Section 11) requires SPARQL to support the
collation based on code point values when comparing strings.
3) Constraints - The SparqlSpec (Section 3.2) requires that
constraints be evaluated based on the value of an expression
rather than its lexical (syntactic) form, e.g.,
"03"^^xsd:int < "3"^^xsd:int
is true in the lexical space but false in the value space.
4) Distinct - it seems desirable (I assume) that the Select result
forms of "Distinct" and "Order By" should use the same rules
for comparing RDF terms. In particular, if "Order By" considers
"03"^^xsd:int equal to "3"^^xsd:int, then "Distinct" should
eliminate one as a duplicate. SparqlSpec does not explicitly
state that it uses the same rules as "Order By" (should it?).
5) Functions - The SparqlSpec allows ordering based on a function
call, e.g., "... Order by f(?var)" is legal (see rule 10 in the
Sparql grammar).
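To make comments 3 and 4 concrete, here is a small Python sketch of the
two comparison spaces. It assumes typed literals are held as
(lexical_form, datatype) pairs and handles only xsd:int. The helper
names lexical_less, value_of and value_less are hypothetical, and
Python's built-in string comparison stands in for the code point
collation.

```python
# Sketch only: typed literals are (lexical_form, datatype) pairs and
# only xsd:int is handled, mapped to a Python int.
def lexical_less(a, b):
    """Compare lexical forms by code point (Python str comparison)."""
    return a[0] < b[0]


def value_of(lit):
    """Map a literal to its value; assumes a well-formed xsd:int."""
    lexical, datatype = lit
    if datatype == "xsd:int":
        return int(lexical)
    raise ValueError("datatype not handled in this sketch: " + datatype)


def value_less(a, b):
    """Compare in the value space."""
    return value_of(a) < value_of(b)


a = ("03", "xsd:int")
b = ("3", "xsd:int")

lexical_result = lexical_less(a, b)  # "0" sorts before "3" by code point
value_result = value_less(a, b)      # both literals denote the integer 3

# If "Distinct" follows the same value-based rule, only one entry remains.
distinct_by_value = {value_of(lit) for lit in (a, b)}
```

With these definitions the lexical comparison holds and the value
comparison does not, and the value-based "Distinct" collapses the two
literals into one.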
Questions
1) Lexical Space vs. Value Space - I see no requirement that
we order results in the value space rather than the lexical
space. Ordering in the lexical space is well-defined and can
be consistent with the collation. We could adopt lexical ordering
for now and, in a subsequent revision, add options to specify
comparison in the value space.
This is akin to the Unix sort command, which treats fields as
strings by default but has options to compare fields as numeric,
date, etc.
AndyS pointed out (private comm.) that SparqlSpec allows this
already "by forcing the value space operator"
"Order By xsd:integer(?var)"
This looks like a function call to me although it may be
processed differently.
AndyS also pointed out that pure lexical has two consequences:
(a) "10" < "2" (b) "10"^^xsd:byte == "10"^^xsd:int (the latter
is a case where a value has two lexical forms).
But if a primary purpose of "Order By" is just to ensure
consistent result ordering (for use with Limit/Offset), then
these consequences are fine. Also, we could handle case (b)
above by requiring "Order By" to use the canonical lexical
form when comparing values (there is only one canonical lexical
form for each value).
2) Drop Function Calls - I see no requirement to support function
calls. Can they be dropped? It would simplify the initial
implementation.
3) Consistent Order vs. Full Specification - if we do value-based
ordering, the question of a consistent order vs. a full
specification remains (Section 10.1 of the SparqlSpec).
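The options in question 1 can be sketched in Python as follows. This is
only an illustration under stated assumptions: literals are
(lexical_form, datatype) pairs, INTEGER_TYPES is a hypothetical subset
of the integer-derived datatypes, and the canonical form of an integer
is taken to be its decimal digits without sign padding or leading
zeros. The second sort plays the role of "Order By xsd:integer(?var)".

```python
# Sketch only: a hypothetical subset of integer-derived datatypes.
INTEGER_TYPES = {"xsd:integer", "xsd:int", "xsd:byte"}


def canonical_lexical(lit):
    """Return the canonical lexical form; only integer types are rewritten."""
    lexical, datatype = lit
    if datatype in INTEGER_TYPES:
        return str(int(lexical))  # drops leading zeros and a leading "+"
    return lexical


rows = [
    ("2", "xsd:int"),
    ("10", "xsd:int"),
    ("03", "xsd:int"),
    ("3", "xsd:int"),
]

# Lexical ordering on the canonical form: consistent, but "10" sorts
# before "2" (consequence (a) above).
by_canonical = sorted(rows, key=canonical_lexical)

# Value-space ordering, forced explicitly by converting to an integer.
by_value = sorted(rows, key=lambda lit: int(lit[0]))
```

Under the canonical-form key, "03" and "3" compare as equal, and the
same lexical form is produced for "10" whether it is typed as xsd:byte
or xsd:int.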
------------------------------------------------------------------
Received on Thursday, 24 March 2005 18:47:08 UTC