- From: Kevin Wilkinson <wilkinson@hpl.hp.com>
- Date: Thu, 24 Mar 2005 10:52:43 -0800
- To: DAWG Mailing List <public-rdf-dawg@w3.org>
Review of the "Order By" Issue (aka Sort Design) (aka, the rules for ordering RDF terms are incomplete) See: "Order By" in Section 10.1 of SparlSpec ("SPARQL Query Language for RDF", comments here based on v1.264) The issue is that there is no total order defined for RDF literals (or RDF terms for that matter). This is a problem for the "Order By" form of Select because it must return a consistent ordering of result values. Some examples of undefined comparisons are: "foo"@xx < "foo"@zz "03"^xsd:integer < "3"^xsd:integer "rst"^^<http:/mytype1> < "rst"^^<http://mytype2> "foo" < <http://www.example.org#someURI> The SparqlSpec (in 10.1) proposes an arbitrary order for RDF terms of: null < Bnodes < URIs < RDF literals. However, this still leaves open the ordering for RDF literals (the first three examples above). For ordering RDF literals, the SparqlSpec (10.1) proposes we choose between (1) a consistent ordering and (2) a full specification of the ordering. A consistent ordering means the server (i.e., the implementation) can choose the ordering so long as it is consistent across all queries to that server. A full specification means the SparqlSpec specifies the ordering for all possible cases. The SparqlSpec now includes rules for a full specification. General Comments 1) UC&R - Use case 2.19 in the "RDF Data Access Use Cases and Requirements" document requests sorting by number, by date, by name. Objective 4.11 states "the language should be able to express sort ordering on query results". There is no indication if the sorting/comparison should be lexical-based or value-based. 2) Collation - EricP noted (DAWG telecon 22Mar05) that the SparqlSpec (in Section 11) requires SPARQL to support the collation based on code point values when comparing strings. 3) Constraints - The SparqlSpec (Section 3.2) requires that constraints be evaluating based on the value of an expression rather than its lexical (syntactic) form, e.g., "03^^xsd:int"<"3^^xsd:int" is true in the lexical space but false is the value space. 4) Distinct - it seems desirable (I assume) that the Select result forms of "Distinct" and "Order By" should use the same rules for comparing RDF terms. In particular, if "Order By" considers "03^^xsd:int" equal to "3^^xsd:int", then "Distinct" should eliminate one as a duplicate. SparqlSpec does not explicitly state that it uses the same rules as "Order By" (should it?). 5) Functions - The SparqlSpec allows ordering based on a function call, e.g., "... Order by f(?var)" is legal (see rule 10 in the Sparql grammar). Questions 1) Lexical Space vs. Value Space - I see no requirement that we order results in the value space rather than the lexical space. Ordering in the lexical space is well-defined and can be consistent with the collation. We could adopt lexical ordering for now and, in a subsequent revision, add options to specify comparison in the value space. This is akin to the Unix sort command which treats as fields as string by default but has options to compare fields as numeric, date, etc. AndyS pointed out (private comm.) that SparqlSpec allows this already "by forcing the value space operator" "Order By xsd:integer(?var)" This looks like a function call to me although it may be processed differently. AndyS also pointed out that pure lexical has two consequences: (a) "10" < "2" (b) "10"^^xsd:byte == "10^^xsd:int (the latter is a case where a value has two lexical forms). But if a primary purpose of "Order By" is just to ensure consistent result ordering (for use with Limit/Offset), then these consequences are fine. Also, we could handle case (b) above by requiring "Order By" to use the canonical lexical form when comparing values (there is only one canonical lexical form for each value). 2) Drop Function Calls - I see no requirement to support function calls. Can they be dropped? It would simplify the initial implementation. 3) Consistent Order vs. Full Specification - if we do value-based ordering, the question of a consistent order vs. a full specification remains (Section 10.1 of the SparqlSpec). ------------------------------------------------------------------
Received on Thursday, 24 March 2005 18:47:08 UTC