Re: more sorting test cases from Jeen Broekstra on 2005-05-03 (public-rdf-dawg@w3.org from April to June 2005)

From: Jeen Broekstra <jeen@aduna.biz>
Date: Tue, 03 May 2005 12:35:41 +0200
To: andy.seaborne@hp.com
Cc: public-rdf-dawg@w3.org
Message-ID: <427753FD.6070608@aduna.biz>

Seaborne, Andy wrote:

[snip]

> Good - but I'm still unclear.  It is possible to have a test with one of 
> each kind in the results and get a defined ordering.

Yes. For example, test case 3 does this: its result contains a single
unbound value (and several ordered bound values).

I'll include a few more, similar cases later, hopefully today.

(By the way, the sorting test cases can be found at
  http://www.w3.org/2001/sw/DataAccess/tests/data/sort/. Sorry for
  forgetting the reference earlier.)

[snip]

>> Related to that, I think the spec is not completely clear on one 
>> notion that we might want to be a bit more precise on. Suppose we have 
>> two datatyped literals of incomparable types, for example one is an
>> xsd:float and the other an xsd:boolean. The spec says that for such
>> cases order is undefined. Fine. Now suppose we have _two_ floats and a 
>> boolean:
>>
>> "1E2"^^xsd:float
>> "1E4"^^xsd:float
>> "true"^^xsd:boolean
>>
>> The spec only says that for incomparable types the result order is not 
>> defined. So strictly speaking, the spec not only endorses the above 
>> sorting, but also this:
>>
>> "1E2"^^xsd:float
>> "true"^^xsd:boolean
>> "1E4"^^xsd:float
>>
>> This is a rather weird and unnatural way of implementing sorting of 
>> course, but strictly speaking legal I think. Do we want to reword to 
>> prevent this?
>
> What wording do you suggest?
>
> If it is to cluster things of the same type, I think this gets into problems
> with subtypes

Good point. Hm. I'd say that we could word it such that we indicate 
that a sequence of ordered values should not be broken by mixing it 
with unordered values. So it's not so much worded in terms of the 
datatype itself but in terms of the resulting order.
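
One way to make the "resulting order" wording concrete is a small
checker, sketched here in Python. It is only an illustration under
stated assumptions: VALUE_SPACE, parse and is_unbroken are hypothetical
names, not anything taken from the spec or the test suite, and only the
datatypes mentioned in this thread are covered.

```python
from itertools import groupby

# Hypothetical table: which datatypes share a comparable value space.
# Only the types mentioned in this thread are listed.
VALUE_SPACE = {
    "xsd:float": "numeric",
    "xsd:double": "numeric",
    "xsd:boolean": "boolean",
}


def parse(lexical, datatype):
    """Return (value_space, value) for a typed literal."""
    space = VALUE_SPACE[datatype]
    if space == "numeric":
        return space, float(lexical)
    return space, lexical == "true"


def is_unbroken(result):
    """True if every run of mutually comparable values is contiguous
    and in ascending order. The relative position of incomparable
    groups is deliberately left unconstrained."""
    parsed = [parse(lexical, datatype) for lexical, datatype in result]
    seen = set()
    for space, run in groupby(parsed, key=lambda item: item[0]):
        if space in seen:
            # This value space already appeared and was interrupted
            # by values it cannot be compared with.
            return False
        seen.add(space)
        values = [value for _, value in run]
        if values != sorted(values):
            return False
    return True


natural = [("1E2", "xsd:float"), ("1E4", "xsd:float"), ("true", "xsd:boolean")]
weird = [("1E2", "xsd:float"), ("true", "xsd:boolean"), ("1E4", "xsd:float")]
```

Under these assumptions is_unbroken(natural) holds and
is_unbroken(weird) does not. The check is keyed on the value space
rather than on the datatype name, so it says nothing about clustering
literals by type.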

> and with processors 
> that do know how to handle them, and also with unknown types over the 
> same value space:
> 
> Suppose a processor does not know about doubles and floats: clustering 
> gives:
> 
> "1E2"^^float
> "1E4"^^float
> "1E3"^^double
> 
> But knowing the values gives:
> 
> "1E2"^^float
> "1E3"^^double
> "1E4"^^float

I am not quite sure how it is relevant whether a processor does or
does not know the datatype.

I would think that sorting is defined on the assumption that the 
datatypes in question are understood by the processor - and if they 
are not, it will probably just treat them as untyped literals and do a
lexical sort, which can produce an incorrect ordering. That is a problem
for the implementor, not for the spec, I'd say: not (fully) supporting the
datatypes specified in the SPARQL QL spec equals not (fully) 
supporting ORDER BY.

This is no different than the definition of comparison operators, 
which are also defined under the assumption that the relevant 
datatypes are understood by the processor.
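
The difference between a lexical sort and a value-based sort can be
shown with a short Python sketch. The literals below are made up for
illustration (they are not from the test suite), and Python's float() is
used as a stand-in for parsing the xsd:float lexical form, which is an
assumption that holds for these simple cases only.

```python
# Illustrative xsd:float lexical forms, chosen so that the two
# orderings differ; they are not taken from the test suite.
literals = ["9E0", "1E1", "1E2"]


def lexical_sort(values):
    """What a processor that ignores the datatype would do:
    plain string comparison."""
    return sorted(values)


def value_sort(values):
    """What a processor that understands the datatype would do:
    compare the numeric values (float() stands in for an xsd:float
    parser here)."""
    return sorted(values, key=float)


lexical_order = lexical_sort(literals)
value_order = value_sort(literals)
```

Here the lexical sort puts "9E0" last because "9" sorts after "1" as a
character, while the value-based sort puts it first because 9 is the
smallest number.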

> One minor thing came up: The data files and some queries still end 
> \r\r\n (after checking out on windows).  Doesn't break anything.  Have 
> they been checked in UNIX style with \r\n?

I do try to check in UNIX-style linebreaks only, but something may
have gone wrong. I'll double-check.
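
A quick way to see which line endings a checked-out file actually
contains is to count them on the raw bytes. This is a minimal Python
sketch; count_line_endings is a hypothetical helper and the sample bytes
are invented for the example.

```python
import re
from collections import Counter

# Longest alternative first, so "\r\r\n" is not counted as "\r"
# followed by "\r\n".
LINE_ENDING = re.compile(rb"\r\r\n|\r\n|\n|\r")


def count_line_endings(data: bytes) -> Counter:
    """Count each line-ending style found in the raw bytes."""
    return Counter(match.group() for match in LINE_ENDING.finditer(data))


# Invented sample: one doubled CR ending, one CRLF, one bare LF.
sample = b"line one\r\r\nline two\r\nline three\n"
counts = count_line_endings(sample)
```

To inspect a real file, read it in binary mode (open(path, "rb")) so
that no newline translation happens before counting.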

Jeen
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877
Received on Tuesday, 3 May 2005 10:33:41 UTC