Re: Undefined results order with Order By/Projection/Distinct

Hi Miguel,

Yeah, I agree. Actually, this issue was known (long?) before SPARQL;
here's a great writeup in the SQL context:

https://blog.jooq.org/how-sql-distinct-and-order-by-are-related/
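
To make the ambiguity concrete, here is a minimal Python sketch of the example quoted below. It is not taken from the spec or from any engine; the helper name order_preserving_distincts is made up for illustration. It assumes Distinct may keep any one occurrence of each duplicate as long as the survivors stay in their relative input order, and it enumerates every result that reading allows.

```python
from itertools import product

# The five (?a, ?b) rows from the VALUES block in the quoted query.
rows = [(1, 1), (1, 2), (2, 3), (1, 4), (2, 5)]

# ORDER BY DESC(?b), then project ?a.
ordered = sorted(rows, key=lambda r: r[1], reverse=True)
projected = [a for a, _ in ordered]  # [2, 1, 2, 1, 1]


def order_preserving_distincts(seq):
    """Hypothetical helper: every duplicate-free sequence obtainable by
    keeping exactly one occurrence of each value, with the kept
    occurrences left in their original relative order."""
    positions = {}
    for i, value in enumerate(seq):
        positions.setdefault(value, []).append(i)
    results = set()
    # Pick one surviving position per distinct value, in every combination.
    for choice in product(*positions.values()):
        results.add(tuple(seq[i] for i in sorted(choice)))
    return results


allowed = order_preserving_distincts(projected)
print(sorted(allowed))  # [(1, 2), (2, 1)]

# With a LIMIT 1 on top, the two readings return different rows.
print(sorted({r[:1] for r in allowed}))  # [(1,), (2,)]

# "Keep the first occurrence" is one way to make the result deterministic.
keep_first = list(dict.fromkeys(projected))
print(keep_first)  # [2, 1]
```

The keep-first rule at the end is only one possible tie-break; as far as I can tell the spec text does not mandate it.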

Cheers,
Pavel

On Sat, Oct 8, 2022 at 2:01 PM Miguel <miguel.ceriani@gmail.com> wrote:
>
> Hi,
> I noticed this issue some time ago too and tested that different engines give different results.
> I do not think this is a simple matter of wording (consider that ordering by a variable that is not projected makes perfect sense; the issue arises only when DISTINCT is also used).
>
> Miguel
>
> On Sat, 8 Oct 2022, 13:34 Andy Seaborne, <andy@apache.org> wrote:
>>
>> Hi Pavel,
>>
>> Hmm - that text isn't good. The intent is, I guess, that it applies
>> when distinct/project use the same set of expressions as the order by.
>>
>> I've recorded the issue
>>
>> https://www.w3.org/2013/sparql-errata#errata-query-20
>>
>>      Andy
>>
>> On 07/10/2022 15:38, Pavel Klinov wrote:
>> > Hi all,
>> >
>> > Sorry if this is known, but it was a bit of a surprise to me. Even though
>> > both Projection and Distinct solution modifiers are required to
>> > preserve the order of solutions imposed by Order By (see 18.5), one
>> > can construct an example where the final query results would be
>> > undefined:
>> >
>> > SELECT DISTINCT ?a {
>> >      VALUES (?a ?b) { (1 1) (1 2) (2 3) (1 4) (2 5) }
>> > }
>> > ORDER BY DESC(?b)
>> >
>> > Solution modifiers are applied in the order of: Order By -> Projection
>> > -> Distinct (15). So after the projection, the solution sequence is:
>> > ?a -> 2, ?a -> 1, ?a -> 2, ?a -> 1, ?a -> 1.
>> >
>> > Now, the Distinct is only required to keep this order but it's free to
>> > remove any of the duplicate ?a -> 2 or ?a -> 1 solutions. So the final
>> > results could be either ?a -> 2, ?a -> 1 or ?a -> 1, ?a -> 2. Note
>> > that both solution sequences preserve the Order By order!
>> >
>> > It's easy to make an extended example with LIMIT where the results
>> > could be completely different based on how Distinct eliminates
>> > duplicates. Given the order preservation requirement, one can argue that
>> > Distinct should always keep the first occurrence of each duplicate in
>> > the input, but I don't think that's in the spec.
>> >
>> > Am I missing something?
>> > Pavel
>> >
>> >
>>

Received on Sunday, 9 October 2022 19:43:29 UTC