Re: Undefined results order with Order By/Projection/Distinct

Thanks Andy,

I agree re: the intent, but if all variables used in the Order By
expressions are projected, it shouldn't even matter if
Distinct/Project are applied before Order By or after. In fact,
systems that dictionary-encode triples in their internal
representation (à la RDF-3X) often push Distinct/Projection below
Order By, because Distinct can be implemented over tuples of integer
IDs while Order By requires comparing the IRIs and literals themselves.
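
A minimal Python sketch of that rewrite, under a toy model I'm assuming here: terms are plain strings, string order stands in for SPARQL term ordering, there is a single ascending ORDER BY variable, and the function names (spec_order, pushed_down) and the ID dictionary are hypothetical, not RDF-3X's actual design. When the ORDER BY variable is projected, deduplicating integer-encoded tuples first and sorting the decoded terms afterwards gives the same sequence as Order By -> Projection -> Distinct. If distinct tuples tie on the sort key, their relative order may differ, but ORDER BY leaves that unspecified anyway.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: a solution maps variable names to RDF terms,
# represented as plain strings; string order stands in for SPARQL term order.
Solution = Dict[str, str]


def spec_order(rows: List[Solution], proj: List[str], order_var: str) -> List[Tuple[str, ...]]:
    """Order By -> Projection -> Distinct, keeping the first copy of each duplicate."""
    ordered = sorted(rows, key=lambda r: r[order_var])
    seen: Set[Tuple[str, ...]] = set()
    out: List[Tuple[str, ...]] = []
    for r in ordered:
        t = tuple(r[v] for v in proj)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def pushed_down(rows: List[Solution], proj: List[str], order_var: str) -> List[Tuple[str, ...]]:
    """Projection + Distinct over integer IDs first, then decode and Order By."""
    if order_var not in proj:
        raise ValueError("rewrite is only safe when the ORDER BY variable is projected")
    term_to_id: Dict[str, int] = {}
    id_to_term: Dict[int, str] = {}

    def encode(term: str) -> int:
        # Assign IDs in first-seen order; IDs say nothing about term order.
        if term not in term_to_id:
            term_to_id[term] = len(term_to_id)
            id_to_term[term_to_id[term]] = term
        return term_to_id[term]

    # Distinct over tuples of integers: hashing and equality only, no term comparisons.
    distinct_ids = {tuple(encode(r[v]) for v in proj) for r in rows}
    # Decode, then sort on the actual terms, since IDs are not order-preserving.
    key_pos = proj.index(order_var)
    decoded = [tuple(id_to_term[i] for i in t) for t in distinct_ids]
    return sorted(decoded, key=lambda t: t[key_pos])


rows = [
    {"s": "ex:c", "name": "Carol"},
    {"s": "ex:a", "name": "Alice"},
    {"s": "ex:c", "name": "Carol"},
    {"s": "ex:b", "name": "Bob"},
]
print(spec_order(rows, ["s", "name"], "name"))
print(pushed_down(rows, ["s", "name"], "name"))
```

Both calls print [('ex:a', 'Alice'), ('ex:b', 'Bob'), ('ex:c', 'Carol')]. The guard in pushed_down reflects the point below: once the ORDER BY variable is projected away, the rewrite is no longer equivalent.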

In other words, the explicit logical order of solution modifiers
defined in https://www.w3.org/TR/sparql11-query/#solutionModifiers
only matters when the projection eliminates one of the ORDER BY
variables.
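
To make that remaining case concrete, here is a small Python sketch replaying the example from my original message quoted below: ?b is ordered on but projected away. The function names are hypothetical, and LIMIT 1 is emulated by slicing. It enumerates every order-preserving choice Distinct could make, showing that both ?a sequences conform, and that with LIMIT 1 even the single returned value differs.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# The VALUES (?a ?b) data from the quoted example, as (a, b) pairs.
ROWS: List[Tuple[int, int]] = [(1, 1), (1, 2), (2, 3), (1, 4), (2, 5)]


def order_then_project(rows: List[Tuple[int, int]]) -> List[int]:
    """ORDER BY DESC(?b), then project ?a (dropping ?b)."""
    return [a for a, _ in sorted(rows, key=lambda r: r[1], reverse=True)]


def distinct_keep_first(seq: List[int]) -> List[int]:
    """One conforming Distinct: keep the first occurrence of each value."""
    seen: Set[int] = set()
    out: List[int] = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def possible_distinct_results(seq: List[int]) -> Set[Tuple[int, ...]]:
    """Every order-preserving way Distinct could keep exactly one copy of each value."""
    positions: Dict[int, List[int]] = {}
    for i, x in enumerate(seq):
        positions.setdefault(x, []).append(i)
    results: Set[Tuple[int, ...]] = set()
    # Pick one surviving position per value; emitting them in input order preserves Order By.
    for choice in product(*positions.values()):
        results.add(tuple(seq[i] for i in sorted(choice)))
    return results


projected = order_then_project(ROWS)
print(projected)                              # [2, 1, 2, 1, 1]
print(distinct_keep_first(projected))         # [2, 1]
print(possible_distinct_results(projected))   # two results: (2, 1) and (1, 2)
print({r[:1] for r in possible_distinct_results(projected)})  # with LIMIT 1: (2,) or (1,)
```

Keeping the first occurrence is one way to pin the answer down, but as noted in the original message, nothing in the spec requires it.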

Cheers,
Pavel

On Sat, Oct 8, 2022 at 1:34 PM Andy Seaborne <andy@apache.org> wrote:
>
> Hi Pavel,
>
> Hmm - that text isn't good. The intent is, I guess, when
> distinct/project are the same set of expressions as the orderby.
>
> I've recorded the issue
>
> https://www.w3.org/2013/sparql-errata#errata-query-20
>
>      Andy
>
> On 07/10/2022 15:38, Pavel Klinov wrote:
> > Hi all,
> >
> > Sorry if this is known, but it was a bit of a surprise to me. Even though
> > both Projection and Distinct solution modifiers are required to
> > preserve the order of solutions imposed by Order By (see 18.5), one
> > can construct an example where the final query results would be
> > undefined:
> >
> > SELECT DISTINCT ?a {
> >      VALUES (?a ?b) { (1 1) (1 2) (2 3) (1 4) (2 5) }
> > }
> > ORDER BY DESC(?b)
> >
> > Solution modifiers are applied in the order Order By -> Projection
> > -> Distinct (Section 15). So after the projection, the solution sequence is:
> > ?a -> 2, ?a -> 1, ?a -> 2, ?a -> 1, ?a -> 1.
> >
> > Now, Distinct is only required to keep this order, but it is free to
> > remove any of the duplicate ?a -> 2 or ?a -> 1 solutions. So the final
> > results could be either ?a -> 2, ?a -> 1 or ?a -> 1, ?a -> 2. Note
> > that both solution sequences preserve the Order By order!
> >
> > It's easy to make an extended example with LIMIT where the results
> > could be completely different based on how Distinct eliminates
> > duplicates. Given the order-preservation requirement, one can argue that
> > Distinct should always keep the first occurrence of each duplicate in
> > the input, but I don't think it's in the spec.
> >
> > Am I missing something?
> > Pavel
> >
> >
>

Received on Sunday, 9 October 2022 19:40:56 UTC