Re: Query Review Part 2 from Birte Glimm on 2011-12-14 (public-rdf-dawg@w3.org from October to December 2011)

From: Birte Glimm <birte.glimm@uni-ulm.de>
Date: Wed, 14 Dec 2011 17:50:45 +0100
To: Steve Harris <steve.harris@garlik.com>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABt65OeegwZtJ7OKkU189F_0v+znyNAw1VwnpCptdQyxfp10xw@mail.gmail.com>
Thanks for the quick response. Small comments inle.

Birte

[snip]
>> 18.4.1
>> Note that in 18.2.4.1, we get Group((1), P) if the query has an
>> aggregate, but no GROUP BY clause. Thus, Group((1), P) should produce
>> something meaningful in 18.4.1, but we define
>> Group(exprlist, Ω) = { ListEval(exprlist, μ) -> { μ' | μ' in Ω,
>> ListEval(exprlist, μ) = ListEval(exprlist, μ') } | μ in Ω }
>> which is
>> Group((1), Ω) = { ListEval((1), μ) -> { μ' | μ' in Ω, ListEval((1), μ)
>> = ListEval((1), μ') } | μ in Ω }
>> with Ω the solutions for P and
>> ListEvalListEval((expr1, ..., exprn), μ) returns a list (e1, ..., en),
>> where ei = μ(expri) or error.
>> which is
>> ListEvalListEval((1), μ) returns a list (e1), where e1 = μ(1) or error.
>> What is μ(1) here? I don't think implicit grouping is mean to work
>> based on errors. All in all, I don't understand how implicit grouping
>> is supposed to work.
>
> Yes, I see. I'm not particularly happy with the notation of μ(expr) used to denote evaluating expr w.r.t. μ. Previously I used •, as in μ • expr but that didn't work for some people.
>
> Note, this isn't specifically an issue for implicit grouping, it applies on any grouping with only constants, though those would be odd use cases, e.g. GROUP BY 23, or GROUP BY NOW() will end up in the same place.

Ok, at least I think I get the idea now. Basically, 1 is a randomly
chosen value that will not be afftected by applying μ to it, i.e.,
μ(1) = 1. As a result, we get all solutions in a single group with
(random) key 1, right?

I think some explanatory text is needed here to at least explain the
intuition. Once you get the intuition across, it is much easier to
read the definition. Indeed the notation μ(expr) is a bit confusing
here, but I don't think we want to change that throughout the spec at
this point and I guess a comment will do. How about extending the
explanation for Step: GROUP BY? For example:
If the GROUP BY keyword is used, or there is implicit grouping due to
the use of aggregates in the projection, then grouping is performed by
the Group function. It divides the solution set into groups of one or
more solutions, with the same overall cardinality. In case of implicit
grouping, a fixed constant (1) is used to group all solutions into a
single group.

> I don't really have a sensible suggestion for how to fix this, the best I can think of is some explanatory text saying what μ(expr) is trying to express.
>
>> Definition: Aggregation
>> Aggregation, a function which calculates a scalar value as an output
>> of the aggregate expression in the SELECT clause, in the HAVING
>> evaluation process, and in ORDER BY where required is used to
>> calculate aggregated values over groups of solutions.
>> I don't understand "where required is used to calculate aggregated
>> values over groups of solutions."
>
> Yes, that's messed up grammar. Maybe:
>
> "Aggregation, a function which calculates a scalar value as an output of the aggregate expression. It is used  in the SELECT clause, the HAVING evaluation process, and in ORDER BY (where required). Aggregation calculates aggregated values over groups of solutions, using set functions."

+1

>> The main part of the definition is still in 18.5 and, to be consistent
>> with the rest, should be moved to 18.4.1. For example, in 18.4.1, you
>> could have the def. from 18.5 with slight modifications:
>
> OK, I was scared of moving the definition from evaluation of… to def'n of… incase it altered the semantics.

I think it has to be moved. I never understood why I have to go to the
eval section for aggregates, but for all other things the important
parts are in 18.4 and the evaluation only delegates to the defined
functions. Andy also complained recently about this...

>> Definition: Aggregation
>> Let exprlist be a list of expressions or *, func an aggregate,
>> scalarvals a set of partial functions (possibly empty) passed from the
>> aggregate in the query, and let { key_1->Omega_1, ..., key_m->Omega_m
>> } be a multiset of partial functions from keys to solution sequences
>> as produced by the grouping step. Aggregation applies the set function
>> func to the given multiset and produces a single value for each key
>> and partition of solutions for that key.
>> Aggregation( exprlist, func, scalarvals, { key_1->Omega_1, ...,
>> key_m->Omega_m } )
>> = { (dom(g), F) | g in { key_1->Omega_1, ..., key_m->Omega_m } }
>> where
>>   M = { ListEval(exprlist, μ) | μ in range(g) }
>>   F = func(M, scalarvals), for non-DISTINCT
>>   F = func(Distinct(M), scalarvals), for DISTINCT
>> Special Case: when COUNT is used with the expression * the value of F
>> will be the cardinality of the group solution sequence,
>> card[range(g)], or card[Distinct(range(g))] if the DISTINCT keyword is
>> present.
>
> Thanks, I have made that edit, can you check I lost nothing in translation? I changed "func an aggregate" to "func a set function", which I believe is correct at this point in the process.

I aggree that set function is more appropriate.

The definition looks good, just the special case still hase some
range(g) left, which should now be Ω:
Special Case: ... card[Ω], or card[Distinct(Ω)] if ...
Received on Wednesday, 14 December 2011 16:51:16 UTC