Re: Query Review Part 2

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 14 Dec 2011 17:02:43 +0000
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <15C7134D-8BB0-401C-BC6C-589695049B4D@garlik.com>

```On 14 Dec 2011, at 16:50, Birte Glimm wrote:

> Thanks for the quick response. Small comments inle.
>
> Birte
>
> [snip]
>>> 18.4.1
>>> Note that in 18.2.4.1, we get Group((1), P) if the query has an
>>> aggregate, but no GROUP BY clause. Thus, Group((1), P) should produce
>>> something meaningful in 18.4.1, but we define
>>> Group(exprlist, Ω) = { ListEval(exprlist, μ) -> { μ' | μ' in Ω,
>>> ListEval(exprlist, μ) = ListEval(exprlist, μ') } | μ in Ω }
>>> which is
>>> Group((1), Ω) = { ListEval((1), μ) -> { μ' | μ' in Ω, ListEval((1), μ)
>>> = ListEval((1), μ') } | μ in Ω }
>>> with Ω the solutions for P and
>>> ListEvalListEval((expr1, ..., exprn), μ) returns a list (e1, ..., en),
>>> where ei = μ(expri) or error.
>>> which is
>>> ListEvalListEval((1), μ) returns a list (e1), where e1 = μ(1) or error.
>>> What is μ(1) here? I don't think implicit grouping is mean to work
>>> based on errors. All in all, I don't understand how implicit grouping
>>> is supposed to work.
>>
>> Yes, I see. I'm not particularly happy with the notation of μ(expr) used to denote evaluating expr w.r.t. μ. Previously I used •, as in μ • expr but that didn't work for some people.
>>
>> Note, this isn't specifically an issue for implicit grouping, it applies on any grouping with only constants, though those would be odd use cases, e.g. GROUP BY 23, or GROUP BY NOW() will end up in the same place.
>
> Ok, at least I think I get the idea now. Basically, 1 is a randomly
> chosen value that will not be afftected by applying μ to it, i.e.,
> μ(1) = 1. As a result, we get all solutions in a single group with
> (random) key 1, right?

Yes, that's right.

> I think some explanatory text is needed here to at least explain the
> intuition. Once you get the intuition across, it is much easier to
> read the definition. Indeed the notation μ(expr) is a bit confusing
> here, but I don't think we want to change that throughout the spec at
> this point and I guess a comment will do. How about extending the
> explanation for Step: GROUP BY? For example:
> If the GROUP BY keyword is used, or there is implicit grouping due to
> the use of aggregates in the projection, then grouping is performed by
> the Group function. It divides the solution set into groups of one or
> more solutions, with the same overall cardinality. In case of implicit
> grouping, a fixed constant (1) is used to group all solutions into a
> single group.

OK, that seems good, I've added it.

>> I don't really have a sensible suggestion for how to fix this, the best I can think of is some explanatory text saying what μ(expr) is trying to express.
>>
>>> Definition: Aggregation
>>> Aggregation, a function which calculates a scalar value as an output
>>> of the aggregate expression in the SELECT clause, in the HAVING
>>> evaluation process, and in ORDER BY where required is used to
>>> calculate aggregated values over groups of solutions.
>>> I don't understand "where required is used to calculate aggregated
>>> values over groups of solutions."
>>
>> Yes, that's messed up grammar. Maybe:
>>
>> "Aggregation, a function which calculates a scalar value as an output of the aggregate expression. It is used  in the SELECT clause, the HAVING evaluation process, and in ORDER BY (where required). Aggregation calculates aggregated values over groups of solutions, using set functions."
>
> +1
>
>>> The main part of the definition is still in 18.5 and, to be consistent
>>> with the rest, should be moved to 18.4.1. For example, in 18.4.1, you
>>> could have the def. from 18.5 with slight modifications:
>>
>> OK, I was scared of moving the definition from evaluation of… to def'n of… incase it altered the semantics.
>
> I think it has to be moved. I never understood why I have to go to the
> eval section for aggregates, but for all other things the important
> parts are in 18.4 and the evaluation only delegates to the defined

Yeah, it was an early error on my part, anyway it's moved now.

>>> Definition: Aggregation
>>> Let exprlist be a list of expressions or *, func an aggregate,
>>> scalarvals a set of partial functions (possibly empty) passed from the
>>> aggregate in the query, and let { key_1->Omega_1, ..., key_m->Omega_m
>>> } be a multiset of partial functions from keys to solution sequences
>>> as produced by the grouping step. Aggregation applies the set function
>>> func to the given multiset and produces a single value for each key
>>> and partition of solutions for that key.
>>> Aggregation( exprlist, func, scalarvals, { key_1->Omega_1, ...,
>>> key_m->Omega_m } )
>>> = { (dom(g), F) | g in { key_1->Omega_1, ..., key_m->Omega_m } }
>>> where
>>>   M = { ListEval(exprlist, μ) | μ in range(g) }
>>>   F = func(M, scalarvals), for non-DISTINCT
>>>   F = func(Distinct(M), scalarvals), for DISTINCT
>>> Special Case: when COUNT is used with the expression * the value of F
>>> will be the cardinality of the group solution sequence,
>>> card[range(g)], or card[Distinct(range(g))] if the DISTINCT keyword is
>>> present.
>>
>> Thanks, I have made that edit, can you check I lost nothing in translation? I changed "func an aggregate" to "func a set function", which I believe is correct at this point in the process.
>
> I aggree that set function is more appropriate.
>
> The definition looks good, just the special case still hase some
> range(g) left, which should now be Ω:
> Special Case: ... card[Ω], or card[Distinct(Ω)] if …

Thanks, I missed that one. Fixed.

- Steve

--
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11