# Re: Review SPARQL Query 1.1, Section 18 (algebra)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 18 Mar 2011 17:15:49 +0000
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>
Message-Id: <0D6CA869-915A-4A30-8FEF-D6C0C28B6A2A@garlik.com>
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>, SPARQL Working Group <public-rdf-dawg@w3.org>

On 2011-03-15, at 23:39, Steve Harris wrote:

> On 2011-03-15, at 00:34, Birte Glimm wrote:

[snip]

Picking up...

>> The definition for the evaluation of Group(...) (from aggregates) is missing.

OK, I've added one, but I'm slightly worried it might be gibberish. I think the current definition should be removed or made into a placeholder, and the behaviour should be defined in the evaluation.

>> Definition: Evaluation of Aggregation
>> Aggregation applies a set function “func” to a multiset of lists of
>> expressions and a grouped solution sequence, G as produced by the Group
>> function. It produces a single value for each key and partition for that key
>> (key, X).
>> should be
>> Aggregation applies a set function “func” to a multiset of lists of
>> expressions and a grouped solution sequence, ***P*** (G is used in
>> D(G) for the active graph) as produced by the Group function. It
>> produces a single value for each key and partition for that key (key
>> ***->*** X).
>> M should be defined as:
>> M = { L | L = ListEval(exprlist, μ) for μ in ran(g) }
>> or
>> M = { ListEval(exprlist, μ) | μ in range(g) }
>> since ran(g) is a set of solution mappings, whereas ListEval is
>> defined for an expression list and one solution mapping. Also
>> range(function) has not been defined before although it is pretty
>> obvious, but for the domain of a solution there is a special
>> definition.

OK, yes.
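
As a rough illustration of the second form, M = { ListEval(exprlist, μ) | μ in range(g) }, here is a minimal Python sketch. Everything in it is an assumption made for illustration: solution mappings are plain dicts, expressions are callables, `partition` stands in for range(g), and an expression that fails on a mapping yields a hypothetical `ERROR` sentinel rather than dropping the list. The point is only that ListEval is applied to one μ at a time and the results are collected as a multiset.

```python
from collections import Counter

ERROR = "error"  # hypothetical sentinel for an expression that fails on a mapping


def list_eval(exprlist, mu):
    """Evaluate every expression in exprlist against ONE solution mapping mu."""
    values = []
    for expr in exprlist:
        try:
            values.append(expr(mu))
        except (KeyError, TypeError):
            values.append(ERROR)
    return tuple(values)


def multiset_m(exprlist, partition):
    """Collect ListEval(exprlist, mu) for each mu in the partition.

    A Counter is used so that duplicate lists are kept (multiset, not set).
    """
    return Counter(list_eval(exprlist, mu) for mu in partition)


# Three solution mappings in one group; the third has no binding for ?x.
partition = [{"x": 1}, {"x": 1}, {"y": 2}]
m = multiset_m([lambda mu: mu["x"]], partition)
```

With this input, `m` holds the list `(1,)` twice and the error list once, so its total cardinality matches that of the partition.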

>> Special Case: ... will be *the* cardinality of ...
>>
>> Can I actually use COUNT(DISTINCT *)? That doesn't seem to make much
>> sense to me.
>>
>> Note that the purpose of the expression card[range(g)] - card[M] is to
>> indicate to the set function the number of expressions that evaluated
>> to an error.

This is redundant, and I've removed it.

>> card[range(g)] is only used in COUNT(*), how does that relate to error
>> indication in general?

Error counts have been removed.

>> For example the aggregate expression GROUP_BY(?x ; separator="|")
>> should be GROUP_CONCAT
>>
>>
>> Definition: Evaluation of AggregateJoin
>> ...eval(D(G), AggregateJoin(A, P) ***)***
>> missing brace
>>
>> I would much prefer (as said above) if the temporary variables agg_i
>> were given as parameter to each A_i = Aggregation(...) since otherwise
>> the algebra object doesn't contain all the information required to
>> evaluate it. I have to know which variable name was used for each A_i,
>> since I can hardly assume it is really agg_i given that agg_i is not a
>> forbidden variable name and an evil user might use it within a
>> query.

That sounds good, but I don't know how that can be expressed formally. I spent some time trying with no success.

>> I also assume the definition is wrong, I want
>> eval(D(G), AggregateJoin(A, P))={ (agg_1, v_1), ..., (agg_n, v_n) |
>> v_i such that ( k, v_i ) in eval(D(G), A_i) for some k and each 1 <= i
>> <= n }

I think I agree that what's in the document is wrong, but I don't follow this at the moment; I've put a closer look at it on my todo list.
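
For reference, one possible reading of the proposed definition can be sketched in Python. This is an assumption-laden illustration, not the document's definition: it takes "for some k" to mean the same key k across all A_i (one solution per group), represents each eval(D(G), A_i) as a dict from key to value, and passes the variable name alongside each A_i, as suggested earlier in the review. The function name `aggregate_join` and the data shapes are hypothetical.

```python
def aggregate_join(aggregations):
    """Join per-key aggregate results into one solution mapping per key.

    aggregations: list of (var_name, results) pairs, where results maps a
    group key k to the value v_i computed by that aggregation (assumed shape).
    """
    # Gather every group key that appears in any aggregation's result.
    keys = set()
    for _, results in aggregations:
        keys.update(results)

    solutions = {}
    for key in keys:
        # Bind each variable to its aggregation's value for this key;
        # a key missing from one result leaves that variable unbound (assumption).
        solutions[key] = {
            var: results[key]
            for var, results in aggregations
            if key in results
        }
    return solutions


count_by_key = {("a",): 2, ("b",): 1}
sum_by_key = {("a",): 10, ("b",): 7}
joined = aggregate_join([("agg_1", count_by_key), ("agg_2", sum_by_key)])
```

Because the variable names travel with the aggregations, nothing here relies on agg_i being reserved; any fresh names would do.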

[snip]

>>
>> The second
>> Definition: Evaluation of ToList
>> should be
>> Definition: Evaluation of Subquery

I think Andy might have done this?

- Steve

--
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11