- From: Andy Seaborne <andy.seaborne@talis.com>
- Date: Sun, 07 Mar 2010 17:42:08 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
ISSUE-53 I propose the following to define ExprMultiSet: ------- Let Ω be a partition. ExprMultiSet(Ω) = { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined } UNION { e | μ in Ω such that eval(μ(expr)) is undefined } where "e" is some symbol that is distinct from all RDF terms. card[x]: if DISTINCT: card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr)) card[x] = 0 otherwise else card[x] = count of μ in Ω such that x = eval(μ(expr)) -------- "e" just records error evaluations. This is the most flexible definition. An alternative is ExprMultiset(Ω) = { eval(expr,μ) | μ in Ω such that eval(expr,μ) is defined } which is hard-coding dropping errors and unbounds during evaluation. But the aggregate can't know there were some errors. Another possibility is that a yes/no flag indicating a error was seen. But this might as well be the count of errors, which is equivalent to the flexible definition given. By the way, this is in no way a recipe for implementation. Aggregation can be done over all groups in parallel during query execution. For the last publication, it was noted http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0646.html Unbound and error are the same. The current design so far has it that any error means that the multiset is invalid and that group is not considered. We didn't have time to propose a solid design to address ISSUE-53 - the potential design at the time of publication was that any error when calculating the ExprMultiset from a partition meant that SUM of {1, 2, unbound} is an error. COUNT of {1, 2, unbound} is an error. I don't think that is a useful form for COUNT(?x). It does seem to mean that COUNT(?x) is either COUNT(*) or error; it can't be anything else. COUNT(?x) can not be zero because zero arises when there are no ?x but there are solutions in the partition. If there are no solutions in the partition then there is no group key and no grouping happens. For each aggregate we can decide what happens about unbounds and errors. I would like to see: COUNT(*) = size of multiset. COUNT(DISTINCT *) = size of set after removing any e (i.e. skip undefs). COUNT(?x) = number of times ?x is defined in each group 0 <= COUNT(?x) <= COUNT(*) COUNT(DISTINCT ?x) = number of times ?x is uniquely defined in each group I'm less worried about SUM(?x) but I'd prefer that SUM(?x) = op:numeric-add of defined values of ?x, skips unbounds rather that the rigid form we currently have. Previously, one of the difficulties raised for this design was that the operation to add two numbers wasn't op:numeric-add because that could not cope the errors (there were related datatyping issues as well). With the definition of ExprMultiSet above, op:numeric-add can be used to define SUM. There is step between getting the ExprMultiSet and the calculation of aggregation. This step, for SUM (and COUNT(?x)), removes any errors. GROUP_CONCAT(?x) = concatenation and now GROUP_CONCAT of an empty set can be defined as "". ------------- Some examples: Does anyone want to suggest we design to get different results in any of these cases? --Data: @prefix : <http://example/> . :x1 a :T . :x1 :p 1 . :x1 :p 2 . :x2 a :T . :x2 :p 9 . :x3 a :T . :x3 :p 5 . :x3 :q "x" . :x4 a :T . :x4 :q "z". -- -- Query 1: 1 PREFIX : <http://example/> 2 3 SELECT ?x (count(*) AS ?C) 4 WHERE 5 { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T 6 OPTIONAL 7 { ?x :p ?v} 8 } 9 GROUP BY ?x 10 ORDER BY str(?x) ----------- | x | C | =========== | :x1 | 2 | | :x2 | 1 | | :x3 | 1 | | :x4 | 1 | ----------- -- Query 2: Change line 3 to: SELECT ?x (count(?v) AS ?C) ----------- | x | C | =========== | :x1 | 2 | | :x2 | 1 | | :x3 | 1 | | :x4 | 0 | ----------- -- Query 3: Change line 3 to: SELECT ?x (sum(?v) AS ?C) ----------- | x | C | =========== | :x1 | 3 | | :x2 | 9 | | :x3 | 5 | | :x4 | 0 | ----------- The :x4 row is zero because there were no valid numbers to add together. -- Different query OPTIONAL part - now has ?p 1 PREFIX : <http://example/> 2 3 SELECT ?x (sum(?v) AS ?C) 4 WHERE 5 { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T 6 OPTIONAL 7 { ?x ?any ?v} 8 } 9 GROUP BY ?x 10 ORDER BY str(?x) ----------- | x | C | =========== | :x1 | 3 | | :x2 | 9 | | :x3 | 5 | | :x4 | 0 | ----------- The case where ?v is "Z2 and "x" have been skipped. Andy
Received on Sunday, 7 March 2010 17:42:43 UTC