From: Axel Polleres <axel.polleres@deri.org>

Date: Fri, 26 Mar 2010 10:18:32 +0000

Cc: Axel Polleres <axel.polleres@deri.org>

Message-Id: <DED83A5C-6C5B-45C7-89F0-7BC86F7B890E@deri.org>

To: Andy Seaborne <andy.seaborne@talis.com>, SPARQL Working Group <public-rdf-dawg@w3.org>

Date: Fri, 26 Mar 2010 10:18:32 +0000

Cc: Axel Polleres <axel.polleres@deri.org>

Message-Id: <DED83A5C-6C5B-45C7-89F0-7BC86F7B890E@deri.org>

To: Andy Seaborne <andy.seaborne@talis.com>, SPARQL Working Group <public-rdf-dawg@w3.org>

On 26 Mar 2010, at 09:57, Axel Polleres wrote: > short clarification request: > >> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined } > > by "is defined" you mean "is unequal to 'error'", yes? p.s.: or do you mean such that μ(expr) is defined ? I think I get the intention... to be able to treat unbound different from error, yes? here my understanding of the proposal. Say we have: ?X ?Y ----- a 1 b 0 c d "bla" e 1 Here my understanding of the proposal: COUNT( * ) -> 5 COUNT( ?X ) -> 4 COUNT( DISTINCT ?X ) -> 3 yes? if so, clear so far. now what about expressions? COUNT( ?X * ?X || ?X * ?X ) -> ? COUNT( DISTINCT (?X * ?X || ?X * ?X) ) -> ? concretely, what happens to the "bla" row that produces an error? what happens to the unbound row, that also producse an error when the expression is evaluated? Thanks, Axel > > What I mean to ask here... when I read the current section on Filter evaluation > http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#evaluation > to me it seems that eval is always "defined", but it could be an error for reasons of mistyping or > values being unbound. > > Thanks for clarification on whether/what I might have overlooked here! > > Axel > > On 7 Mar 2010, at 17:42, Andy Seaborne wrote: > >> ISSUE-53 >> >> I propose the following to define ExprMultiSet: >> >> ------- >> >> Let Ω be a partition. >> >> ExprMultiSet(Ω) = >> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined } >> UNION >> { e | μ in Ω such that eval(μ(expr)) is undefined } >> >> where "e" is some symbol that is distinct from all RDF terms. >> >> card[x]: >> if DISTINCT: >> card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr)) >> card[x] = 0 otherwise >> else >> card[x] = count of μ in Ω such that x = eval(μ(expr)) >> >> -------- >> >> "e" just records error evaluations. >> >> This is the most flexible definition. An alternative is >> >> ExprMultiset(Ω) = >> { eval(expr,μ) | μ in Ω such that eval(expr,μ) is defined } >> >> which is hard-coding dropping errors and unbounds during evaluation. But >> the aggregate can't know there were some errors. >> >> Another possibility is that a yes/no flag indicating a error was seen. >> But this might as well be the count of errors, which is equivalent to >> the flexible definition given. >> >> By the way, this is in no way a recipe for implementation. Aggregation >> can be done over all groups in parallel during query execution. >> >> >> >> For the last publication, it was noted >> >> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0646.html >> >> Unbound and error are the same. The current design so far has it that >> any error means that the multiset is invalid and that group is not >> considered. >> >> We didn't have time to propose a solid design to address ISSUE-53 - the >> potential design at the time of publication was that any error when >> calculating the ExprMultiset from a partition meant that >> >> SUM of {1, 2, unbound} is an error. >> COUNT of {1, 2, unbound} is an error. >> >> I don't think that is a useful form for COUNT(?x). It does seem to mean >> that COUNT(?x) is either COUNT(*) or error; it can't be anything else. >> >> COUNT(?x) can not be zero because zero arises when there are no ?x but >> there are solutions in the partition. If there are no solutions in the >> partition then there is no group key and no grouping happens. >> >> For each aggregate we can decide what happens about unbounds and errors. >> >> I would like to see: >> >> COUNT(*) = size of multiset. >> COUNT(DISTINCT *) = size of set after removing any e (i.e. skip undefs). >> >> COUNT(?x) = number of times ?x is defined in each group >> 0 <= COUNT(?x) <= COUNT(*) >> >> COUNT(DISTINCT ?x) = number of times ?x is uniquely defined in each group >> >> I'm less worried about SUM(?x) but I'd prefer that >> >> SUM(?x) = op:numeric-add of defined values of ?x, skips unbounds >> >> rather that the rigid form we currently have. >> >> Previously, one of the difficulties raised for this design was that the >> operation to add two numbers wasn't op:numeric-add because that could >> not cope the errors (there were related datatyping issues as well). >> >> With the definition of ExprMultiSet above, op:numeric-add can be used to >> define SUM. There is step between getting the ExprMultiSet and the >> calculation of aggregation. This step, for SUM (and COUNT(?x)), removes >> any errors. >> >> GROUP_CONCAT(?x) = concatenation >> and now GROUP_CONCAT of an empty set can be defined as "". >> >> ------------- >> Some examples: >> >> Does anyone want to suggest we design to get different results in any of >> these cases? >> >> >> --Data: >> >> @prefix : <http://example/> . >> >> :x1 a :T . >> :x1 :p 1 . >> :x1 :p 2 . >> >> :x2 a :T . >> :x2 :p 9 . >> >> :x3 a :T . >> :x3 :p 5 . >> :x3 :q "x" . >> >> :x4 a :T . >> :x4 :q "z". >> >> >> -- >> >> >> -- Query 1: >> 1 PREFIX : <http://example/> >> 2 >> 3 SELECT ?x (count(*) AS ?C) >> 4 WHERE >> 5 { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T >> 6 OPTIONAL >> 7 { ?x :p ?v} >> 8 } >> 9 GROUP BY ?x >> 10 ORDER BY str(?x) >> >> ----------- >> | x | C | >> =========== >> | :x1 | 2 | >> | :x2 | 1 | >> | :x3 | 1 | >> | :x4 | 1 | >> ----------- >> >> -- Query 2: >> >> Change line 3 to: >> SELECT ?x (count(?v) AS ?C) >> >> ----------- >> | x | C | >> =========== >> | :x1 | 2 | >> | :x2 | 1 | >> | :x3 | 1 | >> | :x4 | 0 | >> ----------- >> >> -- Query 3: >> >> Change line 3 to: >> SELECT ?x (sum(?v) AS ?C) >> >> ----------- >> | x | C | >> =========== >> | :x1 | 3 | >> | :x2 | 9 | >> | :x3 | 5 | >> | :x4 | 0 | >> ----------- >> >> The :x4 row is zero because there were no valid numbers to add together. >> >> -- Different query OPTIONAL part - now has ?p >> >> 1 PREFIX : <http://example/> >> 2 >> 3 SELECT ?x (sum(?v) AS ?C) >> 4 WHERE >> 5 { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T >> 6 OPTIONAL >> 7 { ?x ?any ?v} >> 8 } >> 9 GROUP BY ?x >> 10 ORDER BY str(?x) >> >> ----------- >> | x | C | >> =========== >> | :x1 | 3 | >> | :x2 | 9 | >> | :x3 | 5 | >> | :x4 | 0 | >> ----------- >> >> The case where ?v is "Z2 and "x" have been skipped. >> >> Andy >> >> >> >> >> >> >Received on Friday, 26 March 2010 10:19:10 GMT

*
This archive was generated by hypermail 2.3.1
: Tuesday, 26 March 2013 16:15:42 GMT
*