- From: Axel Polleres <axel.polleres@deri.org>
- Date: Mon, 29 Mar 2010 23:34:00 +0100
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>
Thanks Andy, I anyway see things much clearer now after the f2f discussions!

best,
Axel

On 29 Mar 2010, at 23:07, Andy Seaborne wrote:

> On 26/03/2010 10:18 AM, Axel Polleres wrote:
> >
> > On 26 Mar 2010, at 09:57, Axel Polleres wrote:
> >
> >> short clarification request:
> >>
> >>> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
> >>
> >> by "is defined" you mean "is unequal to 'error'", yes?
> >
> > p.s.:
> >
> > or do you mean such that μ(expr) is defined ?
>
> μ is a substitution function (it's a mapping from vars to terms).
>
> μ(expr) is the rewrite of the expression with variables replaced by any
> defined values.
>
> eval(μ(expr)) is the value of that (and errors for any unbound variables
> - they didn't get substituted and variables aren't RDF terms).
>
> Andy
>
> >
> > I think I get the intention... to be able to treat unbound different from error, yes?
> >
> > here my understanding of the proposal. Say we have:
> >
> > ?X ?Y
>
> Does not make sense later on:
>
> Let's try:
>
> > ?Z ?X
> > -----
> > a  1
> > b  0
> > c
> > d  "bla"
> > e  1
> >
> > Here my understanding of the proposal:
> >
> > COUNT( * ) -> 5
>
> Agreed
>
> > COUNT( ?X ) -> 4
>
> Agreed after ?X moved to right-hand column.
>
> > COUNT( DISTINCT ?X ) -> 3
>
> Agreed.
>
> > yes? if so, clear so far. now what about expressions?
> >
> > COUNT( ?X * ?X || ?X * ?X ) -> ?
>
> """ SPARQL 1.0:
> Note that logical-or operates on the effective boolean value of its
> arguments.
> """
>
> So 3 as ?X*?X is only defined for 1,0,1
>
> 1||1 => true
> 0||0 => false
>
> > COUNT( DISTINCT (?X * ?X || ?X * ?X) ) -> ?
>
> 2 (true and false of ||)
>
> > concretely, what happens to the "bla" row that produces an error? what happens to the unbound row, that also produces an error when the expression is evaluated?
>
> Same - eval(?x) is undefined as is eval("bla"*"bla")
>
> > Thanks,
> > Axel
>
> Andy
>
> >>
> >> What I mean to ask here... when I read the current section on Filter evaluation
> >> http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#evaluation
> >> to me it seems that eval is always "defined", but it could be an error for reasons of mistyping or
> >> values being unbound.
> >>
> >> Thanks for clarification on whether/what I might have overlooked here!
> >>
> >> Axel
> >>
> >> On 7 Mar 2010, at 17:42, Andy Seaborne wrote:
> >>
> >>> ISSUE-53
> >>>
> >>> I propose the following to define ExprMultiSet:
> >>>
> >>> -------
> >>>
> >>> Let Ω be a partition.
> >>>
> >>> ExprMultiSet(Ω) =
> >>>    { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
> >>>   UNION
> >>>    { e | μ in Ω such that eval(μ(expr)) is undefined }
> >>>
> >>> where "e" is some symbol that is distinct from all RDF terms.
> >>>
> >>> card[x]:
> >>>    if DISTINCT:
> >>>       card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr))
> >>>       card[x] = 0 otherwise
> >>>    else
> >>>       card[x] = count of μ in Ω such that x = eval(μ(expr))
> >>>
> >>> --------
> >>>
> >>> "e" just records error evaluations.
> >>>
> >>> This is the most flexible definition. An alternative is
> >>>
> >>> ExprMultiSet(Ω) =
> >>>    { eval(expr,μ) | μ in Ω such that eval(expr,μ) is defined }
> >>>
> >>> which is hard-coding dropping errors and unbounds during evaluation. But
> >>> the aggregate can't know there were some errors.
> >>>
> >>> Another possibility is a yes/no flag indicating that an error was seen.
> >>> But this might as well be the count of errors, which is equivalent to
> >>> the flexible definition given.
> >>>
> >>> By the way, this is in no way a recipe for implementation. Aggregation
> >>> can be done over all groups in parallel during query execution.
> >>>
> >>>
> >>> For the last publication, it was noted
> >>>
> >>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0646.html
> >>>
> >>> Unbound and error are the same. The current design so far has it that
> >>> any error means that the multiset is invalid and that group is not
> >>> considered.
> >>>
> >>> We didn't have time to propose a solid design to address ISSUE-53 - the
> >>> potential design at the time of publication was that any error when
> >>> calculating the ExprMultiSet from a partition meant that
> >>>
> >>> SUM of {1, 2, unbound} is an error.
> >>> COUNT of {1, 2, unbound} is an error.
> >>>
> >>> I don't think that is a useful form for COUNT(?x). It does seem to mean
> >>> that COUNT(?x) is either COUNT(*) or error; it can't be anything else.
> >>>
> >>> COUNT(?x) can not be zero because zero arises when there are no ?x but
> >>> there are solutions in the partition. If there are no solutions in the
> >>> partition then there is no group key and no grouping happens.
> >>>
> >>> For each aggregate we can decide what happens about unbounds and errors.
> >>>
> >>> I would like to see:
> >>>
> >>> COUNT(*) = size of multiset.
> >>> COUNT(DISTINCT *) = size of set after removing any e (i.e. skip undefs).
> >>>
> >>> COUNT(?x) = number of times ?x is defined in each group
> >>>    0 <= COUNT(?x) <= COUNT(*)
> >>>
> >>> COUNT(DISTINCT ?x) = number of times ?x is uniquely defined in each group
> >>>
> >>> I'm less worried about SUM(?x) but I'd prefer that
> >>>
> >>> SUM(?x) = op:numeric-add of defined values of ?x, skips unbounds
> >>>
> >>> rather than the rigid form we currently have.
> >>>
> >>> Previously, one of the difficulties raised for this design was that the
> >>> operation to add two numbers wasn't op:numeric-add because that could
> >>> not cope with the errors (there were related datatyping issues as well).
> >>>
> >>> With the definition of ExprMultiSet above, op:numeric-add can be used to
> >>> define SUM. There is a step between getting the ExprMultiSet and the
> >>> calculation of aggregation. This step, for SUM (and COUNT(?x)), removes
> >>> any errors.
> >>>
> >>> GROUP_CONCAT(?x) = concatenation
> >>> and now GROUP_CONCAT of an empty set can be defined as "".
> >>>
> >>> -------------
> >>> Some examples:
> >>>
> >>> Does anyone want to suggest we design to get different results in any of
> >>> these cases?
> >>>
> >>>
> >>> --Data:
> >>>
> >>> @prefix : <http://example/> .
> >>>
> >>> :x1 a :T .
> >>> :x1 :p 1 .
> >>> :x1 :p 2 .
> >>>
> >>> :x2 a :T .
> >>> :x2 :p 9 .
> >>>
> >>> :x3 a :T .
> >>> :x3 :p 5 .
> >>> :x3 :q "x" .
> >>>
> >>> :x4 a :T .
> >>> :x4 :q "z" .
> >>>
> >>> --
> >>>
> >>>
> >>> -- Query 1:
> >>>  1 PREFIX : <http://example/>
> >>>  2
> >>>  3 SELECT ?x (count(*) AS ?C)
> >>>  4 WHERE
> >>>  5   { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T
> >>>  6     OPTIONAL
> >>>  7       { ?x :p ?v}
> >>>  8   }
> >>>  9 GROUP BY ?x
> >>> 10 ORDER BY str(?x)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 2 |
> >>> | :x2 | 1 |
> >>> | :x3 | 1 |
> >>> | :x4 | 1 |
> >>> -----------
> >>>
> >>> -- Query 2:
> >>>
> >>> Change line 3 to:
> >>> SELECT ?x (count(?v) AS ?C)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 2 |
> >>> | :x2 | 1 |
> >>> | :x3 | 1 |
> >>> | :x4 | 0 |
> >>> -----------
> >>>
> >>> -- Query 3:
> >>>
> >>> Change line 3 to:
> >>> SELECT ?x (sum(?v) AS ?C)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 3 |
> >>> | :x2 | 9 |
> >>> | :x3 | 5 |
> >>> | :x4 | 0 |
> >>> -----------
> >>>
> >>> The :x4 row is zero because there were no valid numbers to add together.
> >>>
> >>> -- Different query OPTIONAL part - now has ?any
> >>>
> >>>  1 PREFIX : <http://example/>
> >>>  2
> >>>  3 SELECT ?x (sum(?v) AS ?C)
> >>>  4 WHERE
> >>>  5   { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T
> >>>  6     OPTIONAL
> >>>  7       { ?x ?any ?v}
> >>>  8   }
> >>>  9 GROUP BY ?x
> >>> 10 ORDER BY str(?x)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 3 |
> >>> | :x2 | 9 |
> >>> | :x3 | 5 |
> >>> | :x4 | 0 |
> >>> -----------
> >>>
> >>> The cases where ?v is "z" and "x" have been skipped.
> >>>
> >>> Andy
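Editorial sketch, not part of the original message: the ExprMultiSet definition and the COUNT results agreed in the thread can be illustrated with a few lines of Python. Everything here is an assumption made for illustration only: solutions are plain dicts, an unbound variable is a missing key, the names `ERR`, `expr_multiset`, `count_star` and `count_expr` are hypothetical, and Python's own `*` and `bool()` stand in for SPARQL multiplication and effective boolean value. It is not a recipe for implementation, and it ignores RDF datatyping entirely.

```python
from collections import Counter

ERR = object()  # plays the role of "e": a symbol distinct from every value


def expr_multiset(partition, expr, distinct=False):
    """Return card[x] for a partition, recording failed evaluations as ERR."""
    card = Counter()
    for mu in partition:
        try:
            value = expr(mu)
        except (KeyError, TypeError):  # unbound variable or type error
            value = ERR
        card[value] += 1
    if distinct:
        # DISTINCT: every value that occurs gets cardinality 1
        card = Counter(dict.fromkeys(card, 1))
    return card


def count_star(partition):
    """COUNT(*): size of the partition itself."""
    return len(partition)


def count_expr(partition, expr, distinct=False):
    """COUNT(expr): drop ERR entries first, then add up the cardinalities."""
    card = expr_multiset(partition, expr, distinct)
    return sum(n for x, n in card.items() if x is not ERR)


# The ?Z ?X table from the thread; row "c" leaves ?X unbound.
rows = [
    {"Z": "a", "X": 1},
    {"Z": "b", "X": 0},
    {"Z": "c"},
    {"Z": "d", "X": "bla"},
    {"Z": "e", "X": 1},
]


def get_x(mu):
    return mu["X"]


def x_sq_or_x_sq(mu):
    # ?X * ?X || ?X * ?X, with || applied to effective boolean values
    return bool(mu["X"] * mu["X"]) or bool(mu["X"] * mu["X"])


print(count_star(rows))                             # 5
print(count_expr(rows, get_x))                      # 4
print(count_expr(rows, get_x, distinct=True))       # 3
print(count_expr(rows, x_sq_or_x_sq))               # 3
print(count_expr(rows, x_sq_or_x_sq, distinct=True))  # 2
```

Keeping `ERR` inside the multiset and removing it in a separate step mirrors the "flexible" definition in the proposal: an aggregate that wanted to fail on any error could instead inspect `card[ERR]`.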
Received on Monday, 29 March 2010 22:34:36 UTC