Re: Proposed definition of ExprMultiSet from Axel Polleres on 2010-03-26 (public-rdf-dawg@w3.org from January to March 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Fri, 26 Mar 2010 09:57:16 +0000
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <662A9EB1-AEC2-4CEA-A415-65453E4214EF@deri.org>
short clarification request:

> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }

by "is defined" you mean "is unequal to 'error'", yes?

What I mean to ask here... when I read the current section on Filter evaluation 
http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#evaluation
to me it seems that eval is always "defined", but it could be an error for reasons of mistyping or
values being unbound.

Thanks for clarification on whether/what I might have overlooked here!

Axel

On 7 Mar 2010, at 17:42, Andy Seaborne wrote:

> ISSUE-53
> 
> I propose the following to define ExprMultiSet:
> 
> -------
> 
> Let Ω be a partition.
> 
> ExprMultiSet(Ω) =
>    { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
>    UNION
>    { e | μ in Ω such that  eval(μ(expr)) is undefined }
> 
> where "e" is some symbol that is distinct from all RDF terms.
> 
> card[x]:
>    if DISTINCT:
>       card[x] = 1 if there exists μ in Ω such that x =  eval(μ(expr))
>       card[x] = 0 otherwise
>    else
>       card[x] = count of μ in Ω such that x =  eval(μ(expr))
> 
> --------
> 
> "e" just records error evaluations.
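
A minimal Python sketch of the definition above, as I read it. It assumes a solution μ is a dict from variable names to values, that an unbound variable or a type error surfaces as a Python exception, and that the hypothetical sentinel E stands in for "e". Under DISTINCT it treats E like any other value, which the proposal does not spell out. The names expr_multiset, eval_expr and E are illustrative only and come from no spec text.

```python
from collections import Counter

# Hypothetical stand-in for "e": a symbol distinct from all RDF terms.
E = object()


def expr_multiset(partition, eval_expr, distinct=False):
    """Map each value in ExprMultiSet(partition) to its cardinality card[x]."""
    card = Counter()
    for mu in partition:
        try:
            x = eval_expr(mu)
        except (KeyError, TypeError, ValueError):
            # Assumption: unbound variables and evaluation errors raise;
            # both are recorded as E rather than dropped.
            x = E
        card[x] += 1
    if distinct:
        # card[x] = 1 for every x that occurs at least once.
        return Counter(dict.fromkeys(card, 1))
    return card


# One partition (group) with three solutions; ?v is unbound in the last.
partition = [{"x": ":x1", "v": 1}, {"x": ":x1", "v": 1}, {"x": ":x1"}]
ms = expr_multiset(partition, lambda mu: mu["v"])
ms_distinct = expr_multiset(partition, lambda mu: mu["v"], distinct=True)
```

Here ms[1] is 2 and ms[E] is 1, so an aggregate can still see that one evaluation failed.
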
> 
> This is the most flexible definition. An alternative is
> 
> ExprMultiSet(Ω) =
>    { eval(expr,μ) | μ in Ω such that eval(expr,μ) is defined }
> 
> which hard-codes dropping errors and unbounds during evaluation. But
> then the aggregate can't know there were any errors.
> 
> Another possibility is a yes/no flag indicating that an error was seen.
> But this might as well be the count of errors, which is equivalent to
> the flexible definition given.
> 
> By the way, this is in no way a recipe for implementation.  Aggregation
> can be done over all groups in parallel during query execution.
> 
> 
> 
> For the last publication, it was noted
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0646.html
> 
> Unbound and error are the same. The current design so far has it that
> any error means that the multiset is invalid and that group is not
> considered.
> 
> We didn't have time to propose a solid design to address ISSUE-53 - the
> potential design at the time of publication was that any error when
> calculating the ExprMultiSet from a partition meant that
> 
> SUM of {1, 2, unbound} is an error.
> COUNT of {1, 2, unbound} is an error.
> 
> I don't think that is a useful form for COUNT(?x).  It does seem to mean
> that COUNT(?x) is either COUNT(*) or error; it can't be anything else.
> 
> COUNT(?x) can not be zero because zero arises when there are no ?x but
> there are solutions in the partition.  If there are no solutions in the
> partition then there is no group key and no grouping happens.
> 
> For each aggregate we can decide what happens about unbounds and errors.
> 
> I would like to see:
> 
> COUNT(*) = size of multiset.
> COUNT(DISTINCT *) = size of set after removing any e (i.e. skip undefs).
> 
> COUNT(?x) = number of times ?x is defined in each group
>      0 <= COUNT(?x) <= COUNT(*)
> 
> COUNT(DISTINCT ?x) = number of times ?x is uniquely defined in each group
> 
> I'm less worried about SUM(?x) but I'd prefer that
> 
>    SUM(?x) = op:numeric-add of defined values of ?x, skips unbounds
> 
> rather than the rigid form we currently have.
> 
> Previously, one of the difficulties raised for this design was that the
> operation to add two numbers wasn't op:numeric-add because that could
> not cope with the errors (there were related datatyping issues as well).
> 
> With the definition of ExprMultiSet above, op:numeric-add can be used to
> define SUM.  There is a step between getting the ExprMultiSet and the
> calculation of aggregation.  This step, for SUM (and COUNT(?x)), removes
> any errors.
> 
> GROUP_CONCAT(?x) = concatenation
> and now GROUP_CONCAT of an empty set can be defined as "".
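
The "remove errors, then aggregate" step described above could be sketched as follows. This is an illustration under assumptions, not proposed spec text: the multiset is modelled as a plain Python list, E is a hypothetical marker for "e", operator.add stands in for op:numeric-add, 0 is assumed as the SUM of no values, and the separator argument of group_concat is my own addition.

```python
from functools import reduce
import operator

E = object()  # hypothetical marker for the error symbol "e"


def drop_errors(values):
    """The intermediate step: remove every error marker from the multiset."""
    return [v for v in values if v is not E]


def count_star(values):
    # COUNT(*): size of the multiset, errors included.
    return len(values)


def count_expr(values, distinct=False):
    # COUNT(?x) / COUNT(DISTINCT ?x): only defined values are counted.
    kept = drop_errors(values)
    return len(set(kept)) if distinct else len(kept)


def sum_expr(values):
    # SUM(?x): fold the defined values; operator.add stands in for op:numeric-add.
    return reduce(operator.add, drop_errors(values), 0)


def group_concat(values, separator=" "):
    # GROUP_CONCAT(?x): an empty input yields the empty string.
    return separator.join(str(v) for v in drop_errors(values))


values = [1, 2, 2, E]
```

With these values, count_expr(values) stays between 0 and count_star(values), and group_concat([E]) is the empty string.
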
> 
> -------------
> Some examples:
> 
> Does anyone want to suggest we design to get different results in any of
> these cases?
> 
> 
> --Data:
> 
> @prefix : <http://example/> .
> 
> :x1 a :T .
> :x1 :p 1 .
> :x1 :p 2 .
> 
> :x2 a :T .
> :x2 :p 9 .
> 
> :x3 a :T .
> :x3 :p 5 .
> :x3 :q "x" .
> 
> :x4 a :T .
> :x4 :q "z".
> 
> 
> --
> 
> 
> -- Query 1:
>    1 PREFIX  :     <http://example/>
>    2
>    3 SELECT  ?x (count(*) AS ?C)
>    4 WHERE
>    5   { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T
>    6     OPTIONAL
>    7       { ?x :p ?v}
>    8   }
>    9 GROUP BY ?x
>   10 ORDER BY str(?x)
> 
> -----------
> | x   | C |
> ===========
> | :x1 | 2 |
> | :x2 | 1 |
> | :x3 | 1 |
> | :x4 | 1 |
> -----------
> 
> -- Query 2:
> 
> Change line 3 to:
>      SELECT  ?x (count(?v) AS ?C)
> 
> -----------
> | x   | C |
> ===========
> | :x1 | 2 |
> | :x2 | 1 |
> | :x3 | 1 |
> | :x4 | 0 |
> -----------
> 
> -- Query 3:
> 
> Change line 3 to:
>      SELECT  ?x (sum(?v) AS ?C)
> 
> -----------
> | x   | C |
> ===========
> | :x1 | 3 |
> | :x2 | 9 |
> | :x3 | 5 |
> | :x4 | 0 |
> -----------
> 
> The :x4 row is zero because there were no valid numbers to add together.
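
As a cross-check of the three result tables, here is a small Python sketch that groups the solutions by hand. It assumes the WHERE pattern yields exactly the five solutions listed (one per :p triple, plus one for :x4 with ?v unbound), and uses None as a hypothetical marker for an unbound ?v.

```python
from collections import defaultdict

# Solutions (?x, ?v) of the WHERE pattern over the data; None = ?v unbound.
solutions = [(":x1", 1), (":x1", 2), (":x2", 9), (":x3", 5), (":x4", None)]

# GROUP BY ?x
groups = defaultdict(list)
for x, v in solutions:
    groups[x].append(v)


def bound(values):
    """Keep only the values where ?v is bound."""
    return [v for v in values if v is not None]


query1 = {x: len(vs) for x, vs in groups.items()}         # count(*)
query2 = {x: len(bound(vs)) for x, vs in groups.items()}  # count(?v)
query3 = {x: sum(bound(vs)) for x, vs in groups.items()}  # sum(?v)
```

The :x4 group has one solution, so count(*) is 1, while count(?v) and sum(?v) are both 0 because nothing is left after skipping the unbound ?v.
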
> 
> -- Different query OPTIONAL part - now has ?any
> 
>    1 PREFIX  :     <http://example/>
>    2
>    3 SELECT  ?x (sum(?v) AS ?C)
>    4 WHERE
>    5   { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> :T
>    6     OPTIONAL
>    7       { ?x ?any ?v}
>    8   }
>    9 GROUP BY ?x
>   10 ORDER BY str(?x)
> 
> -----------
> | x   | C |
> ===========
> | :x1 | 3 |
> | :x2 | 9 |
> | :x3 | 5 |
> | :x4 | 0 |
> -----------
> 
> The cases where ?v is "z" and "x" have been skipped.
> 
>         Andy
> 
> 
> 
> 
> 
>
Received on Friday, 26 March 2010 09:57:52 UTC