Re: Proposed definition of ExprMultiSet

Thanks Andy,

I anyway see things much clearer now after the f2f discussions!

best,
Axel

On 29 Mar 2010, at 23:07, Andy Seaborne wrote:

> 
> 
> On 26/03/2010 10:18 AM, Axel Polleres wrote:
> >
> > On 26 Mar 2010, at 09:57, Axel Polleres wrote:
> >
> >> short clarification request:
> >>
> >>> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
> >>
> >> by "is defined" you mean "is unequal to 'error'", yes?
> >
> > p.s.:
> >
> >   or do you mean such that μ(expr) is defined ?
> 
> μ is a substitution function (it's mapping from vars to terms).
> 
> μ(expr) is the rewrite of the expression with variables replaced by any
> defined values.
> 
> eval(μ(expr)) is the value of that (and errors for any unbound variables
> - they didn't get substitued and variables aren't RDF terms).
> 
>         Andy
> 
> 
> 
> >
> > I think I get the intention... to be able to treat unbound different from error, yes?
> >
> >
> > here my understanding of the proposal. Say we have:
> >
> >   ?X ?Y
> 
> Does not make sense later on:
> 
> Lets' try :
> 
>  >   ?Z ?X
> >   -----
> >   a  1
> >   b  0
> >   c
> >   d  "bla"
> >   e  1
> >
> > Here my understanding of the proposal:
> >
> >   COUNT( * ) ->  5
> 
> Agreed
> 
> >   COUNT( ?X ) ->  4
> 
> Agreed after ?X moved to right-hand column.
> 
> 
> >   COUNT( DISTINCT ?X ) ->  3
> 
> Agreed.
> 
> >
> > yes? if so, clear so far. now what about expressions?
> >
> >   COUNT( ?X * ?X || ?X * ?X )  ->   ?
> 
> """ SPARQL 1.0:
> Note that logical-or operates on the effective boolean value of its
> arguments.
> """
> 
> So 3 as ?X*?X is only defined for 1,0,1
> 
> 1||1 => true
> 0||0 => false
> 
> >   COUNT( DISTINCT (?X * ?X || ?X * ?X) )  ->   ?
> 
> 2 (true and false of ||)
> 
> >
> > concretely, what happens to the "bla" row that produces an error? what happens to the unbound row, that also producse an error when the expression is evaluated?
> 
> Same - eval(?x) is undefined as is eval("bla"*"bla")
> 
> >
> > Thanks,
> > Axel
> 
>         Andy
> 
> >
> >>
> >> What I mean to ask here... when I read the current section on Filter evaluation
> >> http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#evaluation
> >> to me it seems that eval is always "defined", but it could be an error for reasons of mistyping or
> >> values being unbound.
> >>
> >> Thanks for clarification on whether/what I might have overlooked here!
> >>
> >> Axel
> >>
> >> On 7 Mar 2010, at 17:42, Andy Seaborne wrote:
> >>
> >>> ISSUE-53
> >>>
> >>> I propose the following to define ExprMultiSet:
> >>>
> >>> -------
> >>>
> >>> Let Ω be a partition.
> >>>
> >>> ExprMultiSet(Ω) =
> >>>    { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
> >>>    UNION
> >>>    { e | μ in Ω such that  eval(μ(expr)) is undefined }
> >>>
> >>> where "e" is some symbol that is distinct from all RDF terms.
> >>>
> >>> card[x]:
> >>>    if DISTINCT:
> >>>       card[x] = 1 if there exists μ in Ω such that x =  eval(μ(expr))
> >>>       card[x] = 0 otherwise
> >>>    else
> >>>       card[x] = count of μ in Ω such that x =  eval(μ(expr))
> >>>
> >>> --------
> >>>
> >>> "e" just records error evaluations.
> >>>
> >>> This is the most flexible definition. An alternative is
> >>>
> >>> ExprMultiset(Ω) =
> >>>    { eval(expr,μ) | μ in Ω such that eval(expr,μ) is defined }
> >>>
> >>> which is hard-coding dropping errors and unbounds during evaluation. But
> >>> the aggregate can't know there were some errors.
> >>>
> >>> Another possibility is that a yes/no flag indicating a error was seen.
> >>> But this might as well be the count of errors, which is equivalent to
> >>> the flexible definition given.
> >>>
> >>> By the way, this is in no way a recipe for implementation.  Aggregation
> >>> can be done over all groups in parallel during query execution.
> >>>
> >>>
> >>>
> >>> For the last publication, it was noted
> >>>
> >>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0646.html
> >>>
> >>> Unbound and error are the same. The current design so far has it that
> >>> any error means that the multiset is invalid and that group is not
> >>> considered.
> >>>
> >>> We didn't have time to propose a solid design to address ISSUE-53 - the
> >>> potential design at the time of publication was that any error when
> >>> calculating the ExprMultiset from a partition meant that
> >>>
> >>> SUM of {1, 2, unbound} is an error.
> >>> COUNT of {1, 2, unbound} is an error.
> >>>
> >>> I don't think that is a useful form for COUNT(?x).  It does seem to mean
> >>> that COUNT(?x) is either COUNT(*) or error; it can't be anything else.
> >>>
> >>> COUNT(?x) can not be zero because zero arises when there are no ?x but
> >>> there are solutions in the partition.  If there are no solutions in the
> >>> partition then there is no group key and no grouping happens.
> >>>
> >>> For each aggregate we can decide what happens about unbounds and errors.
> >>>
> >>> I would like to see:
> >>>
> >>> COUNT(*) = size of multiset.
> >>> COUNT(DISTINCT *) = size of set after removing any e (i.e. skip undefs).
> >>>
> >>> COUNT(?x) = number of times ?x is defined in each group
> >>>      0<= COUNT(?x)<= COUNT(*)
> >>>
> >>> COUNT(DISTINCT ?x) = number of times ?x is uniquely defined in each group
> >>>
> >>> I'm less worried about SUM(?x) but I'd prefer that
> >>>
> >>>    SUM(?x) = op:numeric-add of defined values of ?x, skips unbounds
> >>>
> >>> rather that the rigid form we currently have.
> >>>
> >>> Previously, one of the difficulties raised for this design was that the
> >>> operation to add two numbers wasn't op:numeric-add because that could
> >>> not cope the errors (there were related datatyping issues as well).
> >>>
> >>> With the definition of ExprMultiSet above, op:numeric-add can be used to
> >>> define SUM.  There is step between getting the ExprMultiSet and the
> >>> calculation of aggregation.  This step, for SUM (and COUNT(?x)), removes
> >>> any errors.
> >>>
> >>> GROUP_CONCAT(?x) = concatenation
> >>> and now GROUP_CONCAT of an empty set can be defined as "".
> >>>
> >>> -------------
> >>> Some examples:
> >>>
> >>> Does anyone want to suggest we design to get different results in any of
> >>> these cases?
> >>>
> >>>
> >>> --Data:
> >>>
> >>> @prefix :<http://example/>  .
> >>>
> >>> :x1 a :T .
> >>> :x1 :p 1 .
> >>> :x1 :p 2 .
> >>>
> >>> :x2 a :T .
> >>> :x2 :p 9 .
> >>>
> >>> :x3 a :T .
> >>> :x3 :p 5 .
> >>> :x3 :q "x" .
> >>>
> >>> :x4 a :T .
> >>> :x4 :q "z".
> >>>
> >>>
> >>> --
> >>>
> >>>
> >>> -- Query 1:
> >>>    1 PREFIX  :<http://example/>
> >>>    2
> >>>    3 SELECT  ?x (count(*) AS ?C)
> >>>    4 WHERE
> >>>    5   { ?x<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  :T
> >>>    6     OPTIONAL
> >>>    7       { ?x :p ?v}
> >>>    8   }
> >>>    9 GROUP BY ?x
> >>>   10 ORDER BY str(?x)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 2 |
> >>> | :x2 | 1 |
> >>> | :x3 | 1 |
> >>> | :x4 | 1 |
> >>> -----------
> >>>
> >>> -- Query 2:
> >>>
> >>> Change line 3 to:
> >>>      SELECT  ?x (count(?v) AS ?C)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 2 |
> >>> | :x2 | 1 |
> >>> | :x3 | 1 |
> >>> | :x4 | 0 |
> >>> -----------
> >>>
> >>> -- Query 3:
> >>>
> >>> Change line 3 to:
> >>>      SELECT  ?x (sum(?v) AS ?C)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 3 |
> >>> | :x2 | 9 |
> >>> | :x3 | 5 |
> >>> | :x4 | 0 |
> >>> -----------
> >>>
> >>> The :x4 row is zero because there were no valid numbers to add together.
> >>>
> >>> -- Different query OPTIONAL part - now has ?p
> >>>
> >>>    1 PREFIX  :<http://example/>
> >>>    2
> >>>    3 SELECT  ?x (sum(?v) AS ?C)
> >>>    4 WHERE
> >>>    5   { ?x<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  :T
> >>>    6     OPTIONAL
> >>>    7       { ?x ?any ?v}
> >>>    8   }
> >>>    9 GROUP BY ?x
> >>>   10 ORDER BY str(?x)
> >>>
> >>> -----------
> >>> | x   | C |
> >>> ===========
> >>> | :x1 | 3 |
> >>> | :x2 | 9 |
> >>> | :x3 | 5 |
> >>> | :x4 | 0 |
> >>> -----------
> >>>
> >>> The case  where ?v is "Z2 and "x" have been skipped.
> >>>
> >>>         Andy
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> 

Received on Monday, 29 March 2010 22:34:36 UTC