Re: Evaluation when there are errors in aggregates

On 2011-03-01, at 12:28, Andy Seaborne wrote:
> On 01/03/11 12:02, Steve Harris wrote:
>> On 2011-03-01, at 11:46, Andy Seaborne wrote:
>> 
>>> I tried to make ARQ exactly follow the process in the spec and found that the aggregate tests don't seem to have any error coverage.
>>> 
>>> - - - - - - - - - - - - - - - - - -
>>> 
>>> Steve,
>>> 
>>> I don't understand the new example in rq25:
>>> --------------
>>> PREFIX :<http://example.com/data/#>
>>> SELECT ?g (AVG(?p) AS ?avg) ((MIN(?p) + MAX(?p)) / 2 AS ?c)
>>> WHERE {
>>>  ?g :p ?p .
>>> }
>>> GROUP BY ?g
>>> --------------
>>> Result:
>>> ?g	?avg	?c
>>> <x>	2.5	2.5
>>> <z>	2.5	2.5
>>> --------------
>>> 
>>> Why not
>>> 
>>> Result:
>>> ?g	?avg	?c
>>> <x>	2.5	2.5
>>> <y>		2.5
>>> <z>	2.5	2.5
>> 
>> Are you suggesting that the result with 3 solutions is correct as per working group decisions, or as per the algebra as it stands?
> 
> Yes to both.
> 
> I believe the WG decision is that AVG is an error.  MIN and MAX are not. AVG is an error because 1+"2" is an error. The error in AVG is handled by the SELECT expression mechanism for errors.
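
Below is a minimal Python sketch of the mechanism described above. It assumes SPARQL numerics are modeled as Python ints/floats and string literals as str. `EvalError`, `avg` and `extend` are hypothetical names, not ARQ or spec APIs, and the group values are made up for illustration. An error while evaluating the SELECT expression leaves the variable unbound, but the row is kept:

```python
class EvalError(Exception):
    """Hypothetical stand-in for a SPARQL expression evaluation error."""

def avg(values):
    # '+' is only defined on numerics, so 1 + "2" is an error.
    if any(not isinstance(v, (int, float)) for v in values):
        raise EvalError("non-numeric operand in AVG")
    return sum(values) / len(values) if values else 0

def extend(solutions, var, expr):
    """Bind var to expr(mu); if expr raises, keep mu with var unbound."""
    out = []
    for mu in solutions:
        try:
            out.append({**mu, var: expr(mu)})
        except EvalError:
            out.append(dict(mu))  # row survives, var is simply not bound
    return out

# Hypothetical groups: <y> mixes numbers with the string "2".
groups = [
    {"g": "<x>", "vals": [2, 3]},
    {"g": "<y>", "vals": [1, "2", 4]},
    {"g": "<z>", "vals": [2, 3]},
]
rows = extend(groups, "avg", lambda mu: avg(mu["vals"]))
for r in rows:
    print(r["g"], r.get("avg", "(unbound)"))
```

This prints 2.5 for <x> and <z> and "(unbound)" for <y>: three rows, matching the three-solution result.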

Ah, I see. I think I misremembered how select expressions deal with errors then. I remembered that the whole solution was discarded.

In that case I prefer your proposed text.

> The algebra indirectly assumes the error handling, but my suggested clarification spells it out. If a set is defined as containing the value of expression X and X evaluates to an error, then nothing for X is in the set, implicitly.
> 
> (Unrelated: I think the fact that "(MIN+MAX)/2" is not an error is implementation dependent: the relationship of 1 and "2" is not prescribed, only that they are ordered in some way.)
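
To illustrate why the MIN/MAX result depends on the implementation, here is a sketch with a hypothetical group {1, "2", 4} and two hypothetical orderings. Neither ordering is taken from the spec or from ARQ: one puts numbers before strings, the other compares terms by lexical form. Only the second yields a numeric midpoint:

```python
# Hypothetical group values for ?p: two numbers and one string literal.
GROUP = [1, "2", 4]

def numbers_first(v):
    # Hypothetical ordering A: every numeric sorts before every string.
    return (0, v, "") if isinstance(v, (int, float)) else (1, 0, v)

def by_lexical_form(v):
    # Hypothetical ordering B: compare every term by its lexical form.
    return str(v)

def add(a, b):
    # SPARQL-style '+': only defined on two numerics, otherwise an error.
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b
    raise TypeError(f"type error: {a!r} + {b!r}")

def midpoint(values, key):
    """(MIN + MAX) / 2 under the given ordering, or "error"."""
    lo, hi = min(values, key=key), max(values, key=key)
    try:
        return add(lo, hi) / 2
    except TypeError:
        return "error"

print(midpoint(GROUP, numbers_first))    # MIN=1, MAX="2" -> error
print(midpoint(GROUP, by_lexical_form))  # MIN=1, MAX=4   -> 2.5
```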

Ah, yes, good point. I will change the example to make it something that is required to be an error.

>>> If AVG(?p) is an error, then the expression in the SELECT line is an error and so binding does not happen.
>>> 
>>> I've worked through the formal definitions and it seems to come down to:
>>> 
>>> eval(D(G), AggregateJoin(A, P)) = { (aggi, eval(D(G), Ai)) | Ai in A }
>>> 
>>> and eval(D(G), Ai) being an error.
>>> 
>>> I suggest adding:
>>> 
>>> eval(D(G), AggregateJoin(A, P)) =
>>>    { (aggi, eval(D(G), Ai)) | Ai in A , eval(D(G), Ai) not an error }
>>>    # If eval(D(G), Ai) is an error, it is ignored.
>>> 
>>> then the value for AVG is just not defined, and so (AVG(?p) AS ?avg) is handled by the usual mechanism.
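
Here is a sketch of the proposed AggregateJoin rule for a single group. It assumes Python values stand in for RDF terms. `aggregate_join`, `avg`, `count` and the agg1/agg2 names are hypothetical. An aggregate whose evaluation raises is left out of the result:

```python
class EvalError(Exception):
    """Hypothetical stand-in for an error while evaluating an aggregate Ai."""

def avg(values):
    if any(not isinstance(v, (int, float)) for v in values):
        raise EvalError("non-numeric value in AVG")
    return sum(values) / len(values) if values else 0

def count(values):
    return len(values)

def aggregate_join(aggregates, group):
    """{ (aggi, eval(Ai)) | Ai in A, eval(Ai) not an error } for one group."""
    result = {}
    for agg_name, fn in aggregates.items():
        try:
            result[agg_name] = fn(group)
        except EvalError:
            pass  # the error is ignored: agg_name gets no value at all
    return result

aggs = {"agg1": avg, "agg2": count}
print(aggregate_join(aggs, [2, 3]))       # {'agg1': 2.5, 'agg2': 2}
print(aggregate_join(aggs, [1, "2", 4]))  # {'agg2': 3}
```

Because agg1 is absent for the second group, a later binding of (AVG(?p) AS ?avg) sees an error and leaves ?avg unbound, as described above.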
>> 
>> I was hoping to make it bubble up so that it uses the same mechanism as Project Expressions; wouldn't that happen?
>> 
>> The next stage after AggregateJoin will be a Project, and doesn't Project discard solutions that contain errors?
> 
> The next step out is any "extend" that binds the expression to the named variable. That's where, to my reading, the error for AVG is discarded.
> 
> Project does not discard solutions. It can't change the number of rows at all. All project does is reduce the number of variables in the solution mapping.  It does not touch the value - and the value can't be "error" anyway.
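
Here is a sketch of Project as described here. It assumes solution mappings are Python dicts. `project` and the input rows are hypothetical. It only drops variables, so the row count is unchanged and <y> simply lacks ?avg:

```python
def project(solutions, variables):
    """Restrict each solution mapping to `variables`; never adds or drops rows."""
    keep = set(variables)
    return [{k: v for k, v in mu.items() if k in keep} for mu in solutions]

# Hypothetical output of the extend step: <y> has no ?avg binding.
extended = [
    {"g": "<x>", "agg1": 2.5, "avg": 2.5, "c": 2.5},
    {"g": "<y>", "c": 2.5},
    {"g": "<z>", "agg1": 2.5, "avg": 2.5, "c": 2.5},
]
projected = project(extended, ["g", "avg", "c"])
assert len(projected) == len(extended)  # same number of rows
for mu in projected:
    print(mu)
```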

I see.

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Tuesday, 1 March 2011 12:58:46 UTC