Aggregation over unbound variables

We are developing a SPARQL 1.1 implementation and are hoping for some guidance on how the SPARQL 1.1 specification deals with aggregation over unbound variables.

I believe the COUNT functions behave the same in SQL and SPARQL, but the other aggregates (sum/min/max/avg) seem to have different semantics with respect to nulls (at least according to Jena).

In SQL, the sum/min/max/avg of a nullable column is computed over the non-null values only.  For example, suppose you have the following data in table "foo":

  name    | age 
----------+------
  "Bob"   | 5   
  "Bob"   |     
  "Alice" | 3   
  "Alice" | 4   

Then the query "select name, sum(age) from foo group by name" gives this result:

  name    | sum 
----------+-------
  "Bob"   | 5   
  "Alice" | 7   
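
The SQL behaviour can be checked with a small script. This is only a sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for "SQL" in general; the table and column names follow the example above, and the extra aggregate columns are added for comparison.

```python
import sqlite3

# In-memory database holding the "foo" table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO foo (name, age) VALUES (?, ?)",
    [("Bob", 5), ("Bob", None), ("Alice", 3), ("Alice", 4)],  # None is stored as NULL
)

# SUM/AVG/MIN/MAX and COUNT(age) skip NULLs; COUNT(*) counts every row.
results = conn.execute(
    """
    SELECT name, SUM(age), COUNT(age), COUNT(*), AVG(age), MIN(age), MAX(age)
    FROM foo
    GROUP BY name
    ORDER BY name
    """
).fetchall()

for row in results:
    print(row)
# ('Alice', 7, 2, 2, 3.5, 3, 4)
# ('Bob', 5, 1, 2, 5.0, 5, 5)
conn.close()
```

For "Bob" every aggregate except COUNT(*) is computed from the single non-null age.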


In contrast, Jena returns an unbound value (the analogue of NULL) for the sum/avg/min/max of a group if any value in that group is unbound:

-----------------------------------------------------
| name    | total | cnt | cntstar | avg | min | max |
=====================================================
| "Bob"   |       | 1   | 2       |     |     |     |
| "Alice" | 7     | 2   | 2       | 3.5 | 3   | 4   |
-----------------------------------------------------


Data:

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
_:a  foaf:name       "Alice" .
_:a  foaf:age        4 .
_:b  foaf:name       "Alice" .
_:b  foaf:age        3 .
_:c  foaf:name       "Bob" .
_:c  foaf:age        5 .
_:d  foaf:name       "Bob" .

Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
       (sum(?age) as ?total)
       (count(?age) as ?cnt)
       (count(*) as ?cntstar)
       (avg(?age) as ?avg)
       (min(?age) as ?min)
       (max(?age) as ?max)
WHERE  {
  ?x foaf:name  ?name .
  OPTIONAL { ?x  foaf:age  ?age }
}
group by ?name
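
To make the difference concrete, here is a minimal sketch of the two behaviours for SUM over the data above. It models only what is described in this message and is not a statement of what the specification requires; the function names are hypothetical, and None stands in for an unbound ?age.

```python
# One (name, age) pair per query solution; None means ?age is unbound.
rows = [("Alice", 4), ("Alice", 3), ("Bob", 5), ("Bob", None)]


def group_ages(rows):
    """Group the age values by name, keeping unbound entries."""
    groups = {}
    for name, age in rows:
        groups.setdefault(name, []).append(age)
    return groups


def sum_skip_unbound(values):
    """SQL-style: ignore unbound values; unbound only if nothing is bound."""
    bound = [v for v in values if v is not None]
    return sum(bound) if bound else None


def sum_propagate_unbound(values):
    """Behaviour observed in Jena: any unbound value makes the sum unbound."""
    if any(v is None for v in values):
        return None
    return sum(values)


groups = group_ages(rows)
sql_style = {name: sum_skip_unbound(vals) for name, vals in groups.items()}
jena_style = {name: sum_propagate_unbound(vals) for name, vals in groups.items()}

print(sql_style)   # {'Alice': 7, 'Bob': 5}
print(jena_style)  # {'Alice': 7, 'Bob': None}
```

The two functions agree for "Alice" and differ only for "Bob", the group that contains an unbound age.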

We have found that at least two SPARQL implementations use the SQL semantics for this, so it would benefit the SPARQL community to have a consistent way to handle aggregation over unbound variables.

Best regards

Arthur Keen

Received on Friday, 14 February 2014 08:07:38 UTC