Re: Feedback on SPARQL 1.1 support for aggregates (was Re: W3C Seeks Feedback on Early Draft of SPARQL 1.1) from Emanuele Della Valle on 2010-02-17 (public-rdf-dawg-comments@w3.org from February 2010)

From: Emanuele Della Valle <emanuele.dellavalle@polimi.it>
Date: Wed, 17 Feb 2010 16:41:42 +0100
To: Axel Polleres <axel.polleres@deri.org>
CC: public-rdf-dawg-comments@w3.org, "dbarbieri@elet.polimi.it" <dbarbieri@elet.polimi.it>, Stefano Ceri <ceri@elet.polimi.it>, Michael Grossniklaus <grossniklaus@elet.polimi.it>, Daniele Maria Braga <braga@elet.polimi.it>, Frank van Harmelen <Frank.van.Harmelen@cs.vu.nl>
Message-ID: <4B7C0E36.5050909@polimi.it>

Dear Axel,

Please find our clarifications below.

Axel Polleres wrote:
> 1)
>
> If I get it right, AGGREGATE {var  FUNCTION vars ) 
>
> i) projects first groups wrt variables appearing in vars and then  
> ii)  evaluates the aggregate on those groups ...
>
> That may make sense for count, but how does that work for 
> min/max, i.e. where is the projection ? 
>
>    hmmm... actually it seems the grammar given for FUNCTION is wrong... 
>    it should be 
>     
>     Function | Function '(' var ')'
>   or 
>     Function | Function '(' vars ')'
>   

You're right. The grammar contained a small error. The following grammar 
is the correct one.

 AggregateClause --> ( "AGGREGATE {(" var "," Function "," Group ")" [Filter] "}" )*
 Function --> "COUNT" | "SUM (" var ")" | "AVG (" var ")" | "MIN (" var ")" | "MAX (" var ")"
 Group --> var | "{" var ( "," var )* "}"
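
To make the corrected grammar concrete, here is a minimal sketch in Python of a recognizer for AGGREGATE clauses. It is only an illustration, not part of C-SPARQL: the names VAR, FUNCTION, GROUP, FILTER, AGGREGATE and parse_aggregates are hypothetical, variables are assumed to look like ?name, whitespace is allowed between tokens, and the optional Filter is assumed to contain no nested parentheses.

```python
import re

# Assumption: variables look like ?name (a simplification of SPARQL variables).
VAR = r"\?[A-Za-z_][A-Za-z0-9_]*"
# Function --> "COUNT" | "SUM (" var ")" | "AVG (" var ")" | "MIN (" var ")" | "MAX (" var ")"
FUNCTION = rf"(?:COUNT|(?:SUM|AVG|MIN|MAX)\s*\(\s*{VAR}\s*\))"
# Group --> var | "{" var ( "," var )* "}"
GROUP = rf"(?:{VAR}|\{{\s*{VAR}(?:\s*,\s*{VAR})*\s*\}})"
# Assumption: the optional Filter has no nested parentheses.
FILTER = r"FILTER\s*\([^()]*\)"

# One AGGREGATE { ( var , Function , Group ) [Filter] } clause.
AGGREGATE = re.compile(
    rf"AGGREGATE\s*\{{\s*\(\s*(?P<var>{VAR})\s*,\s*(?P<function>{FUNCTION})"
    rf"\s*,\s*(?P<group>{GROUP})\s*\)\s*(?P<filter>{FILTER})?\s*\}}"
)


def parse_aggregates(text):
    """Return one dict per AGGREGATE clause found in text (zero or more)."""
    return [m.groupdict() for m in AGGREGATE.finditer(text)]


if __name__ == "__main__":
    query_tail = """
    AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks > 5) }
    AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
    FILTER (?affiliationBooks > 50)}
    """
    for clause in parse_aggregates(query_tail):
        print(clause)
```

The sketch only recognizes the clause shape given by the three productions above; it does not evaluate anything.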

We also updated http://wiki.larkc.eu/c-sparql/sparql11-feedback accordingly.

> 2) On the following examples...
>
>
> SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
> WHERE {
>     ?auth :name ?name .
>     ?auth :surname ?surname .
>     ?auth :wrote ?book .
>     ?auth :affiliated ?organization .
> }
> AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks > 5) }
> AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
> FILTER (?affiliationBooks > 50)}
>
> here, it seems that ?affiliationBooks violates the constraint you pose before:
>
> "In the C-SPARQL language all the variables used in AGGREGATE clauses
> must appear also in the SELECT clause, since aggregation happens after
> standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
> specified."
>
> I assume you mean the constraint only to apply to  the variables mentioned in the function and
> group part?
>   

Yes, you're right. The correct formulation of the constraint is:

In the C-SPARQL language all the variables used in the aggregation 
function or in the grouping set of AGGREGATE clauses must also appear in 
the SELECT clause, since aggregation happens after standard SPARQL query 
evaluation.
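
As a rough illustration of this formulation, the following Python sketch checks one AGGREGATE clause against a SELECT clause. The function name missing_from_select and the sample inputs are hypothetical, and variables are assumed to look like ?name. Note that the new variable introduced by the clause (its first component) is deliberately not checked, since the constraint only covers the aggregation function and the grouping set.

```python
import re

# Assumption: variables look like ?name.
VAR = re.compile(r"\?[A-Za-z_][A-Za-z0-9_]*")


def missing_from_select(select_vars, function, group):
    """Return the variables used in the aggregation function or grouping set
    that do not appear in the SELECT clause (empty list: constraint holds)."""
    used = VAR.findall(function) + VAR.findall(group)
    return sorted(set(used) - set(select_vars))


if __name__ == "__main__":
    # Hypothetical clause: AGGREGATE { (?total, SUM(?price), {?shop}) }
    print(missing_from_select(["?shop", "?price", "?total"], "SUM(?price)", "{?shop}"))
    print(missing_from_select(["?total"], "SUM(?price)", "{?shop}"))
```

With these hypothetical inputs the first call reports no missing variables, while the second reports ?price and ?shop.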

We updated http://wiki.larkc.eu/c-sparql/sparql11-feedback accordingly.

> Also, I'd like to see the result table for this.
>   

We did not include enough data in our example to show the result. I can 
generate an example, but I would prefer that you first elaborate on the 
request for clarification.

> 3) also, it seems to me that your SPARQL1.1 formulation of this one
>
> SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
> WHERE {
>     ?auth :name ?name .
>     ?auth :wrote ?book .
>     ?book :topic ?topic .
>     ?auth :hasNationality ?nat .
> }
> AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT,
> {?topic} ) }
> AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic}
> ) FILTER(?numberOfItalianAuthors>?numberOfSwissAuthors)}
>
> has some errors... e.g. why is the triple outside the subselects 
>
>  ?auth :hasName ?name .
>
>  and you project auth? 
> Wouldn't you want to project the ?topic instead?
>   

Well, we based the SPARQL 1.1 version on our understanding of SPARQL 1.1.

In our understanding, the triple pattern < ?auth :hasName ?name . > is 
needed to join the results of the two sub-queries. If we did not include 
it, the result would be the Cartesian product of the two sub-queries (see 
the APPENDIX for more details).

Best regards,

Emanuele

APPENDIX

We formulated answer 3 based on the Virtuoso implementation of sub-queries. 
Please find below the queries we used to understand SPARQL 1.1 sub-query 
behaviour.

SELECT ?name  WHERE {
  {
    SELECT DISTINCT ?name WHERE {
      <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name
    }
  }
}

This query gives 6 results.

SELECT ?name ?name1 WHERE {
  {
    SELECT DISTINCT ?name WHERE {
      <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name
    }
  }
  {
    SELECT DISTINCT ?name1 WHERE {
      <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name1
    }
  }
}

This gives 36 results (the Cartesian product).

SELECT ?name  ?name1 WHERE {
  {
    SELECT DISTINCT ?name WHERE {
      <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name
    }
  }
  UNION
  {
    SELECT DISTINCT ?name1 WHERE {
      <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name1
    }
  }
}

This gives 12 results (6 + 6).
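
The three counts can be reproduced with a small Python sketch of how solution sets combine. This is a simplified model, not Virtuoso's implementation: the functions join and union are hypothetical helpers, and the six names are placeholders rather than the actual foaf:name values. Two group patterns that share no variable join into a Cartesian product, UNION concatenates the solutions, and a shared variable restricts the join to compatible solutions.

```python
def join(left, right):
    """Join two solution sets: keep each pair of solutions that agree on
    every variable they share (no shared variable: Cartesian product)."""
    result = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                result.append({**l, **r})
    return result


def union(left, right):
    """UNION keeps every solution of both sides."""
    return left + right


# Placeholder data: six distinct names, as in the first query.
names = [{"?name": f"name{i}"} for i in range(6)]
names1 = [{"?name1": s["?name"]} for s in names]

print(len(names))                # 6
print(len(join(names, names1)))  # 36, no shared variable
print(len(union(names, names1))) # 12, 6 + 6
print(len(join(names, names)))   # 6, shared variable ?name
```

The last line shows why a shared variable matters: when both sides bind the same variable, only matching solutions survive the join.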

Received on Wednesday, 17 February 2010 15:42:20 UTC