Re: Feedback on SPARQL 1.1 support for aggregates (was Re: W3C Seeks Feedback on Early Draft of SPARQL 1.1) from Axel Polleres on 2010-02-17 (public-rdf-dawg-comments@w3.org from February 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 17 Feb 2010 18:33:05 +0000
To: Emanuele Della Valle <EMANUELE.DELLAVALLE@POLIMI.IT>
Cc: <public-rdf-dawg-comments@w3.org>, <dbarbieri@elet.polimi.it>, "Stefano Ceri" <ceri@elet.polimi.it>, "Michael Grossniklaus" <grossniklaus@elet.polimi.it>, "Daniele Maria Braga" <braga@elet.polimi.it>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>
Message-Id: <817E053F-1116-40E7-BFDF-05FDA9FD3F8D@deri.org>
further comments...

> Yes you're right. The correct formulation of the constraint is:
> 
> In the C-SPARQL language all the variables used in the aggregation
> function or in the grouping set of AGGREGATE clauses must appear also in
> the SELECT clause, since aggregation happens after standard SPARQL query
> evaluation.

If I am not mistaken, that doesn't work either... here is a counterexample from your quoted example:

> AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )


?organization is not in the SELECT clause.
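
To make the counterexample concrete, here is a minimal sketch in Python of how the reformulated constraint could be checked against the quoted query. The function name missing_from_select and the tuple encoding of an AGGREGATE clause are assumptions made for illustration only; they are not part of C-SPARQL or of any implementation. Under this literal reading of the constraint, ?auth from the first AGGREGATE clause is flagged as well.

```python
# Hypothetical helper: checks the quoted C-SPARQL constraint for one query.
def missing_from_select(select_vars, aggregates):
    """Return variables used in an aggregation function or grouping set
    that do not appear in the SELECT clause, in order of first use."""
    missing = []
    for _new_var, function_args, group_vars in aggregates:
        for var in list(function_args) + list(group_vars):
            if var not in select_vars and var not in missing:
                missing.append(var)
    return missing


# SELECT clause of the quoted query.
select_vars = {"?name", "?surname", "?book", "?numberOfBooks", "?averageNumberOfBooks"}

# Each AGGREGATE clause encoded as (new variable, function arguments, grouping set).
aggregates = [
    ("?numberOfBooks", [], ["?auth"]),                             # COUNT grouped by ?auth
    ("?affiliationBooks", ["?numberOfBooks"], ["?organization"]),  # SUM(?numberOfBooks)
]

violations = missing_from_select(select_vars, aggregates)
print(violations)  # ['?auth', '?organization']
```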

[...]
> 
> Well, we based the SPARQL 1.1 version on our understanding of SPARQL 1.1.
> 
> In our understanding, the triple pattern < ?auth :hasName ?name . > is
> needed to join the results of the two sub-queries. If we do not include
> it the result would be a Cartesian product of the two sub-queries (see
> APPENDIX for more details).

your query should answer... 
"For instance, one can ask for the research topics for which
the Italian authors are more than the Swiss ones."

but you project the *author* which is neither aggregated nor grouped... 
that doesn't make sense to me... shouldn't it be simply

SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
WHERE {
    {
        SELECT ?topic (COUNT(?book) AS ?numberOfSwissAuthors)
        WHERE {
            ?auth :wrote ?book .
            ?book :topic ?topic .
            ?auth :hasNationality ?nat .
            FILTER(?nat = 'CH') .
        }
        GROUP BY ?topic
    }
    {
        SELECT ?topic (COUNT(?book) AS ?numberOfItalianAuthors)
        WHERE {
            ?auth :wrote ?book .
            ?book :topic ?topic .
            ?auth :hasNationality ?nat .
            FILTER(?nat = 'IT') .
        }
        GROUP BY ?topic
    }
    FILTER(?numberOfItalianAuthors > ?numberOfSwissAuthors)
}

(didn't try that out just now, just a quick shot, but looks more sensible to me...)
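
A rough Python sketch of what the query above is meant to compute, assuming the usual reading that the two sub-selects are joined on their shared variable ?topic before the outer FILTER is applied. The rows are made-up toy data and count_by_topic is a hypothetical helper; nothing here was run against a SPARQL engine.

```python
from collections import Counter

# Made-up toy data: (author, nationality, book, topic).
rows = [
    ("a1", "IT", "b1", "semweb"),
    ("a2", "IT", "b2", "semweb"),
    ("a3", "CH", "b3", "semweb"),
    ("a4", "CH", "b4", "databases"),
    ("a5", "CH", "b5", "databases"),
    ("a6", "IT", "b6", "databases"),
]


def count_by_topic(rows, nationality):
    """Mimic one sub-select: keep one nationality, GROUP BY topic, COUNT rows."""
    return Counter(topic for _auth, nat, _book, topic in rows if nat == nationality)


swiss = count_by_topic(rows, "CH")
italian = count_by_topic(rows, "IT")

# Join the two grouped results on the shared variable (the topic),
# then apply the outer FILTER: Italian count > Swiss count.
result = [
    (topic, swiss[topic], italian[topic])
    for topic in sorted(swiss.keys() & italian.keys())
    if italian[topic] > swiss[topic]
]
print(result)  # [('semweb', 1, 2)]
```

Because both sub-selects project ?topic, the join is not a Cartesian product. Note that a topic with no Swiss (or no Italian) rows at all drops out of this inner join.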

Axel

On 17 Feb 2010, at 15:41, Emanuele Della Valle wrote:

> Dear Axel,
> 
> please find below our clarifications
> 
> Axel Polleres wrote:
> > 1)
> >
> > If I get it right, AGGREGATE { (var, FUNCTION, vars) }
> >
> > i) projects first groups wrt variables appearing in vars and then 
> > ii) evaluates the aggregate on those groups ...
> >
> > That may make sense for count, but how does that work for
> > min/max, i.e. where is the projection ?
> >
> >    hmmm... actually it seems the grammar given for FUNCTION is wrong...
> >    it should be
> >    
> >     Function | Function '(' var ')'
> >   or
> >     Function | Function '(' vars ')'
> >  
> 
> You're right. The grammar contained a small error. The following grammar
> is the correct one.
> 
>  AggregateClause --> ( "AGGREGATE {(" var "," Function "," Group ")" [Filter] "}" )*
>  Function --> "COUNT" | "SUM (" var ")" | "AVG (" var ")" | "MIN (" var ")" | "MAX (" var ")"
>  Group --> var | "{" var ( "," var )* "}"
> 
> We also updated http://wiki.larkc.eu/c-sparql/sparql11-feedback accordingly.
> 
> > 2) On the following examples...
> >
> >
> > SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
> > WHERE {
> >     ?auth :name ?name .
> >     ?auth :surname ?surname .
> >     ?auth :wrote ?book .
> >     ?auth :affiliated ?organization .
> > }
> > AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks > 5) }
> > AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
> > FILTER (?affiliationBooks > 50)}
> >
> > here, it seems that ?affiliationBooks violates the constraint you pose before:
> >
> > "In the C-SPARQL language all the variables used in AGGREGATE clauses
> > must appear also in the SELECT clause, since aggregation happens after
> > standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
> > specified."
> >
> > I assume you mean the constraint only to apply to  the variables mentioned in the function and
> > group part?
> >  
> 
> Yes you're right. The correct formulation of the constraint is:
> 
> In the C-SPARQL language all the variables used in the aggregation
> function or in the grouping set of AGGREGATE clauses must appear also in
> the SELECT clause, since aggregation happens after standard SPARQL query
> evaluation.
> 
> We updated http://wiki.larkc.eu/c-sparql/sparql11-feedback accordingly.
> 
> > Also, I'd like to see the result table for this.
> >  
> 
> We did not include enough data in our example to show the result. I can
> generate an example, but I would prefer that you first elaborate on the
> request for clarification.
> 
> > 3) also, it seems to me that your SPARQL1.1 formulation of this one
> >
> > SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
> > WHERE {
> >     ?auth :name ?name .
> >     ?auth :wrote ?book .
> >     ?book :topic ?topic .
> >     ?auth :hasNationality ?nat .
> > }
> > AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT,
> > {?topic} ) }
> > AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic}
> > ) FILTER(?numberOfItalianAuthors>?numberOfSwissAuthors)}
> >
> > has some errors... e.g. why is the triple outside the subselects
> >
> >  ?auth :hasName ?name .
> >
> > and you project auth?
> > Wouldn't you want to project the ?topic instead?
> >  
> 
> Well, we based the SPARQL 1.1 version on our understanding of SPARQL 1.1.
> 
> In our understanding, the triple pattern < ?auth :hasName ?name . > is
> needed to join the results of the two sub-queries. If we do not include
> it the result would be a Cartesian product of the two sub-queries (see
> APPENDIX for more details).
> 
> Best regards,
> 
> Emanuele
> 
> APPENDIX
> 
> We formulated answer 3 based on the Virtuoso implementation of sub-queries.
> Please find below the queries we used to understand SPARQL 1.1 sub-query
> behaviour.
> 
> SELECT ?name  WHERE {
>   {
>     SELECT DISTINCT ?name WHERE {
>       <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name
>     }
>   }
> }
> 
> This query gives 6 results.
> 
> SELECT ?name ?name1 WHERE {
>   {
>     SELECT DISTINCT ?name WHERE {
>       <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name
>     }
>   }
>   {
>     SELECT DISTINCT ?name1 WHERE {
>       <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name1
>     }
>   }
> }
> 
> This gives 36 results (the Cartesian product)
> 
> SELECT ?name  ?name1 WHERE {
>   {
>     SELECT DISTINCT ?name WHERE {
>       <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name
>     }
>   }
> UNION
>   {
>     SELECT DISTINCT ?name1 WHERE {
>       <http://www.w3.org/People/Berners-Lee/card#i> foaf:name ?name1
>     }
>   }
> }
> 
> This gives 12 results (6 + 6).
> 
> 
>
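
The result counts reported in the quoted appendix (6, 36 and 12) are what one would expect from joining solution sets that share no variable. A small Python sketch of that reasoning follows; the six name values are placeholders, since the actual foaf:name values are not given in the thread, and join is a hypothetical helper, not an API of any SPARQL engine.

```python
from itertools import product

# Placeholders for the six DISTINCT foaf:name values (actual values unknown).
names = [f"name{i}" for i in range(6)]

left = [{"?name": n} for n in names]    # first sub-select binds ?name
right = [{"?name1": n} for n in names]  # second sub-select binds ?name1


def join(a, b):
    """Join two solution sequences: keep pairs that agree on all shared variables."""
    return [
        {**x, **y}
        for x, y in product(a, b)
        if all(x[v] == y[v] for v in x.keys() & y.keys())
    ]


cartesian = join(left, right)                         # no shared variable
union = left + right                                  # UNION concatenates solutions
same_var = join(left, [{"?name": n} for n in names])  # both sides bind ?name

print(len(cartesian), len(union), len(same_var))  # 36 12 6
```

The product arises only because ?name and ?name1 are different variables. When both sub-selects project the same variable, as with ?topic in the query near the top of this message, the join keeps only the matching pairs.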
Received on Wednesday, 17 February 2010 18:33:42 UTC