ISSUE-12 (HAVING vs FILTER) from Andy Seaborne on 2010-02-09 (public-rdf-dawg@w3.org from January to March 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Tue, 09 Feb 2010 13:08:30 +0000
To: Axel Polleres <axel.polleres@deri.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4B715E4E.6000506@talis.com>

Ok - still something to debate ...

> ISSUE-12: hmmm...
> IIRC, the last discussion we had about that ended in that
>
> {SELECT S
> WHERE W
> GROUP BY G
> HAVING F}
>
> is just the same as
>
> {SELECT S
> WHERE W
> GROUP BY G
> } FILTER F

That's slightly different. HAVING happens before the projection so can 
"see" more variables.

SELECT ?x
{ ?x :p ?o }
GROUP BY ?x
HAVING (count(?o) > 3)

does not expose the count.  This can make a difference:

  {
  SELECT S
  WHERE W
  GROUP BY G
  LIMIT 10
  } FILTER F

puts the LIMIT before the expression restriction.
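
A rough way to see the ordering difference is to simulate both pipelines over a few made-up solutions. This is only a sketch under assumptions: the data, the function names and the fixed group order are invented for illustration (real SPARQL gives no order without ORDER BY), and the sub-SELECT is assumed to project the count so that the outer FILTER can see it at all.

```python
from collections import defaultdict

# Hypothetical solutions for the pattern { ?x :p ?o } (made-up data).
rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4),
        ("b", 1),
        ("c", 1), ("c", 2), ("c", 3), ("c", 4), ("c", 5)]


def group_counts(rows):
    """GROUP BY ?x and compute count(?o) per group.

    Groups come back in first-seen order, which is an assumption of
    this sketch, not something SPARQL guarantees.
    """
    counts = defaultdict(int)
    for x, _o in rows:
        counts[x] += 1
    return list(counts.items())


def having_then_limit(rows, threshold, limit):
    # HAVING is applied to the grouped solutions, before projection and
    # LIMIT, so it can use the count without exposing it.
    kept = [(x, n) for x, n in group_counts(rows) if n > threshold]
    return [x for x, _n in kept][:limit]  # project ?x only, then LIMIT


def limit_then_filter(rows, threshold, limit):
    # The sub-SELECT applies LIMIT first; the outer FILTER only ever
    # sees the already truncated solutions.
    inner = group_counts(rows)[:limit]
    return [x for x, n in inner if n > threshold]


print(having_then_limit(rows, 3, 2))  # ['a', 'c']
print(limit_then_filter(rows, 3, 2))  # ['a']
```

With the same threshold and limit, the HAVING pipeline keeps both qualifying groups, while the LIMIT-then-FILTER pipeline loses "c" because it was cut off before the restriction was applied.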

You could do it with 2 sub-SELECTs, the middle one projecting from the 
working set of variables to the exposed set - not serious!


You can just use the word FILTER where there is HAVING.

SELECT S
WHERE W
GROUP BY G
FILTER F


If we disallow aggregates in FILTERs by some means (e.g. two expression 
hierarchies, or text prohibiting them), then HAVING vs FILTER matters. 
I think we should.

Saying "not in FILTER" is easier than "not in FILTER except where FILTER 
appears as a solution modifier".


Otherwise you can write:

SELECT ?x
{ ?x :p ?o
   FILTER(count(?o) > 3)
}

which has a sensible reading of an implicit group and we choose not to 
have implicit grouping via SELECT.  (It may have other readings as well.)

> That's why I am still hesitant to use HAVING instead of just FILTER as
> a keyword. At least, I still am not convinced that introducing another
> keyword is useful here. I think the dispute of the issue is still just
> the used keyword itself, isn't it? Do we have a resolution confirming
> to use HAVING? I remember this being the last state of discussion,
> where no real agreement was reached:
> http://www.w3.org/2009/sparql/meeting/2009-11-02#line0255
> Let me know if I miss some later findings on that from the list or from a later call.
>
> best,
> Axel

Does any implementation call it FILTER?



ARQ uses HAVING and bans aggregates in FILTER.  This is done during 
parsing but by context (so it's not in the grammar - I didn't write out 
two different sets of expression rules as there are quite a lot of them 
to get down to PrimitiveExpression).

void Project() : {  Var v ; Expr expr ; Node n ; }
{
   <SELECT>
   ...
   { allowAggregatesInExpressions = true ; }
   ... select expressions ...
   { allowAggregatesInExpressions = false ; }
}

void HavingClause() : { }
{
     { allowAggregatesInExpressions = true ; }
     <HAVING> (HavingCondition())+
     { allowAggregatesInExpressions = false ; }
}

so the last can be FILTER but it's easier to explain if it's a different 
word.
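
To illustrate the "by context" idea outside JavaCC, here is a minimal sketch of the same flag technique. It is not ARQ's code: the class, the method names, the regex-based detection of function calls and the list of aggregate names are all assumptions made up for the example.

```python
import re

# Assumed set of aggregate names, for this sketch only.
AGGREGATES = {"count", "sum", "min", "max", "avg"}


class AggregateNotAllowed(Exception):
    """Raised when an aggregate appears where the context forbids it."""


class ExprChecker:
    """One set of expression rules, plus a context flag (hypothetical)."""

    def __init__(self):
        self.allow_aggregates = False

    def check_expression(self, text):
        # Crude stand-in for the shared expression rules: look at every
        # name that is followed by '(' and treat it as a function call.
        for name in re.findall(r"([A-Za-z_]\w*)\s*\(", text):
            if name.lower() in AGGREGATES and not self.allow_aggregates:
                raise AggregateNotAllowed(name)
        return text

    def filter_clause(self, text):
        # Ordinary FILTER: the flag stays off, so aggregates are rejected.
        return self.check_expression(text)

    def having_clause(self, text):
        # HAVING: switch the flag on for this clause only, then restore it.
        self.allow_aggregates = True
        try:
            return self.check_expression(text)
        finally:
            self.allow_aggregates = False


checker = ExprChecker()
checker.having_clause("count(?o) > 3")      # accepted
try:
    checker.filter_clause("count(?o) > 3")  # rejected by context
except AggregateNotAllowed as err:
    print("aggregate not allowed in FILTER:", err)
```

The point of the design is that the expression rules exist once; only the surrounding clause decides whether an aggregate is legal, which is why the restriction does not show up in the grammar itself.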

Because SQL calls it HAVING, I think we do at least need a positive 
reason to do something different because it does introduce a small 
hurdle for application writers - so what's the value to them of the 
alternative?

I was pretty neutral on this, but

+ prohibition of aggregates in normal FILTERs
+ the SQL analogy

means that I'd like to see some value to application writers in calling 
it FILTER and at the moment I don't.

	Andy

Received on Tuesday, 9 February 2010 13:09:03 UTC