RE: Group by from Kay, Michael on 2002-10-31 (public-qt-comments@w3.org from October 2002)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Thu, 31 Oct 2002 17:21:28 +0100
To: Eddie McGreal <emcgreal@BlackPearl.com>, "'public-qt-comments@w3.org'" <public-qt-comments@w3.org>
Message-ID: <DFF2AC9E3583D511A21F0008C7E621060453DD27@daemsg02.software-ag.de>

> Although the spec includes a "Group by" example, we have found
> that such an XQuery performs very poorly. The reason for this
> is that the number of comparisons explodes, because it grows
> quadratically.
> Since there is a sort by expression, why not a group by as
> well? It would also make mapping to SQL a little easier. At
> the moment we have implemented group by as a function, with
> enormous performance gains.
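To make the quadratic behaviour concrete, here is a minimal Python sketch of a naive grouping pattern: take the distinct keys, then rescan the whole input once per key. This pattern is assumed for illustration and is not quoted from the spec's example; the function name `naive_group` and the dict-based employee records are hypothetical.

```python
def naive_group(employees):
    """Group records by department by rescanning the input for every key.

    Returns (groups, comparisons), where comparisons counts key tests.
    Hypothetical data layout: each record is a dict with "name" and
    "department" entries.
    """
    # distinct-values() step, keeping first-appearance order
    keys = list(dict.fromkeys(e["department"] for e in employees))
    groups = []
    comparisons = 0
    for k in keys:
        members = []
        for e in employees:  # full rescan of the input for every key
            comparisons += 1
            if e["department"] == k:
                members.append(e)
        groups.append((k, members))
    return groups, comparisons


# n employees spread over n // 2 departments: (n // 2) * n comparisons,
# so doubling n roughly quadruples the work.
for n in (100, 200, 400):
    staff = [{"name": f"e{i}", "department": f"d{i % (n // 2)}"} for i in range(n)]
    _, c = naive_group(staff)
    print(n, c)
```

When the number of distinct keys grows with the input, the cost is proportional to n squared, which matches the "explosion" described above.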


Adding to Jonathan's reply, I think the main difficulty in defining a "group
by" construct is that the data model does not allow sets of sets (or
sequences of sequences). The natural result of a grouping function is to
create a set of groups, and this result is difficult to model currently.
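The obstacle is that XPath/XQuery sequences never nest: a sequence built from other sequences is spliced into one flat sequence (`(1, (2, 3))` is the same as `(1, 2, 3)`). The following Python sketch models that rule with a hypothetical `make_sequence` helper to show how the group boundaries disappear.

```python
def make_sequence(*items):
    """Hypothetical model of sequence construction in the XQuery data model:
    any item that is itself a sequence (here, a tuple) is spliced in, so
    the result is always flat."""
    out = []
    for item in items:
        if isinstance(item, tuple):
            out.extend(make_sequence(*item))
        else:
            out.append(item)
    return tuple(out)


sales = ("alice", "bob")
research = ("carol",)
groups = make_sequence(sales, research)
print(groups)  # ('alice', 'bob', 'carol'): the two groups are no longer distinguishable
```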

In XSLT 2.0 we got round this by not exposing the set of groups as an
object, but instead providing a construct that iterates over this set one
group at a time. Essentially (in XQuery-like syntax):

for group $d in //employee group-by department return
   {-- $d is now a sequence representing the contents of one group --}
   <dept name="{$d[1]/department}">
     {for $e in $d return
       <empl>
         {$e/name}
       </empl>
     }
   </dept>
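A rough Python analogue of this "one group at a time" construct is sketched below. The caller only ever sees one group per iteration, never a set-of-groups object. The helper `for_each_group`, the sample XML and the wrapping `<report>` element are hypothetical; the pseudocode above returns a plain sequence of `dept` elements.

```python
import xml.etree.ElementTree as ET


def for_each_group(items, key):
    """Yield one group (a list of items) at a time, in first-appearance
    order of the grouping key. Internally a dict serves as a one-pass
    index, but the caller never receives the collection of groups."""
    index = {}
    for item in items:
        index.setdefault(key(item), []).append(item)
    yield from index.values()


SOURCE = """<staff>
  <employee><name>Alice</name><department>Sales</department></employee>
  <employee><name>Carol</name><department>Research</department></employee>
  <employee><name>Bob</name><department>Sales</department></employee>
</staff>"""


def group_report(xml_text):
    root = ET.fromstring(xml_text)
    out = ET.Element("report")  # hypothetical wrapper element
    for d in for_each_group(root.iter("employee"),
                            key=lambda e: e.findtext("department")):
        # d plays the role of $d; d[0] corresponds to $d[1]
        dept = ET.SubElement(out, "dept", name=d[0].findtext("department"))
        for e in d:
            empl = ET.SubElement(dept, "empl")
            # copy the name rather than re-parenting the source element
            ET.SubElement(empl, "name").text = e.findtext("name")
    return out


print(ET.tostring(group_report(SOURCE), encoding="unicode"))
```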

Of course many people asking for XQuery group-by are looking for something
like SQL group-by. This doesn't really carry over very well to XML, because
the SQL construct is constrained to return tuples, which don't make much
sense in XML.
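For comparison, this small sketch uses Python's built-in `sqlite3` module to show the SQL shape: each group collapses to a single flat row, and the members themselves are only reachable through aggregate functions such as `COUNT`. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Alice", "Sales"), ("Carol", "Research"), ("Bob", "Sales")],
)
# One tuple per group: the grouping key plus aggregates, no nested members.
rows = conn.execute(
    "SELECT department, COUNT(*) FROM employee "
    "GROUP BY department ORDER BY department"
).fetchall()
print(rows)  # [('Research', 1), ('Sales', 2)]
conn.close()
```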

Given the grouping problems people have had with XSLT 1.0, I think that
the distinct-values() function will probably meet 90% of the requirement,
and with a little cleverness I think it can be implemented with
O(n log n) performance (you need to create a dynamic index over the
grouping population and then make sure it is used when needed).
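One possible reading of the "dynamic index" idea, sketched in Python under stated assumptions: sort the grouping population by key once (O(n log n)), then locate each distinct value with binary search instead of rescanning the input. The message does not say what kind of index is meant; the sorted array with `bisect` is an illustrative choice, and a hash index would give expected linear time instead. The function name `indexed_group` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def indexed_group(items, key):
    """Group items via distinct-values plus a sorted index (hypothetical helper).

    Sorting the (key, position, item) entries costs O(n log n); each
    distinct key is then found with two O(log n) binary searches rather
    than a full O(n) rescan of the input.
    """
    # Build the index once. The unique position breaks ties, so items
    # themselves are never compared and document order is preserved.
    entries = sorted((key(item), pos, item) for pos, item in enumerate(items))
    keys = [k for k, _, _ in entries]
    groups = []
    # distinct-values(), in first-appearance order
    for k in dict.fromkeys(key(item) for item in items):
        lo = bisect_left(keys, k)
        hi = bisect_right(keys, k)
        groups.append((k, [item for _, _, item in entries[lo:hi]]))
    return groups


staff = [
    {"name": "Alice", "department": "Sales"},
    {"name": "Carol", "department": "Research"},
    {"name": "Bob", "department": "Sales"},
]
groups = indexed_group(staff, key=lambda e: e["department"])
print([(k, [e["name"] for e in members]) for k, members in groups])
# [('Sales', ['Alice', 'Bob']), ('Research', ['Carol'])]
```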

So I do think this can be left to XQuery 2.0.

Michael Kay

Received on Thursday, 31 October 2002 11:21:40 UTC