W3C home > Mailing lists > Public > www-ql@w3.org > April to June 2004

RE: Joins with groupby (No example in W3 Draft)

From: Michael Kay <mhk@mhk.me.uk>
Date: Wed, 23 Jun 2004 04:19:58 +0100
To: "'Houman Khorasani'" <khorasani@web.de>, "'Priscilla Walmsley'" <priscilla@walmsley.com>, <www-ql@w3.org>
Message-Id: <20040623033906.A209EA0711@frink.w3.org>

Obviously performance depends on the optimizations done by your particular
query engine, but here are some suggestions to place less reliance on the
optimizer:

(a) use specific paths rather than $x//xyz. The "//" operator is always
expensive as it involves searching a whole subtree.

(b) the expression $temp[ip=$distinctIp] appears three times: assign the
result to a variable.

(c) the expression "for $i in $temp[ip=$distinctIp] return $i//rID" can
almost certainly be rewritten as "$temp[ip=$distinctIp]//rID"

Michael Kay

 

> -----Original Message-----
> From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On 
> Behalf Of Houman Khorasani
> Sent: 22 June 2004 20:40
> To: 'Priscilla Walmsley'; www-ql@w3.org
> Subject: RE: Joins with groupby (No example in W3 Draft)
> Importance: High
> 
> 
> Hi Priscilla,
> 
> Thank you very much, for your help.  Actually it didn't work 
> but I didn't
> give up and after keep trying over many days I could solve it 
> at the end. :)
> Maybe there is still a way to improve the performance of this 
> query. If you
> see something so please tell me to change and try that.
> 
> There is just one solution for 'Group By' what does exactly 
> the same what
> SQL 'Group By' does:
> 
> ----------------------------------------
> 
> let $temp :=	for $wbtt in 
> doc("wisc_berkeley_tracetime.xml")//row[dat =
> '20000801000000']
> 		for $wbtl in 
> doc("wisc_berkeley_tracelist.xml")//row[rID =
> $wbtt/rID]
> 		return <g>{($wbtl/rID, $wbtl/ip, $wbtt/min, 
> $wbtt/dat)}</g>
> 
> for $distinctIp in distinct-values($temp//ip)
> 
> let $rID :=	for $i in $temp[ip=$distinctIp] 
> 		return $i//rID
> 
> let $dat := 	for $i in $temp[ip=$distinctIp] 
> 		return $i//dat
> 
> let $min := 	for $i in $temp[ip=$distinctIp] 
> 		return $i//min
> 
> order by $distinctIp
> 
> return
> 	<ROW> 
> 		{ $rID[1] }
> 		<ip>{ $distinctIp }</ip>
> 		{ $dat[1] }
> 		{ $min[1] }
> 	</ROW>
> 
> --------------------------------------
> This is equivalent to:
> 
> SELECT
>   	wisc_berkeley_tracetime.rID,
>  	ip,
>  	dat,
>  	min
>  FROM
>  	wisc_berkeley_tracelist INNER JOIN wisc_berkeley_tracetime
>   	ON wisc_berkeley_tracelist.rID = 
> wisc_berkeley_tracetime.rID WHERE
>  	(wisc_berkeley_tracetime.dat = '20000801000000')
>  	GROUP by ip
>  	ORDER BY ip;
> 
> 
> This pattern can be used in any case.  I have tested 4 
> Queries some of them
> with 1 others even with 2 inner-joins, grouping by ip or min. 
>  All of them
> work fine.
> 
> That was a hard job to figure that out.
> If you still see something what can be improved regarding the 
> performance, I
> appreciate to know it.
> 
> Best Regards
> Houman M. Khorasani
> University of Wisconsin Platteville
> 
> 
> 
Received on Tuesday, 22 June 2004 23:39:40 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Saturday, 22 July 2006 00:10:19 GMT