RE: Joins with groupby (No example in W3 Draft)

Obviously performance depends on the optimizations done by your particular
query engine, but here are some suggestions to place less reliance on the
optimizer:

(a) use specific paths rather than $x//xyz. The "//" operator is always
expensive as it involves searching a whole subtree.

(b) the expression $temp[ip=$distinctIp] appears three times: assign the
result to a variable.

(c) the expression "for $i in $temp[ip=$distinctIp] return $i//rID" can
almost certainly be rewritten as "$temp[ip=$distinctIp]//rID"

Michael Kay

 

> -----Original Message-----
> From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On 
> Behalf Of Houman Khorasani
> Sent: 22 June 2004 20:40
> To: 'Priscilla Walmsley'; www-ql@w3.org
> Subject: RE: Joins with groupby (No example in W3 Draft)
> Importance: High
> 
> 
> Hi Priscilla,
> 
> Thank you very much, for your help.  Actually it didn't work 
> but I didn't
> give up and after keep trying over many days I could solve it 
> at the end. :)
> Maybe there is still a way to improve the performance of this 
> query. If you
> see something so please tell me to change and try that.
> 
> There is just one solution for 'Group By' what does exactly 
> the same what
> SQL 'Group By' does:
> 
> ----------------------------------------
> 
> let $temp :=	for $wbtt in 
> doc("wisc_berkeley_tracetime.xml")//row[dat =
> '20000801000000']
> 		for $wbtl in 
> doc("wisc_berkeley_tracelist.xml")//row[rID =
> $wbtt/rID]
> 		return <g>{($wbtl/rID, $wbtl/ip, $wbtt/min, 
> $wbtt/dat)}</g>
> 
> for $distinctIp in distinct-values($temp//ip)
> 
> let $rID :=	for $i in $temp[ip=$distinctIp] 
> 		return $i//rID
> 
> let $dat := 	for $i in $temp[ip=$distinctIp] 
> 		return $i//dat
> 
> let $min := 	for $i in $temp[ip=$distinctIp] 
> 		return $i//min
> 
> order by $distinctIp
> 
> return
> 	<ROW> 
> 		{ $rID[1] }
> 		<ip>{ $distinctIp }</ip>
> 		{ $dat[1] }
> 		{ $min[1] }
> 	</ROW>
> 
> --------------------------------------
> This is equivalent to:
> 
> SELECT
>   	wisc_berkeley_tracetime.rID,
>  	ip,
>  	dat,
>  	min
>  FROM
>  	wisc_berkeley_tracelist INNER JOIN wisc_berkeley_tracetime
>   	ON wisc_berkeley_tracelist.rID = 
> wisc_berkeley_tracetime.rID WHERE
>  	(wisc_berkeley_tracetime.dat = '20000801000000')
>  	GROUP by ip
>  	ORDER BY ip;
> 
> 
> This pattern can be used in any case.  I have tested 4 
> Queries some of them
> with 1 others even with 2 inner-joins, grouping by ip or min. 
>  All of them
> work fine.
> 
> That was a hard job to figure that out.
> If you still see something what can be improved regarding the 
> performance, I
> appreciate to know it.
> 
> Best Regards
> Houman M. Khorasani
> University of Wisconsin Platteville
> 
> 
> 

Received on Tuesday, 22 June 2004 23:39:40 UTC