- From: Michael Kay <mhk@mhk.me.uk>
- Date: Wed, 23 Jun 2004 04:19:58 +0100
- To: "'Houman Khorasani'" <khorasani@web.de>, "'Priscilla Walmsley'" <priscilla@walmsley.com>, <www-ql@w3.org>
Obviously performance depends on the optimizations done by your particular
query engine, but here are some suggestions to place less reliance on the
optimizer:
(a) use specific paths rather than $x//xyz. The "//" operator is always
expensive as it involves searching a whole subtree.
(b) the expression $temp[ip=$distinctIp] appears three times: assign the
result to a variable.
(c) the expression "for $i in $temp[ip=$distinctIp] return $i//rID" can
almost certainly be rewritten as "$temp[ip=$distinctIp]//rID"
Michael Kay
> -----Original Message-----
> From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On
> Behalf Of Houman Khorasani
> Sent: 22 June 2004 20:40
> To: 'Priscilla Walmsley'; www-ql@w3.org
> Subject: RE: Joins with groupby (No example in W3 Draft)
> Importance: High
>
>
> Hi Priscilla,
>
> Thank you very much, for your help. Actually it didn't work
> but I didn't
> give up and after keep trying over many days I could solve it
> at the end. :)
> Maybe there is still a way to improve the performance of this
> query. If you
> see something so please tell me to change and try that.
>
> There is just one solution for 'Group By' what does exactly
> the same what
> SQL 'Group By' does:
>
> ----------------------------------------
>
> let $temp := for $wbtt in
> doc("wisc_berkeley_tracetime.xml")//row[dat =
> '20000801000000']
> for $wbtl in
> doc("wisc_berkeley_tracelist.xml")//row[rID =
> $wbtt/rID]
> return <g>{($wbtl/rID, $wbtl/ip, $wbtt/min,
> $wbtt/dat)}</g>
>
> for $distinctIp in distinct-values($temp//ip)
>
> let $rID := for $i in $temp[ip=$distinctIp]
> return $i//rID
>
> let $dat := for $i in $temp[ip=$distinctIp]
> return $i//dat
>
> let $min := for $i in $temp[ip=$distinctIp]
> return $i//min
>
> order by $distinctIp
>
> return
> <ROW>
> { $rID[1] }
> <ip>{ $distinctIp }</ip>
> { $dat[1] }
> { $min[1] }
> </ROW>
>
> --------------------------------------
> This is equivalent to:
>
> SELECT
> wisc_berkeley_tracetime.rID,
> ip,
> dat,
> min
> FROM
> wisc_berkeley_tracelist INNER JOIN wisc_berkeley_tracetime
> ON wisc_berkeley_tracelist.rID =
> wisc_berkeley_tracetime.rID WHERE
> (wisc_berkeley_tracetime.dat = '20000801000000')
> GROUP by ip
> ORDER BY ip;
>
>
> This pattern can be used in any case. I have tested 4
> Queries some of them
> with 1 others even with 2 inner-joins, grouping by ip or min.
> All of them
> work fine.
>
> That was a hard job to figure that out.
> If you still see something what can be improved regarding the
> performance, I
> appreciate to know it.
>
> Best Regards
> Houman M. Khorasani
> University of Wisconsin Platteville
>
>
>
Received on Tuesday, 22 June 2004 23:39:40 UTC