- From: Michael Kay <mhk@mhk.me.uk>
- Date: Wed, 23 Jun 2004 04:19:58 +0100
- To: "'Houman Khorasani'" <khorasani@web.de>, "'Priscilla Walmsley'" <priscilla@walmsley.com>, <www-ql@w3.org>
Obviously performance depends on the optimizations done by your particular query engine, but here are some suggestions to place less reliance on the optimizer: (a) use specific paths rather than $x//xyz. The "//" operator is always expensive as it involves searching a whole subtree. (b) the expression $temp[ip=$distinctIp] appears three times: assign the result to a variable. (c) the expression "for $i in $temp[ip=$distinctIp] return $i//rID" can almost certainly be rewritten as "$temp[ip=$distinctIp]//rID" Michael Kay > -----Original Message----- > From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On > Behalf Of Houman Khorasani > Sent: 22 June 2004 20:40 > To: 'Priscilla Walmsley'; www-ql@w3.org > Subject: RE: Joins with groupby (No example in W3 Draft) > Importance: High > > > Hi Priscilla, > > Thank you very much, for your help. Actually it didn't work > but I didn't > give up and after keep trying over many days I could solve it > at the end. :) > Maybe there is still a way to improve the performance of this > query. If you > see something so please tell me to change and try that. > > There is just one solution for 'Group By' what does exactly > the same what > SQL 'Group By' does: > > ---------------------------------------- > > let $temp := for $wbtt in > doc("wisc_berkeley_tracetime.xml")//row[dat = > '20000801000000'] > for $wbtl in > doc("wisc_berkeley_tracelist.xml")//row[rID = > $wbtt/rID] > return <g>{($wbtl/rID, $wbtl/ip, $wbtt/min, > $wbtt/dat)}</g> > > for $distinctIp in distinct-values($temp//ip) > > let $rID := for $i in $temp[ip=$distinctIp] > return $i//rID > > let $dat := for $i in $temp[ip=$distinctIp] > return $i//dat > > let $min := for $i in $temp[ip=$distinctIp] > return $i//min > > order by $distinctIp > > return > <ROW> > { $rID[1] } > <ip>{ $distinctIp }</ip> > { $dat[1] } > { $min[1] } > </ROW> > > -------------------------------------- > This is equivalent to: > > SELECT > wisc_berkeley_tracetime.rID, > ip, > dat, > min > FROM > wisc_berkeley_tracelist INNER JOIN wisc_berkeley_tracetime > ON wisc_berkeley_tracelist.rID = > wisc_berkeley_tracetime.rID WHERE > (wisc_berkeley_tracetime.dat = '20000801000000') > GROUP by ip > ORDER BY ip; > > > This pattern can be used in any case. I have tested 4 > Queries some of them > with 1 others even with 2 inner-joins, grouping by ip or min. > All of them > work fine. > > That was a hard job to figure that out. > If you still see something what can be improved regarding the > performance, I > appreciate to know it. > > Best Regards > Houman M. Khorasani > University of Wisconsin Platteville > > >
Received on Tuesday, 22 June 2004 23:39:40 UTC