RE: Joins with groupby (No example in W3 Draft) from Priscilla Walmsley on 2004-06-17 (www-ql@w3.org from April to June 2004)

From: Priscilla Walmsley <priscilla@walmsley.com>
Date: Thu, 17 Jun 2004 07:57:17 -0400
To: "'Houman Khorasani'" <khorasani@web.de>, "'Michael Kay'" <mhk@mhk.me.uk>, <www-ql@w3.org>
Message-Id: <20040617115741.4F6D1A0B06@frink.w3.org>
Hi,

It appears that what you want is for any given IP address, you want to take
the first row in the tracelist table that has that IP, join it (by rID) with
the row in the tracetime table that has the minimum "min" value.

What about something like this:

for $distinctIP in
distinct-values(doc("wisc_berkeley_tracelist.xml")//row/ip)
let $firstRowWithThatIP := doc("wisc_berkeley_tracelist.xml")//row[ip =
$distinctIP][1]
let $minMinutes := min(doc("wisc_berkeley_tracetime.xml")//row[dat =
'20000801000000' and rID = $firstRowWithThatIP/rID]/min)
return
	<ROW> 
	{	
		($distinctIP , $firstRowWithThatIP/rID,
<min>{$minMinutes}</min> )
	}
	</ROW>

I haven't tested this so it may not be completely right, but you get the
idea.

Hope that helps,
Priscilla

-----------------------------------------------------
Priscilla Walmsley             priscilla@walmsley.com
Author, Definitive XML Schema     (Prentice Hall PTR)
-----------------------------------------------------  

> -----Original Message-----
> From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On 
> Behalf Of Houman Khorasani
> Sent: Thursday, June 17, 2004 1:39 AM
> To: 'Michael Kay'; www-ql@w3.org
> Subject: RE: Joins with groupby (No example in W3 Draft)
> Importance: High
> 
> 
> Hello everyone,
> 
> Michael and Donald thank you for your response.
> 
> Well I tried very hard now in the third day in the row to 
> realize the SQL
> like 'Group by' in XQuery.  And I think it is not possible in my case.
> 
> Grouping in XQuery with a for and let clause is meant to be 
> for functions
> like avg(), sum(), round() etc.  These functions must have 
> one value as a
> return but in my case this won't work.
> 
> 
> Let me please explain this with this example:
> 
> http://www.houmie.com/research/pic1.JPG
> 
> On this picture you can see two relational tables:  
> wisc_berkeley_tracelist with two columns ip and rID
> 
> The other table is wisc_berkeley_tracetime with three columns 
> dat, min and
> rID.
> 
> As you can see rID is the join field between the two tables.
> 
> 
> The SQL query is:
> 
> SELECT
> 	wisc_berkeley_tracetime.rID,
> 	ip,
> 	min
> FROM
> 	wisc_berkeley_tracelist INNER JOIN wisc_berkeley_tracetime
> 
> 	ON wisc_berkeley_tracelist.rID = wisc_berkeley_tracetime.rID
> WHERE
> 	(wisc_berkeley_tracetime.dat = '20000801000000')
> 
> 
> 
> The same query in XQuery is:
> 
> for $wbtl in doc("wisc_berkeley_tracelist.xml")//row
> for $wbtt in doc("wisc_berkeley_tracetime.xml")//row[dat = 
> '20000801000000'
> and rID = $wbtl//rID]
> return
> 	<ROW> 
> 	{	
> 		($wbtl/ip, $wbtt/rID, $wbtt/min)
> 	}
> 	</ROW>
> 
> 
> The result of the XQuery statement above is exactly the same 
> as the SQL
> statement:  http://www.houmie.com/research/saxonResult.xml
> 
> These are 275 records.
> 
> You can see the following behavior in the result table of the same SQL
> statement: http://www.houmie.com/research/pic3.JPG
> 
> At the rID 662 and minute 79 you can see several different 
> IPs.  At minute
> 168 the same IP's in the row show up again.  It is possible 
> to group by the
> IP's in SQL (GROUP by ip).  
> 
> What does it mean?  http://www.houmie.com/research/pic2.JPG
> 
> Every IP is now unique and show up just once in the result table.  
> 
> How does it work?  It shows the first unique IP and at what 
> minute it shows
> up.  E.g. the IP 128.32.0.109 shows up for the first time at 
> minute 79.  The
> ip 128.32.0.66 does also show up for the first time at minute 
> 79 and so on.
> The records are reduced from 275 to 30.  
> 
> I tried to solve it in XQuery in this way:
> for $wbtl_ip in 
> distinct-values(doc("wisc_berkeley_tracelist.xml")//ip)
> for $wbtt in doc("wisc_berkeley_tracetime.xml")//row[dat = 
> '20000801000000']
> for $wbtl in doc("wisc_berkeley_tracelist.xml")//row[rID = 
> $wbtt/rID and ip
> = $wbtl_ip]
> 
> return
> 	<ROW> 
> 	{	
> 		($wbtl/ip, $wbtl/rID)
> 	}
> 	</ROW>
> 
> But the result is the same just slower to proceed.  I also 
> tried also an
> outer-join and many other different combinations.
> 
> The main problem lies in distinct-values().  This function 
> can only work on
> atomic-values but not on the nodes it self.  If I could make an
> distinct-value() on the NODE where the ip is unique e.g. ->  
> (the following
> statement is not possible but a fiction)
> 
> for $wbtl_ip in 
> distinct-values(doc("wisc_berkeley_tracelist.xml")//row, ip)
> 
> I would have the right tuples with a unique ip and the 
> appropriate rID which
> can be used for an inner-join on the other table.  This would 
> be more like a
> real SQL join and SQL group by.  Maybe there is a better way 
> to translate
> SQL's 'Group by'.
> 
> Otherwise I argue the above query and similar SQL query 
> statements are not
> possible in XQuery.
> 
> I appreciate any comments.
> 
> Houman Khorasani
> University of Wisconsin Platteville
> 
> 
>
Received on Thursday, 17 June 2004 07:57:43 UTC