- From: Houman Khorasani <khorasani@web.de>
- Date: Thu, 17 Jun 2004 00:38:44 -0500
- To: 'Michael Kay' <mhk@mhk.me.uk>, www-ql@w3.org
Hello everyone, Michael and Donald thank you for your response. Well I tried very hard now in the third day in the row to realize the SQL like 'Group by' in XQuery. And I think it is not possible in my case. Grouping in XQuery with a for and let clause is meant to be for functions like avg(), sum(), round() etc. These functions must have one value as a return but in my case this won't work. Let me please explain this with this example: http://www.houmie.com/research/pic1.JPG On this picture you can see two relational tables: wisc_berkeley_tracelist with two columns ip and rID The other table is wisc_berkeley_tracetime with three columns dat, min and rID. As you can see rID is the join field between the two tables. The SQL query is: SELECT wisc_berkeley_tracetime.rID, ip, min FROM wisc_berkeley_tracelist INNER JOIN wisc_berkeley_tracetime ON wisc_berkeley_tracelist.rID = wisc_berkeley_tracetime.rID WHERE (wisc_berkeley_tracetime.dat = '20000801000000') The same query in XQuery is: for $wbtl in doc("wisc_berkeley_tracelist.xml")//row for $wbtt in doc("wisc_berkeley_tracetime.xml")//row[dat = '20000801000000' and rID = $wbtl//rID] return <ROW> { ($wbtl/ip, $wbtt/rID, $wbtt/min) } </ROW> The result of the XQuery statement above is exactly the same as the SQL statement: http://www.houmie.com/research/saxonResult.xml These are 275 records. You can see the following behavior in the result table of the same SQL statement: http://www.houmie.com/research/pic3.JPG At the rID 662 and minute 79 you can see several different IPs. At minute 168 the same IP's in the row show up again. It is possible to group by the IP's in SQL (GROUP by ip). What does it mean? http://www.houmie.com/research/pic2.JPG Every IP is now unique and show up just once in the result table. How does it work? It shows the first unique IP and at what minute it shows up. E.g. the IP 128.32.0.109 shows up for the first time at minute 79. The ip 128.32.0.66 does also show up for the first time at minute 79 and so on. The records are reduced from 275 to 30. I tried to solve it in XQuery in this way: for $wbtl_ip in distinct-values(doc("wisc_berkeley_tracelist.xml")//ip) for $wbtt in doc("wisc_berkeley_tracetime.xml")//row[dat = '20000801000000'] for $wbtl in doc("wisc_berkeley_tracelist.xml")//row[rID = $wbtt/rID and ip = $wbtl_ip] return <ROW> { ($wbtl/ip, $wbtl/rID) } </ROW> But the result is the same just slower to proceed. I also tried also an outer-join and many other different combinations. The main problem lies in distinct-values(). This function can only work on atomic-values but not on the nodes it self. If I could make an distinct-value() on the NODE where the ip is unique e.g. -> (the following statement is not possible but a fiction) for $wbtl_ip in distinct-values(doc("wisc_berkeley_tracelist.xml")//row, ip) I would have the right tuples with a unique ip and the appropriate rID which can be used for an inner-join on the other table. This would be more like a real SQL join and SQL group by. Maybe there is a better way to translate SQL's 'Group by'. Otherwise I argue the above query and similar SQL query statements are not possible in XQuery. I appreciate any comments. Houman Khorasani University of Wisconsin Platteville
Received on Thursday, 17 June 2004 01:38:52 UTC