
Re: limit per resource rethought...

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 13 Aug 2010 11:22:45 +0100
Cc: "Andy Seaborne" <andy.seaborne@epimorphics.com>, "Paul Gearon" <gearon@ieee.org>, "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-Id: <C3FE22FB-F46D-46F3-8BEE-A896153D7BC5@garlik.com>
To: Axel Polleres <axel.polleres@deri.org>
On 2010-08-13, at 09:59, Axel Polleres wrote:

>> Personally I would prefer something that returned just one value, with a fixed offset, e.g.
>> 
>> SELECT ?p (SCALAR(1, DESC ?name) AS ?n1) (SCALAR(2, DESC ?name) AS ?n2) (SCALAR(3, DESC ?name) AS ?n3)
>> So you get
>> 
>> ?p    ?n1    ?n2    ?n3
>> <a>   "foo"  "baz"  "bar"
>> <b>   "qux"
> 
> *Personally* (= </chair>) , I like 
> 
>> ?p    ?top
>> <a>   "foo"
>> <a>   "baz"
>> <a>   "bar"
>> <b>   "qux"
> 
> better (at least that reflects more the intention I had in mind with the original query)
> 
> Also, in your approach, for varying 'n' I need to put n times "(SCALAR(1, DESC ?name) AS ?n1)"
> which is a lot like my earlier ugly version with OPTIONAL
> whereas with 
> 
> SELECT ?P
>      ({ SELECT ?P1 WHERE { ?P :knows ?P1 } ORDER BY ?P1 LIMIT n } AS ?F )
> WHERE {?P a :Person}
> 
> a la Paul, I'd just have to change 'n' as a parameter of LIMIT in the query. Admittedly, 
> I find that quite appealing, whereas I am not really sure whether inventing new 
> aggregates solves this or similar queries adequately. :-|

The advantage of returning it "horizontally" comes when you want to limit more than one variable.

> Just for comparison... this is how that would work in SQL ... assume I have the 
> knows relation in a table KNOWS(A,B):
> 
> SELECT A,B FROM KNOWS k1 WHERE B IN (SELECT B FROM KNOWS k2 WHERE k1.A=k2.A LIMIT n);

That kind of correlated subquery isn't really equivalent; plain (uncorrelated) SQL subqueries work bottom-up, like SPARQL's:

[example from mysql]

SELECT * FROM foo;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 | 
|    1 |    3 | 
|    2 |   10 | 
+------+------+

SELECT * FROM bar;
+------+------+
| a    | b    |
+------+------+
|    1 |    4 | 
|    1 |    5 | 
|    2 |   11 | 
|    2 |   12 | 
+------+------+

SELECT * FROM foo JOIN (SELECT * FROM bar LIMIT 1) AS bars ON foo.a=bars.a;
+------+------+------+------+
| a    | b    | a    | b    |
+------+------+------+------+
|    1 |    2 |    1 |    4 | 
|    1 |    3 |    1 |    4 | 
+------+------+------+------+

Your query (though not really equivalent to a SPARQL-style subquery, and not legal SQL :) would return something similar. The subquery is effectively executed "first", then its result is used to constrain the outer query.
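For reference, a per-resource limit *can* be written in legal SQL with a correlated counting subquery. A minimal sketch, using SQLite from Python; the KNOWS(A,B) table name follows the mail, but the rows and the counting trick are mine, not Axel's:

```python
import sqlite3

# In-memory database with a KNOWS(A, B) relation as in the mail
# (the rows themselves are invented for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE KNOWS (A INTEGER, B INTEGER)")
con.executemany("INSERT INTO KNOWS VALUES (?, ?)",
                [(1, 2), (1, 3), (1, 5), (2, 10)])

n = 2  # keep at most n B-values per A

# Legal-SQL equivalent of a per-resource LIMIT: keep a row if at most
# n rows in the same A-group have a B value <= this row's B.
rows = con.execute("""
    SELECT A, B FROM KNOWS k1
    WHERE (SELECT COUNT(*) FROM KNOWS k2
           WHERE k2.A = k1.A AND k2.B <= k1.B) <= ?
    ORDER BY A, B
""", (n,)).fetchall()

print(rows)  # [(1, 2), (1, 3), (2, 10)]
```

Note this only picks a well-defined "first n" because the correlated COUNT ranks rows by B; without some ordering criterion, "limit n per resource" is as underspecified in SQL as it is in SPARQL.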

If I wanted to do what we're discussing here in MySQL, I would use GROUP_CONCAT, though MySQL's GROUP_CONCAT has no easy way to limit the number of results. We could add limit=N to ours if we wished to make this use-case easier to serve.
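To make the GROUP_CONCAT point concrete: SQLite's group_concat behaves much like MySQL's here (minus the ORDER BY / SEPARATOR options), and shows that every value in the group lands in the result, with no per-group limit. A small sketch with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE names (p TEXT, name TEXT)")
con.executemany("INSERT INTO names VALUES (?, ?)",
                [("a", "foo"), ("a", "baz"), ("a", "bar"), ("b", "qux")])

# One concatenated string per group -- but there is no per-group LIMIT:
# all three names for "a" end up in its result string.
rows = con.execute(
    "SELECT p, group_concat(name) FROM names GROUP BY p ORDER BY p"
).fetchall()
for p, names in rows:
    # group_concat gives no ordering guarantee, so sort for display
    print(p, sorted(names.split(",")))
```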

In an ideal world it would probably be useful to be able to return "array" types, but I think that might complicate things, and we're already biting off rather more than we can chew, IMHO.

So, rather than (SCALAR(1, DESC ?name) AS ?n1) ... it could be (SEQUENCE(?name; limit=3) AS ?names) or similar:

SELECT ?p (SEQUENCE(?name; limit=3) AS ?names)
WHERE {
   ?p :knows ?f .
   ?f :name ?name .
} GROUP BY ?p ORDER BY ?name 

N.B. I'm not sure offhand whether the ORDER BY will affect the results of the SEQUENCE() aggregate; I'd have to check the algebra more carefully. Packing the ordering into the aggregate expressions would be tricky. MySQL does it with a keyword: GROUP_CONCAT(x, y ORDER BY y).

You could substitute GROUP_CONCAT for the SEQUENCE above, and just have some more (un-)escaping work to do, without messing with the more complex XPath datatypes we've avoided so far.
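A hypothetical SEQUENCE(?name; limit=3) aggregate could be modelled over grouped solutions roughly like this. Pure illustration: the function name, the sort-then-truncate semantics, and the DESC handling are my guesses, not anything in the draft:

```python
from collections import defaultdict

def sequence_aggregate(solutions, group_var, agg_var, limit=3, desc=False):
    """Group solutions by group_var and keep at most `limit` values of
    agg_var per group, in sorted order -- a guess at SEQUENCE semantics."""
    groups = defaultdict(list)
    for sol in solutions:
        groups[sol[group_var]].append(sol[agg_var])
    return {key: sorted(vals, reverse=desc)[:limit]
            for key, vals in groups.items()}

# Bindings matching the table earlier in the thread:
# <a> knows three people, <b> knows one.
solutions = [
    {"p": "<a>", "name": "foo"}, {"p": "<a>", "name": "baz"},
    {"p": "<a>", "name": "bar"}, {"p": "<b>", "name": "qux"},
]
result = sequence_aggregate(solutions, "p", "name", limit=3, desc=True)
print(result)  # {'<a>': ['foo', 'baz', 'bar'], '<b>': ['qux']}
```

With desc=True this reproduces the DESC ?name ordering from the SCALAR example at the top of the thread ("foo", "baz", "bar" for <a>).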

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Friday, 13 August 2010 10:23:20 GMT
