Re: limit per resource rethought... from Steve Harris on 2010-08-13 (public-rdf-dawg@w3.org from July to September 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 13 Aug 2010 10:38:52 +0100
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: Axel Polleres <axel.polleres@deri.org>, Paul Gearon <gearon@ieee.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <4F5595E1-A700-4A4D-BF4B-141E81BCD2EB@garlik.com>

On 2010-08-12, at 18:06, Andy Seaborne wrote:
> On 12/08/2010 4:32 PM, Steve Harris wrote:
>> On 2010-08-12, at 09:36, Andy Seaborne wrote:
>>> 
>>> On 12/08/10 08:36, Axel Polleres wrote:
>>>> The only way I'd see that fit into our current model would be allowing unary SELECT queries as project expressions... something like:
>>> 
>>> Another way would be to allow aggregates to return multiple rows for each key of the group.  Then we can have a (custom) aggregate that returns the top 3 in a group:
>>> 
>>> SELECT ?P top(3, ?name, desc)
>> 
>> That will do funny things to the cardinality, and often still leaves you with some joining up to do in the app. Given:
>> 
>> <a>  :name "foo", "bar", "baz" .
>> <b>  :name "qux" .
>> 
>> The results will look like
>> 
>> ?p    ?top
>> <a>    "foo"
>> <a>    "baz"
>> <a>    "bar"
>> <b>    "qux"
>> [ maybe with
>> <b>
>> <b>
>> depending on exact semantics]
>> Personally I would prefer something that returned just one value, with a fixed offset, e.g.
>> 
>> SELECT ?p (SCALAR(1, DESC ?name) AS ?n1) (SCALAR(2, DESC ?name) AS ?n2) (SCALAR(3, DESC ?name) AS ?n3)
> 
> clever use of the error condition :-)

It is?

>> So you get
>> 
>> ?p    ?n1    ?n2    ?n3
>> <a>    "foo"  "baz"  "bar"
>> <b>    "qux"
>> 
>> For most of our uses of this kind of feature, it would be preferable. e.g. find 1-2 alternative names, 1-3 email addresses, and 1-2 postcodes for people called John Smith.
> 
> Both cases make sense, as do nested tables.
> 
> I have found that having the related rows adjacent makes for quite simple application processing.  The different names for the same data item also have implications (i.e. you don't care about the actual order - just top 3).
> 
> Thinking of this as creating an intermediate table that might be processed further in the query having a single ?n (repeated) rather than ?n1 ?n2 ?n3 might be simpler.

Yes, especially if you want the top 100, though I've personally not yet seen that as a real-world requirement with multiple subjects in one query.
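
To make the two result shapes concrete, here is a minimal Python sketch, not a proposed implementation. It reuses the <a>/<b> data from earlier in the thread; the function names top_rows and scalar_columns are made up for illustration, and SCALAR itself is only the hypothetical aggregate suggested above. It assumes plain descending string order and represents an unbound value as None.

```python
from collections import defaultdict

# Sample data from earlier in the thread, as (subject, name) pairs.
TRIPLES = [("<a>", "foo"), ("<a>", "bar"), ("<a>", "baz"), ("<b>", "qux")]


def group_names(triples):
    """Group the names by subject."""
    groups = defaultdict(list)
    for subject, name in triples:
        groups[subject].append(name)
    return groups


def top_rows(triples, n):
    """Multi-row shape: one (subject, name) row per top-n value."""
    rows = []
    for subject, names in sorted(group_names(triples).items()):
        for name in sorted(names, reverse=True)[:n]:
            rows.append((subject, name))
    return rows


def scalar_columns(triples, n):
    """Fixed-offset shape: one row per subject with n value columns.

    Offsets past the end of a group are left as None (unbound).
    """
    rows = []
    for subject, names in sorted(group_names(triples).items()):
        ranked = sorted(names, reverse=True)[:n]
        padding = [None] * (n - len(ranked))
        rows.append(tuple([subject] + ranked + padding))
    return rows


if __name__ == "__main__":
    for row in top_rows(TRIPLES, 3):
        print(row)
    for row in scalar_columns(TRIPLES, 3):
        print(row)
```

With n = 100 the first shape still has two columns per row, while the second needs a hundred value columns, most of them unbound for small groups.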

> There is also the fact that the app might want the top N by one criterion (e.g. person's time in current job) but get a different variable (their name) [that's a general nested per-row-scoping subquery], which means I think this space needs some exploration first.

Yes.

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Friday, 13 August 2010 09:39:28 UTC