Re: limit per resource rethought... from Andy Seaborne on 2010-08-12 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 12 Aug 2010 18:06:59 +0100
To: Steve Harris <steve.harris@garlik.com>
CC: Axel Polleres <axel.polleres@deri.org>, Paul Gearon <gearon@ieee.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C642A33.6020309@epimorphics.com>

On 12/08/2010 4:32 PM, Steve Harris wrote:
> On 2010-08-12, at 09:36, Andy Seaborne wrote:
>>
>> On 12/08/10 08:36, Axel Polleres wrote:
>>> The only way I'd see that fit into our current model would be allowing unary SELECT queries as project expressions... something like:
>>
>> Another way would be to allow aggregates to return multiple rows for each key of the group.  Then we can have a (custom) aggregate that returns the top 3 in a group:
>>
>> SELECT ?P top(3, ?name, desc)
>
> That will do funny things to the cardinality, and often still leaves you with some joining up to do in the app. Given:
>
> <a>  :name "foo", "bar", "baz" .
> <b>  :name "qux" .
>
> The results will look like
>
> ?p    ?top
> <a>    "foo"
> <a>    "baz"
> <a>    "bar"
> <b>    "qux"
> [ maybe with
> <b>
> <b>
> depending on exact semantics]
> Personally I would prefer something that returned just one value, with a fixed offset, e.g.
>
> SELECT ?p (SCALAR(1, DESC ?name) AS ?n1) (SCALAR(2, DESC ?name) AS ?n2) (SCALAR(3, DESC ?name) AS ?n3)

clever use of the error condition :-)

>
> So you get
>
> ?p    ?n1    ?n2    ?n3
> <a>    "foo"  "baz"  "bar"
> <b>    "qux"
>
> For most of our uses of this kind of feature, it would be preferable. e.g. find 1-2 alternative names, 1-3 email addresses, and 1-2 postcodes for people called John Smith.

Both cases make sense, as do nested tables.

I have found that having the related rows adjacent makes for quite simple 
application processing.  Having different names (?n1 ?n2 ?n3) for the same 
data item also has implications: the names encode a rank, yet often you 
don't care about the actual order - just the top 3.

If we think of this as creating an intermediate table that might be 
processed further in the query, having a single ?n (repeated) rather than 
?n1 ?n2 ?n3 might be simpler.
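
A rough sketch of the two result shapes, in Python rather than SPARQL, 
using the data from Steve's example above.  The function names are made 
up for illustration, and plain descending string order stands in for 
whatever ordering the aggregate would really use.

```python
from collections import defaultdict

# (subject, name) pairs from the example data in this thread.
triples = [("a", "foo"), ("a", "bar"), ("a", "baz"), ("b", "qux")]


def group_names(rows):
    """Collect the names for each subject."""
    groups = defaultdict(list)
    for subject, name in rows:
        groups[subject].append(name)
    return groups


def top_rows(rows, n):
    """Single ?n, repeated: one (subject, name) row per top-n value."""
    out = []
    for subject, names in sorted(group_names(rows).items()):
        for name in sorted(names, reverse=True)[:n]:
            out.append((subject, name))
    return out


def top_columns(rows, n):
    """?n1 ?n2 ?n3: one row per subject, None where a rank is unbound."""
    out = []
    for subject, names in sorted(group_names(rows).items()):
        ranked = sorted(names, reverse=True)[:n]
        padding = [None] * (n - len(ranked))
        out.append((subject, *ranked, *padding))
    return out


for row in top_rows(triples, 3):
    print(row)
for row in top_columns(triples, 3):
    print(row)
```

The row-per-value form can be filtered or joined again without knowing 
N, whereas the column-per-rank form needs one variable per rank.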

There is also the fact that the app might want the top N by one criterion 
(e.g. a person's time in their current job) but get back a different 
variable (their name) [that's a general nested per-row-scoping subquery], 
which means I think this space needs some exploration first.
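
To illustrate the "rank by one thing, return another" case, here is a 
minimal Python sketch.  The data, the grouping by employer, and the 
function name are all hypothetical; it only shows the shape of the 
problem, not a proposed semantics.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (employer, name, years_in_current_job).
staff = [
    ("acme", "Alice", 7),
    ("acme", "Bob", 2),
    ("acme", "Carol", 5),
    ("globex", "Dave", 1),
]


def top_n_names(rows, n):
    """Per employer, rank by years in job but return only the names."""
    result = []
    ordered = sorted(rows, key=itemgetter(0))  # groupby needs sorted input
    for employer, members in groupby(ordered, key=itemgetter(0)):
        ranked = sorted(members, key=itemgetter(2), reverse=True)[:n]
        result.extend((employer, name) for _, name, _ in ranked)
    return result


for row in top_n_names(staff, 2):
    print(row)
```

The ordering key never appears in the output, which is what makes this 
look more like a per-row subquery than a plain aggregate.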

	Andy

Received on Thursday, 12 August 2010 17:07:43 UTC