Re: Limiting a query by setting a maximum number of distinct values for a given variable. from Andy Seaborne on 2010-10-19 (semantic-web@w3.org from October 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 19 Oct 2010 14:38:32 +0100
To: Steve Harris <steve.harris@garlik.com>
CC: Olivier Rossel <olivier.rossel@gmail.com>, Alexandre Passant <alexandre.passant@deri.org>, Damian Steer <pldms@mac.com>, Semantic Web <semantic-web@w3.org>
Message-ID: <4CBD9F58.9030905@epimorphics.com>

On 19/10/10 13:55, Steve Harris wrote:
> On 2010-10-19, at 10:31, Olivier Rossel wrote:
>
>> Thanks for all your comments.
>> I am now wondering about a few things and would appreciate your feedback:
>>
>> the use case I present is, in my opinion, very classic.
>> Most UIs dealing with a big set of data display them in a page-based manner.
>>
>> Steve proposes the application layer to handle the mismatch between
>> the LIMIT/OFFSET
>> queries and the page-based UI.
>>
>> If the only usage of the application layer is to refactor result
>> sets' rows into page-based UIs, then
>> an interesting alternative is to include the page-based query feature
>> in the SPARQL spec.
>> Any opinion?
>
> I think it would be hard to specify. That doesn't mean it shouldn't be done of course.
>
> (Arguably) the most obvious fix would be to have some way of writing a top-down subquery, but that has issues around the execution complexity of the resulting query, as I understand it.
>
>> What could be the technical issues of such a proposal?
>> As far as I understand, it is just an alternative way for the DB to
>> count the number of data
>> to return. Not a big deal, it seems, and it would avoid some tricky
>> coding at the application layer.
>>
>>
>> PS: I think SQL does not provide any better alternative, beyond
>> (tedious to write) subqueries. True?
>
> Subqueries in SQL aren't sufficient; they have the same execution rule as SPARQL. You'd need stored procedures, or a complex multi-stage query using temporary tables.
>
> There are tricks that can be done with GROUP_CONCAT, but it's non-standard (in SQL), and only works for some limited cases.

I think a more direct way of addressing the issue is needed.  Rather 
than some idiom, which is beginning to look a bit complicated (examples 
to the contrary welcome), something in the language is needed.  Maybe:

    LIMIT 10 BY ?x

would yield rows up to, and excluding, the first row in which the 11th 
different term for ?x is seen (if there are enough rows).  Unbound counts 
as a unique term for this operation.
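
To make the intended behaviour of LIMIT n BY ?x concrete, here is a rough
sketch in Python of how a result stream might be cut off.  It is only an
illustration under assumptions: the function name limit_by, the
dict-per-row representation and the UNBOUND sentinel are all invented
here, and "unbound counts as a unique term" is read as "all unbound rows
share one distinct value".

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical row model: variable name -> term; a missing key means unbound.
Row = Dict[str, Any]

# Sentinel standing in for "unbound" so it can be counted as one distinct term.
UNBOUND = object()


def limit_by(rows: Iterable[Row], n: int, var: str) -> Iterator[Row]:
    """Yield rows up to, and excluding, the first row in which the
    (n+1)th different term for `var` is seen."""
    seen = set()
    for row in rows:
        term = row.get(var, UNBOUND)
        if term not in seen:
            if len(seen) == n:
                # This row would introduce the (n+1)th distinct term: stop.
                return
            seen.add(term)
        yield row


# Small made-up result set: three rows for "a" in total, one row with ?x unbound.
rows = [
    {"x": "a", "y": 1},
    {"x": "a", "y": 2},
    {"x": "b", "y": 3},
    {"y": 4},            # ?x unbound
    {"x": "c", "y": 5},  # 4th distinct term for ?x
    {"x": "a", "y": 6},  # known term, but after the cut-off
]

first_three = list(limit_by(rows, 3, "x"))
print([r["y"] for r in first_three])  # prints [1, 2, 3, 4]
```

Note that the cut is positional: the last row has an ?x already seen, but
it comes after the row carrying the 4th term, so it is not returned.  For
this to give whole "pages" the rows would need to arrive grouped or
ordered by ?x.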

   SELECT DISTINCT(10,?x)

was another possible language design, but the feature seems to me to be 
more closely related to the LIMIT already in SPARQL.

Extending to

    LIMIT 10 BY ?x ?y

for 10 different (?x,?y) pairs seems to be an unnecessary step for now, 
but is not blocked by the single variable design.
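
As a rough illustration of why the extension is not blocked, the same
cut-off rule can be sketched over tuples of terms rather than single
terms.  Everything named here (limit_by_vars, the dict-per-row model, the
UNBOUND sentinel) is hypothetical and only meant to show the idea, not a
proposed implementation.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence

# Hypothetical row model: variable name -> term; a missing key means unbound.
Row = Dict[str, Any]

# Sentinel standing in for "unbound" inside a key tuple.
UNBOUND = object()


def limit_by_vars(rows: Iterable[Row], n: int,
                  variables: Sequence[str]) -> Iterator[Row]:
    """Yield rows up to, and excluding, the first row in which the
    (n+1)th different combination of terms for `variables` is seen."""
    seen = set()
    for row in rows:
        # With one variable this is a 1-tuple, i.e. the single variable case.
        key = tuple(row.get(v, UNBOUND) for v in variables)
        if key not in seen:
            if len(seen) == n:
                return
            seen.add(key)
        yield row


# Made-up rows; the distinct (?x,?y) pairs appear in the order
# (a,1), (a,2), (b,1).
pair_rows = [
    {"x": "a", "y": 1, "z": "p"},
    {"x": "a", "y": 1, "z": "q"},
    {"x": "a", "y": 2, "z": "r"},
    {"x": "b", "y": 1, "z": "s"},  # 3rd distinct pair
    {"x": "a", "y": 1, "z": "t"},
]

two_pairs = list(limit_by_vars(pair_rows, 2, ["x", "y"]))
print([r["z"] for r in two_pairs])  # prints ['p', 'q', 'r']
```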

This has been a recurring request and seems natural to do in semantic web 
applications.

	Andy

Received on Tuesday, 19 October 2010 13:39:25 UTC