Pre-LC comments consolidation

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 07 Oct 2011 09:33:52 +0100
Message-ID: <4E8EB970.1030309@epimorphics.com>
To: SPARQL Working Group <public-rdf-dawg@w3.org>
The message below
[[
Re: Fwd: COMMENT? Re: Pagination in SPARQL OFFSET and LIMIT needs ORDER BY
]]
is in reference to comments "JBolleman-1" and "JBolleman-2".  There is a 
drafted response but it has not been sent, as far as I can see from a 
search of the archives.

http://www.w3.org/2009/sparql/wiki/CommentResponse:JBolleman-1



There are 15 pre-LC comments still open.  Two have drafts, and a 
response may have been sent (I have not had the time to check the 
archives).  Two are marked postponed.

I extracted the pre-LC comments that were not noted as response sent 
(please check I've done this right).

http://www.w3.org/2009/sparql/wiki/Comments#Pre_LC-Comments_.28processing.29

This does not include the LC and post-LC comments.


	Andy

-------- Original Message --------
Subject: COMMENT? Re: Pagination in SPARQL OFFSET and LIMIT needs ORDER BY
Resent-Date: Thu, 06 Oct 2011 20:10:36 +0000
Resent-From: public-rdf-dawg-comments@w3.org
Date: Thu, 6 Oct 2011 22:09:54 +0200
From: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
To: jerven.bolleman@isb-sib.ch

Dear Workgroup,

I have not seen any discussion of my suggestion that OFFSET should work 
deterministically. Has this been discussed in the workgroup and been 
disregarded as a bad idea or as breaking compatibility? Or has my 
comment/question slipped through the cracks? Or is it now too late, as 
July 29th has passed?

Regards,
Jerven



On May 13, 2011, at 10:01 AM, Jerven Bolleman wrote:

> Dear workgroup,
>
> I realized that I might not have been so clear in describing the problem.
>
> Assume that you maintain a publicly available SPARQL endpoint.
> You want to support both an HTML view and the official SPARQL formats.
>
> Let's say a user executes the query
> SELECT * WHERE {?s ?p ?o}
> This will download every triple in your store. In my store this will mean trying to download 160 GB of triples via a single HTTP connection.
> This is not likely to work, and if it did, most browsers would crash on the HTML view.
>
> Therefore I would like to always put a LIMIT on the query to make sure that the result will match the capabilities of a common HTTP connection.
> e.g. default LIMIT 1000
>
> But I do want people to download more than just the first 1000 results to their query. I just want them to do it in multiple requests that are likely to complete and not crash their browsers.
>
> So I need pagination i.e. OFFSET. In practical terms this does exactly what I need (having briefly tested OWLIM and Virtuoso).
> i.e. page 1 SELECT * WHERE {?s ?p ?o} OFFSET 0 LIMIT 1000
>     page 2 SELECT * WHERE {?s ?p ?o} OFFSET 1000 LIMIT 1000
> Until there are no more results. However, this is not specified to work in the current public draft.
>
> Having the following 2 triples in a store.
> <_:1> <lala> "hi"
> <_:1> <lala> "by"
>
> The following query
> SELECT * WHERE {?s ?p ?o}
> Can evaluate to either a)
> <_:1> <lala> "hi"
> <_:1> <lala> "by"
> or b)
> <_:1> <lala> "by"
> <_:1> <lala> "hi"
>
> i.e. ordering is random but all results are returned.
>
> Now take the following query, and assume the implementation returned ordering a) for the unpaginated query.
>
> SELECT * WHERE {?s ?p ?o} OFFSET 0 LIMIT 1
>
> It can return
> <_:1> <lala> "hi"
> And in the same store it is valid to return the same row for
> SELECT * WHERE {?s ?p ?o} OFFSET 1 LIMIT 1
> as well.
>
> So although each chunk is small, I am not guaranteed to get all valid results. I would need to add an ORDER BY clause, but I can't do that without changing the query, since there is no ORDER BY *. Nor is it always desirable, because ORDER BY means the results actually have to be sorted, which can be very expensive relative to executing the query.
>
> Therefore, I would define OFFSET more specifically.
>
> When an implementation returns a result set for a query, it should do so in a deterministic manner, i.e. executing the same query twice on a store with constant data will return the results in the same order.
> The OFFSET parameter is then interpreted as: discard the first X results that the same query without OFFSET would have generated.
>
> This means that for a query A with N results, the concatenation of the results of the queries A OFFSET 0..N LIMIT 1 is equal to the result of the query A.
>
> Regards,
> Jerven Bolleman
>
> P.S. The original source of this discussion is:
> http://answers.semanticweb.com/questions/9456/jena-pagination-for-sparql
>
>
> On 05/12/2011 04:32 PM, Jerven Bolleman wrote:
>> Dear workgroup,
>>
>> I was recently made aware that there is no easy way to get pagination that is guaranteed to work without adding an ORDER BY clause,
>>
>> i.e. QUERY OFFSET 0 LIMIT 5 page 1
>>       QUERY OFFSET 5 LIMIT 5 page 2
>>       QUERY OFFSET 10 LIMIT 5 page 3
>>
>> Adding any kind of ORDER BY clause would be enough to ensure pagination worked. I would therefore like to see an ORDER BY * or ORDER BY ANY option, to ensure that the results come in some implementation-specific order that can be used to show all possible results.
>>
>> Trying ORDER BY * on a few current public SPARQL implementations showed that it is not currently implemented, although pagination with OFFSET and LIMIT without an ORDER BY clause seems to work as a naive user (e.g. me) would expect. This means that for current SPARQL implementers it would be no work at all, other than dealing with a slightly different SPARQL grammar.
>>
>> Pagination guaranteed to succeed would then be:
>>
>>       QUERY ORDER BY ANY OFFSET 0 LIMIT 5 page 1
>>       QUERY ORDER BY ANY OFFSET 5 LIMIT 5 page 2
>>       QUERY ORDER BY ANY OFFSET 10 LIMIT 5 page 3
>>
>> The other option is to expand the description of the OFFSET clause. For example, the use of the OFFSET clause could guarantee that query results come back in a consistent order.
>>
>> I hope this concern makes sense.
>>
>> Regards,
>> Jerven
>>
>>
>
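
The pagination concern raised in the quoted messages can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory store holding the two example triples, and models "no ORDER BY" as the store being free to pick a different row order for each request. The names STORE, run_query, paginate and flipping_order are illustrative only and do not correspond to any SPARQL implementation.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Hypothetical store with the two example triples from the thread.
STORE: List[Triple] = [("_:1", "lala", "hi"), ("_:1", "lala", "by")]


def run_query(order: Sequence[Triple], offset: int, limit: int) -> List[Triple]:
    """Apply OFFSET, then LIMIT, to one particular evaluation order."""
    return list(order[offset:offset + limit])


def paginate(order_for_request: Callable[[int], Sequence[Triple]],
             page_size: int, total: int) -> List[Triple]:
    """Fetch all pages; order_for_request(i) is the row order the
    store happens to use when answering the i-th request."""
    rows: List[Triple] = []
    for request_no, offset in enumerate(range(0, total, page_size)):
        rows.extend(run_query(order_for_request(request_no), offset, page_size))
    return rows


def flipping_order(request_no: int) -> Sequence[Triple]:
    """A store that alternates between ordering a) and ordering b)."""
    return STORE if request_no % 2 == 0 else STORE[::-1]


# Deterministic store: the same order is used for every request.
stable = paginate(lambda _request_no: STORE, page_size=1, total=len(STORE))

# Non-deterministic store: the order changes between requests.
unstable = paginate(flipping_order, page_size=1, total=len(STORE))
```

With a stable order the pages concatenate to the full result. With an order that flips between requests, both pages return the "hi" row and the "by" row is never seen, which is the failure the messages describe.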
Received on Friday, 7 October 2011 08:34:33 GMT
