- From: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
- Date: Fri, 13 May 2011 10:01:22 +0200
- To: public-rdf-dawg-comments@w3.org
- Message-ID: <4DCCE552.7000305@isb-sib.ch>
Dear workgroup,
I realized that I might not have been so clear in describing the problem.
Assume that you maintain a publicly available SPARQL endpoint.
You want to support both an HTML view and the official SPARQL formats.
Let's say a user executes the query
SELECT * WHERE {?s ?p ?o}
This will download every triple in your store. In my store this would
mean trying to download 160 GB of triples via a single HTTP connection.
This is not likely to work, and if it did, most browsers would crash on
the HTML view.
Therefore I would like to always put a LIMIT on the query to make sure
that the result will match the capabilities of a common HTTP connection.
e.g. default LIMIT 1000
But I do want people to download more than just the first 1000 results
to their query. I just want them to do it in multiple requests that are
likely to complete and not crash their browsers.
So I need pagination, i.e. OFFSET. In practice this does exactly what I
need (I briefly tested OWLIM and Virtuoso), e.g.
page 1 SELECT * WHERE {?s ?p ?o} OFFSET 0 LIMIT 1000
page 2 SELECT * WHERE {?s ?p ?o} OFFSET 1000 LIMIT 1000
and so on until there are no more results. However, the current public
draft does not specify that this works.
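The page-by-page scheme above could be sketched in Python roughly as follows. This is only an illustration: `paginate` and `make_fake_endpoint` are hypothetical names, and the fake endpoint is an in-memory stand-in that assumes a stable result order, which is exactly the guarantee the draft does not give.

```python
import re
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str, str]


def paginate(run_query: Callable[[str], List[Row]],
             base_query: str,
             page_size: int = 1000) -> Iterator[Row]:
    """Yield all rows of base_query by requesting one page at a time."""
    offset = 0
    while True:
        page = run_query(f"{base_query} OFFSET {offset} LIMIT {page_size}")
        if not page:
            return  # an empty page means there are no more results
        yield from page
        offset += page_size


def make_fake_endpoint(rows: List[Row]) -> Callable[[str], List[Row]]:
    """Hypothetical stand-in for an endpoint; assumes a stable result order."""
    def run_query(query: str) -> List[Row]:
        match = re.search(r"OFFSET (\d+) LIMIT (\d+)$", query)
        offset, limit = int(match.group(1)), int(match.group(2))
        return rows[offset:offset + limit]
    return run_query
```

With a stable order the pages add up to the full result; the rest of this message is about what happens when that assumption does not hold.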
Assume a store holds the following two triples:
<_:1> <lala> "hi"
<_:1> <lala> "by"
The following query
SELECT * WHERE {?s ?p ?o}
Can evaluate to either a)
<_:1> <lala> "hi"
<_:1> <lala> "by"
or b)
<_:1> <lala> "by"
<_:1> <lala> "hi"
i.e. the ordering is unspecified, but all results are returned.
Now assume the implementation always returns ordering a) for the query above.
The following query
SELECT * WHERE {?s ?p ?o} OFFSET 0 LIMIT 1
can return
<_:1> <lala> "hi"
And against the same store it is valid to return this same row for
SELECT * WHERE {?s ?p ?o} OFFSET 1 LIMIT 1
as well.
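To make the failure concrete, here is a small Python simulation. It is a sketch under an assumption: the hypothetical store `make_unordered_store` alternates between orderings a) and b) on successive requests, which the draft permits because the order is unspecified without ORDER BY.

```python
from itertools import cycle

TRIPLES = [("_:1", "lala", "hi"), ("_:1", "lala", "by")]


def make_unordered_store(triples):
    """Hypothetical store that picks ordering a) or b) anew on each request."""
    orderings = cycle([list(triples), list(reversed(triples))])

    def select_all(offset, limit):
        rows = next(orderings)  # the order may differ between requests
        return rows[offset:offset + limit]

    return select_all


select_all = make_unordered_store(TRIPLES)
page1 = select_all(0, 1)  # served from ordering a)
page2 = select_all(1, 1)  # served from ordering b)
```

Both pages contain the "hi" triple, and the "by" triple is never returned, even though each individual response is valid.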
So although each chunk is small, I am not guaranteed to get every
result across the pages. I would need to add an ORDER BY clause.
However, I cannot do so generically without rewriting the query, because
ORDER BY * is not allowed. Nor is this always desirable, because ORDER BY
means the results actually have to be sorted, which can be very
expensive relative to executing the query itself.
Therefore, I would define OFFSET more specifically.
When an implementation returns a result set for a query, it should do
so in a deterministic manner, i.e. executing the same query twice on a
store with constant data returns the results in the same order.
The OFFSET parameter is then interpreted as: discard the first X results
that the same query without OFFSET would have generated.
This means that for a query A with N results, the concatenation of the
results of the queries A OFFSET i LIMIT 1, for i = 0..N-1, is equal to
the result of the query A.
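The proposed semantics can be stated as a small executable check. This is a sketch only: `offset_limit` and `concatenation_matches` are hypothetical helpers, and a Python list stands in for the deterministic result sequence of query A.

```python
def offset_limit(results, offset, limit=None):
    """Discard the first `offset` results, then keep at most `limit` of the rest."""
    rest = results[offset:]
    return rest if limit is None else rest[:limit]


def concatenation_matches(results):
    """Check that the N single-row pages concatenate to the full result."""
    pieces = []
    for i in range(len(results)):
        pieces.extend(offset_limit(results, i, 1))
    return pieces == results
```

The check holds only because every slice is taken from the same fixed sequence, which is the determinism requirement proposed above.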
Regards,
Jerven Bolleman
P.S. The original source of this discussion is:
http://answers.semanticweb.com/questions/9456/jena-pagination-for-sparql
On 05/12/2011 04:32 PM, Jerven Bolleman wrote:
> Dear workgroup,
>
> I was recently made aware that there is no easy way to get a guaranteed working pagination.
>
> i.e. QUERY OFFSET 0 LIMIT 5 page 1
> QUERY OFFSET 5 LIMIT 5 page 2
> QUERY OFFSET 10 LIMIT 5 page 3
>
> Without adding an ORDER BY clause. Adding any kind of ORDER BY clause would be enough to ensure pagination worked. I would therefore like to see an ORDER BY * or ORDER BY ANY option. To ensure that the results come in some implementation specific order and that this can be used to show all possible results.
>
> Trying a few public current SPARQL implementations. With ORDER BY * showed that this is currently not implemented. Although pagination with OFFSET and LIMIT without an ORDER BY clause seems to work as a naive user (e.g. me) would expect. Meaning that for current SPARQL implementers it is no work at all other than dealing with a slightly different SPARQL grammar.
>
> Pagination guaranteed to succeed would then be
>
> i.e. QUERY OFFSET 0 LIMIT 5 ORDER BY ANY page 1
> QUERY OFFSET 5 LIMIT 5 ORDER BY ANY page 2
> QUERY OFFSET 10 LIMIT 5 ORDER BY ANY page 3
>
> The other option is to expand the description of the OFFSET clause. For example the use of the OFFSET clause should guarantee that query results come back in a consistent order.
>
> I hope this concern makes sense.
>
> Regards,
> Jerven
>
>
Received on Friday, 13 May 2011 08:02:01 UTC