- From: Richard Newman <rnewman@franz.com>
- Date: Mon, 25 May 2009 11:19:50 -0700
- To: Mirko <idonthaveenoughinformation@googlemail.com>
- Cc: public-sparql-dev@w3.org
> The results are too large to keep in memory, so I would like to page
> them using LIMIT and OFFSET. However, it does not work with the above
> query. The query above needs all results to be loaded into memory
> when evaluating it. I assume this is because more than one statement
> is evaluated in the WHERE clause(?).

That's not why: it's because you're imposing an order with ORDER BY.

There are (broadly speaking) two ways this query could be executed.

If a store has an index on my:hasUserID (and that index happens to be
in SPARQL's defined order!) then results can be generated in ordered
sequence. Successive pages can be generated by re-running the query,
skipping more and more results, or by somehow holding on to a cursor.
It's not enough to just skip userIDs: *rows* must be skipped, so the
query does have to be executed in order to skip to the right point.

If a store does not have such an index, or your ORDER BY clause is
more complicated, then all the results must be gathered in memory to
be sorted. There's really no way around that.

For a store that doesn't maintain state between queries, generating
successive pages in this manner will essentially involve running the
whole query each time and returning a different chunk of the results.
If you have to sort 100,000 result rows in order to determine the
first 1,000, then again for the second 1,000, you're going to see
pretty poor performance. Worse, each execution will reflect any
changes in the store since the last page was generated, which can
produce confusing results.

> So, how could I page the above query?

Do it in your application (there's a rough sketch in the P.S. below).
That way you also avoid the data changing between pages.

I don't think that LIMIT and OFFSET are useful for supporting paging,
because the spec does not mandate sufficient efficiency constraints
on implementations (such as cursors, as provided by Freebase MQL
queries). It's odd to say "you could do it using the method the spec
recommends, but you'd be crazy to do so with real datasets". I
consider LIMIT's only real use to be constraining the size of the
result set, not defining a page size.

IMO it would be much more useful to separate SPARQL execution into
two phases: a query that returns a result set, and then operations on
that result set, such as serializing slices of it (see the P.P.S. for
what I mean). Conflating the two places the burden of doing paging
efficiently onto implementations, and there's no one good solution
for all clients.

-R
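
P.S. To make the trade-off concrete, here's a rough Python sketch of
both approaches, using the SPARQLWrapper library. The endpoint URL,
the my: prefix declaration, and the query itself are invented for
illustration; adjust to your setup.

    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://example.com/sparql"  # hypothetical endpoint
    PREFIXES = "PREFIX my: <http://example.com/ns#>\n"  # made up

    def fetch_page(page_size, page_number):
        # LIMIT/OFFSET paging: every call re-executes the whole query,
        # and the store must sort (or walk its index) past OFFSET rows
        # before it can return anything.
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(PREFIXES + """
            SELECT ?person ?id
            WHERE { ?person my:hasUserID ?id }
            ORDER BY ?id
            LIMIT %d OFFSET %d
        """ % (page_size, page_size * page_number))
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["results"]["bindings"]

    def pages_in_application(page_size):
        # Application-side paging: run the query once, slice locally.
        # One sort on the server, and the data can't shift under you
        # between pages.
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(PREFIXES + """
            SELECT ?person ?id
            WHERE { ?person my:hasUserID ?id }
            ORDER BY ?id
        """)
        sparql.setReturnFormat(JSON)
        rows = sparql.query().convert()["results"]["bindings"]
        for i in range(0, len(rows), page_size):
            yield rows[i:i + page_size]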
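
P.P.S. And here's what I mean by "two phases", as a purely
hypothetical Python sketch (no store I know of offers an API like
this, and every name below is made up):

    class ResultSetHandle:
        # Toy stand-in for server-side state: imagine these rows
        # living in the store, sorted once when the query executed.
        def __init__(self, rows):
            self._rows = rows

        def slice(self, offset, limit):
            # Cheap: the sort has already happened, and every slice
            # is consistent with every other.
            return self._rows[offset:offset + limit]

        def close(self):
            self._rows = None  # release the server-side state

    # Phase 1: run (and sort) the query exactly once.
    handle = ResultSetHandle(sorted(range(100000)))  # toy data

    # Phase 2: operations on the materialized result set.
    first_page = handle.slice(0, 1000)
    second_page = handle.slice(1000, 1000)
    handle.close()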
Received on Monday, 25 May 2009 18:20:35 UTC