- From: Leigh Dodds <leigh@ldodds.com>
- Date: Wed, 09 Mar 2005 09:46:57 +0000
- To: Dan Connolly <connolly@w3.org>
- CC: public-rdf-dawg-comments@w3.org, danny.ayers@gmail.com
Dan Connolly wrote:
> You're welcome to elaborate on why you think it's important/required.
> Use cases are particularly welcome, especially use cases that argue for
> handling sorting in SPARQL rather than in a downstream component or
> client or XSLT engine or the like.

OK, most of my examples will come from the bibliographic domain, as
that's the application area I currently work in. We're at present
prototyping a replacement for our content storage systems using an RDF
triple store and are hoping to use SPARQL to query that store.

Sorting of results (e.g. articles in a TOC, issues in a journal, or
items in a user's reading list) has been implemented at both levels:
in the query layer, e.g. when using a SQL database; and in the
application layer, e.g. when the sorting criteria are more complex
(serial issue release dates, special ordering for supplements, indexes,
etc).

We recently pushed code back from the application layer into the query,
implementing custom comparators where necessary. The results were an
improvement in application performance and a simpler application:
procedural code to invoke a sort became a declarative aspect of the
query.

Use cases for sorting in our application include:

- retrieve all articles associated with an issue and sort them by page
  number.
- retrieve all issues associated with a journal and sort them by
  publication date.
- retrieve all articles bookmarked by a user and sort them by journal
  name or date bookmarked.
- retrieve all journals within a subject area and sort them by name.
- retrieve all articles written by an author and sort them by
  publication date.

All of these can be implemented at a higher layer, at the cost of
implementing custom comparators: one for each data type.

The nice aspect of having a query contain all the application criteria
(specifying "WHERE" clauses, ordering, limits) is that the query engine
has much more information available to it to allow optimisation. E.g. a
triple store backed by a relational engine may be able to optimise its
queries to use native sorting capabilities. Even manual optimisation
becomes easier when the query is self-contained.

I note the interaction between LIMIT and ORDER BY, but would argue that
LIMIT is unnecessary: I can merely fetch the first n results that I'm
interested in. Looking through the queries I'd typically write against
a relational store, I find that I'm heavily reliant on ordering and
rarely use anything like LIMIT: it's much more likely that I only want
the first 10, then the next 10, etc; unless I'm missing something,
paging isn't possible with LIMIT as specified. Where I do have a use
for it, e.g. as in Danny's use case, it's much simpler to implement at
an application level than sorting.

There are systems that support both LIMIT and ORDER BY: search engines.
E.g. order by relevance, but just return the first 20. IIRC Google
applies its PageRank (a sort) to the first 1000 results or so (a
limit). I know of other implementors that have taken similar
approaches. It's not "ideal", but I note it as one possible
implementation approach.

(Aside: I'd also argue that ASK is unnecessary too, as I can merely
test for a non-empty result set from a SELECT query; but that's a
different thread.)

Hope that's useful.

Cheers,

L.
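
To make the application-layer cost mentioned above a little more concrete, here is a minimal sketch in Python of what sorting articles by page number and then paging through them might look like outside the query. Everything in it is an assumption for illustration: the sample rows, the field names, the helpers page_key, sort_by_page and page_of, and the rule that plain page numbers sort before prefixed supplement pages such as "S2".

```python
import re

# Hypothetical, unsorted result rows as an application might receive them.
ARTICLES = [
    {"title": "Supplement editorial", "page": "S2"},
    {"title": "Original research", "page": "115"},
    {"title": "Editorial", "page": "1"},
    {"title": "Review", "page": "23"},
]


def page_key(page):
    """Sort key for a page label (assumed rule, for illustration only).

    Plain numbers come first in numeric order, then prefixed labels
    such as 'S2', then anything unrecognised.
    """
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", page)
    if match is None:
        return (2, page, 0)
    prefix, number = match.groups()
    return (1 if prefix else 0, prefix.upper(), int(number))


def sort_by_page(rows):
    """Return a new list of rows ordered by their page label."""
    return sorted(rows, key=lambda row: page_key(row["page"]))


def page_of(rows, size, number):
    """Return the 0-based page `number` of `size` rows from a sorted list."""
    start = number * size
    return rows[start:start + size]


if __name__ == "__main__":
    ordered = sort_by_page(ARTICLES)
    for row in page_of(ordered, size=2, number=0):
        print(row["page"], row["title"])
```

Note how much heavier the comparator is than the paging: the slice in page_of is one line, whereas page_key has to encode rules for a single data type, and each further type (dates, journal names) would need its own.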
Received on Wednesday, 9 March 2005 09:47:01 UTC