- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 11 Mar 2005 13:46:54 +0000
- To: Dan Connolly <connolly@w3.org>
- Cc: DAWG Mailing List <public-rdf-dawg@w3.org>
Dan Connolly wrote:
>
> Recent comments traffic suggests a SORT feature/requirement...
> I asked for a use case, and got...
>
> [[[
> Use case: obtain a given number of most-recent items from a
> triplestore-based RSS aggregator. (Something along the lines of
> http://pubsub.com which aggregates data from several million feeds -
> they use ASN.1 internally btw, though expose XML interfaces).
> ]]]
> --
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/
> 0002.html
>
> Anybody like the idea of addressing that use case in SPARQL?
> Any design ideas or implementation experience with sorting?
>
> Hmm... hard to argue that it's a bad idea; I think the discussion we've
> had is more of the form "it can wait"; i.e. perhaps it belongs on the
> issues list as "postponed".
>
> Thoughts?
>
I think we need to do due diligence on this one. This is not the first
time it has been requested and we need to give a full response.
Putting aside whether we do address the matter in this working group or
postpone, I thought I'd take a pass over what it might involve.
1/ Syntax
Only for SELECT queries.
SQL ORDER BY would be a starting inspiration:
ORDER BY ?x ?y
or
ORDER BY ?x ASC ?y DESC
SQL has comma'ed lists so maybe something a liitle clerer would be:
ORDER BY ASC(?x) DESC(?y)
ORDER BY ?x(ASC) ?y(DESC)
2/ Defining order
RDF is semistructured and unconstrained as the types of values that can
appear - unlike SQL, we can't guarantee a column in the result table is
always the same type so we need to define what happens.
The simplest solution is to make sorting unlike things (that is, no "<"
operation between them) are an error, causing query execution abort with no
results . That is a consistent but rather savage given that not all RDF out
there is well written. Because not every pair needs to be compared in a sort,
it might give rise to strange corner cases.
A variation (more a reframing than a different design) is to only define what
happens in the case of comparable items and leave the rest to implementations.
We could impose an arbitrary ordering amongst unlike things: something like (off
the top of my head):
+ undef (NULL) < bNodes < URIs < literals
+ Literals of unknown type compare on lexical form
unless the processor has additional knowledge.
+ numbers and plain strings compare on lexical form.
Are there other design dimensions?
Andy
Received on Friday, 11 March 2005 13:47:08 UTC