Re: N most-recent news items, a use case motivating a SORT requirement from Seaborne, Andy on 2005-03-11 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 11 Mar 2005 13:46:54 +0000
To: Dan Connolly <connolly@w3.org>
Cc: DAWG Mailing List <public-rdf-dawg@w3.org>
Message-ID: <4231A14E.1020205@hp.com>

Dan Connolly wrote:
 >
 > Recent comments traffic suggests a SORT feature/requirement...
 > I asked for a use case, and got...
 >
 > [[[
 > Use case: obtain a given number of most-recent items from a
 > triplestore-based RSS aggregator. (Something along the lines of
 > http://pubsub.com which aggregates data from several million feeds -
 > they use ASN.1 internally btw, though expose XML interfaces).
 > ]]]
 >   --
 > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/
 > 0002.html
 >
 > Anybody like the idea of addressing that use case in SPARQL?
 > Any design ideas or implementation experience with sorting?
 >
 > Hmm... hard to argue that it's a bad idea; I think the discussion we've
 > had is more of the form "it can wait"; i.e. perhaps it belongs on the
 > issues list as "postponed".
 >
 > Thoughts?
 >

I think we need to do due diligence on this one.  This is not the first
time it has been requested and we need to give a full response.

Putting aside whether we do address the matter in this working group or 
postpone, I thought I'd take a pass over what it might involve.

1/ Syntax

Only for SELECT queries.

SQL ORDER BY would be a starting inspiration:

   ORDER BY ?x ?y
or
   ORDER BY ?x ASC ?y DESC

SQL has comma'ed lists so maybe something a liitle clerer would be:

   ORDER BY ASC(?x) DESC(?y)
   ORDER BY ?x(ASC) ?y(DESC)

2/ Defining order

RDF is semistructured and unconstrained as the types of values that can
appear - unlike SQL, we can't guarantee a column in the result table is
always the same type so we need to define what happens.

The simplest solution is to make sorting unlike things (that is, no "<" 
operation between them) are an error, causing query execution abort with no 
results .  That is a consistent but rather savage given that not all RDF out 
there is well written.  Because not every pair needs to be compared in a sort, 
it might give rise to strange corner cases.

A variation (more a reframing than a different design) is to only define what 
happens in the case of comparable items and leave the rest to implementations.

We could impose an arbitrary ordering amongst unlike things: something like (off 
the top of my head):

+  undef (NULL) < bNodes < URIs < literals

+  Literals of unknown type compare on lexical form
    unless the processor has additional knowledge.

+  numbers and plain strings compare on lexical form.

Are there other design dimensions?

	Andy

Received on Friday, 11 March 2005 13:47:08 UTC