W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > January to March 2005

Re: N most-recent news items, a use case motivating a SORT requirement

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 11 Mar 2005 13:46:54 +0000
Message-ID: <4231A14E.1020205@hp.com>
To: Dan Connolly <connolly@w3.org>
Cc: DAWG Mailing List <public-rdf-dawg@w3.org>

Dan Connolly wrote:
 >
 > Recent comments traffic suggests a SORT feature/requirement...
 > I asked for a use case, and got...
 >
 > [[[
 > Use case: obtain a given number of most-recent items from a
 > triplestore-based RSS aggregator. (Something along the lines of
 > http://pubsub.com which aggregates data from several million feeds -
 > they use ASN.1 internally btw, though expose XML interfaces).
 > ]]]
 >   --
 > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/
 > 0002.html
 >
 > Anybody like the idea of addressing that use case in SPARQL?
 > Any design ideas or implementation experience with sorting?
 >
 > Hmm... hard to argue that it's a bad idea; I think the discussion we've
 > had is more of the form "it can wait"; i.e. perhaps it belongs on the
 > issues list as "postponed".
 >
 > Thoughts?
 >

I think we need to do due diligence on this one.  This is not the first
time it has been requested and we need to give a full response.


Putting aside whether we do address the matter in this working group or 
postpone, I thought I'd take a pass over what it might involve.

1/ Syntax

Only for SELECT queries.


SQL ORDER BY would be a starting inspiration:

   ORDER BY ?x ?y
or
   ORDER BY ?x ASC ?y DESC

SQL has comma'ed lists so maybe something a liitle clerer would be:

   ORDER BY ASC(?x) DESC(?y)
   ORDER BY ?x(ASC) ?y(DESC)


2/ Defining order

RDF is semistructured and unconstrained as the types of values that can
appear - unlike SQL, we can't guarantee a column in the result table is
always the same type so we need to define what happens.

The simplest solution is to make sorting unlike things (that is, no "<" 
operation between them) are an error, causing query execution abort with no 
results .  That is a consistent but rather savage given that not all RDF out 
there is well written.  Because not every pair needs to be compared in a sort, 
it might give rise to strange corner cases.

A variation (more a reframing than a different design) is to only define what 
happens in the case of comparable items and leave the rest to implementations.


We could impose an arbitrary ordering amongst unlike things: something like (off 
the top of my head):

+  undef (NULL) < bNodes < URIs < literals

+  Literals of unknown type compare on lexical form
    unless the processor has additional knowledge.

+  numbers and plain strings compare on lexical form.

Are there other design dimensions?

	Andy
Received on Friday, 11 March 2005 13:47:08 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:22 GMT