Re: N most-recent news items, a use case motivating a SORT requir ement from Seaborne, Andy on 2005-03-16 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 16 Mar 2005 21:50:41 +0000
To: "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>
Cc: 'Dan Connolly ' <connolly@w3.org>, 'DAWG Mailing List ' <public-rdf-dawg@w3.org>
Message-ID: <4238AA31.1070205@hp.com>
Thompson, Bryan B. wrote:
> Andy,
> 
> Some APIs, e.g., Sesame for one, already require the ability to impose
> a total ordering.  While this is outside of the model theory, Sort is
> not really a model theory issue - it is a data use issue.  If a total
> ordering is desired, I would suggest:
> 
>   URIs < literals < bnodes < undef
> 
> so that the more human consumable stuff comes first.
> 
> -bryan

I spent some time looking at what SQL does in sorting result sets with nulls 
in columns to be ordered.

SQL:2003 does not specify the ordering but says that NULL equals NULL for 
sorting and either they should be higher or lower (not mixed in).

The different databases do different things.  Some sort NULLs higher than 
any value, some lower; some seem to depend on whether it's ASC or DESC; one 
has a settable flag to control whether NULLs should be higher or lower.

The nearest to a decision criteria I can think of is that bNodes and nulls 
are closer to "" (hence sort low) in users expectations.  Hardly the 
strongest of reasons.  Aligning with the str() function seems OK but isn't 
necessary.

The URI < plain strings (or the other way round) will catch new-to-RFf-query 
people out but really it is better than pretending they are remotely the 
same kind of thing.

	Andy

> 
> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> To: Dan Connolly
> Cc: DAWG Mailing List
> Sent: 3/11/2005 8:47 AM
> Subject: Re: N most-recent news items, a use case motivating a SORT
> requirement
> 
> 
> Dan Connolly wrote:
>  >
>  > Recent comments traffic suggests a SORT feature/requirement...
>  > I asked for a use case, and got...
>  >
>  > [[[
>  > Use case: obtain a given number of most-recent items from a
>  > triplestore-based RSS aggregator. (Something along the lines of
>  > http://pubsub.com which aggregates data from several million feeds -
>  > they use ASN.1 internally btw, though expose XML interfaces).
>  > ]]]
>  >   --
>  > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/
>  > 0002.html
>  >
>  > Anybody like the idea of addressing that use case in SPARQL?
>  > Any design ideas or implementation experience with sorting?
>  >
>  > Hmm... hard to argue that it's a bad idea; I think the discussion
> we've
>  > had is more of the form "it can wait"; i.e. perhaps it belongs on the
>  > issues list as "postponed".
>  >
>  > Thoughts?
>  >
> 
> I think we need to do due diligence on this one.  This is not the first
> time it has been requested and we need to give a full response.
> 
> 
> Putting aside whether we do address the matter in this working group or 
> postpone, I thought I'd take a pass over what it might involve.
> 
> 1/ Syntax
> 
> Only for SELECT queries.
> 
> 
> SQL ORDER BY would be a starting inspiration:
> 
>    ORDER BY ?x ?y
> or
>    ORDER BY ?x ASC ?y DESC
> 
> SQL has comma'ed lists so maybe something a liitle clerer would be:
> 
>    ORDER BY ASC(?x) DESC(?y)
>    ORDER BY ?x(ASC) ?y(DESC)
> 
> 
> 2/ Defining order
> 
> RDF is semistructured and unconstrained as the types of values that can
> appear - unlike SQL, we can't guarantee a column in the result table is
> always the same type so we need to define what happens.
> 
> The simplest solution is to make sorting unlike things (that is, no "<" 
> operation between them) are an error, causing query execution abort with
> no 
> results .  That is a consistent but rather savage given that not all RDF
> out 
> there is well written.  Because not every pair needs to be compared in a
> sort, 
> it might give rise to strange corner cases.
> 
> A variation (more a reframing than a different design) is to only define
> what 
> happens in the case of comparable items and leave the rest to
> implementations.
> 
> 
> We could impose an arbitrary ordering amongst unlike things: something
> like (off 
> the top of my head):
> 
> +  undef (NULL) < bNodes < URIs < literals
> 
> +  Literals of unknown type compare on lexical form
>     unless the processor has additional knowledge.
> 
> +  numbers and plain strings compare on lexical form.
> 
> Are there other design dimensions?
> 
> 	Andy
>
Received on Wednesday, 16 March 2005 21:51:18 UTC