RE: N most-recent news items, a use case motivating a SORT requir ement from Howard Katz on 2005-03-17 (public-rdf-dawg@w3.org from January to March 2005)

From: Howard Katz <howardk@fatdog.com>
Date: Wed, 16 Mar 2005 16:41:04 -0800
To: "Seaborne, Andy" <andy.seaborne@hp.com>, "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>
Cc: "'Dan Connolly '" <connolly@w3.org>, "'DAWG Mailing List '" <public-rdf-dawg@w3.org>
Message-ID: <JMEJKDCPGHHIANHOMPDBCEKCCLAA.howardk@fatdog.com>
I don't know if XQuery provides any useful lessons here or not. One
interesting difference is that sorting in XQuery is indirect: the items in
the sequence aren't sorted themselves; rather stand-in "orderspec" proxies
are used instead. This is kind of cool and provides a lot of flexibility. In
any event we don't have to worry about it since Sparql doesn't do this. No
point in being jealous. :-)

The first step in sorting the orderspecs is that they're atomized. If an
item is an atomic value, it's returned itself. Non-atomic items such as
nodes are converted to their typed equivalents. If there's no schema present
to provide typing information, they become items of type xs:string. Once
everything's been atomized, the items are converted to the lowest common
denominator type that has a gt operator using subtype substitution and type
promotion. (I believe we do the fomer?)

integers and decimals for example are compared as decimals (subtype
substitution), while strings and URIs are compared as strings (URL type
promotion).

If things turn out not to be comparable under the above rules, a type error
is raised.

fwiw,
Howard

 > -----Original Message-----
 > From: public-rdf-dawg-request@w3.org
 > [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Seaborne, Andy
 > Sent: Wednesday, March 16, 2005 1:51 PM
 > To: Thompson, Bryan B.
 > Cc: 'Dan Connolly '; 'DAWG Mailing List '
 > Subject: Re: N most-recent news items, a use case motivating a SORT
 > requir ement
 >
 >
 >
 > Thompson, Bryan B. wrote:
 > > Andy,
 > >
 > > Some APIs, e.g., Sesame for one, already require the ability to impose
 > > a total ordering.  While this is outside of the model theory, Sort is
 > > not really a model theory issue - it is a data use issue.  If a total
 > > ordering is desired, I would suggest:
 > >
 > >   URIs < literals < bnodes < undef
 > >
 > > so that the more human consumable stuff comes first.
 > >
 > > -bryan
 >
 > I spent some time looking at what SQL does in sorting result
 > sets with nulls
 > in columns to be ordered.
 >
 > SQL:2003 does not specify the ordering but says that NULL equals
 > NULL for
 > sorting and either they should be higher or lower (not mixed in).
 >
 > The different databases do different things.  Some sort NULLs
 > higher than
 > any value, some lower; some seem to depend on whether it's ASC
 > or DESC; one
 > has a settable flag to control whether NULLs should be higher or lower.
 >
 > The nearest to a decision criteria I can think of is that bNodes
 > and nulls
 > are closer to "" (hence sort low) in users expectations.  Hardly the
 > strongest of reasons.  Aligning with the str() function seems OK
 > but isn't
 > necessary.
 >
 > The URI < plain strings (or the other way round) will catch
 > new-to-RFf-query
 > people out but really it is better than pretending they are remotely the
 > same kind of thing.
 >
 > 	Andy
 >
 > >
 > > -----Original Message-----
 > > From: public-rdf-dawg-request@w3.org
 > > To: Dan Connolly
 > > Cc: DAWG Mailing List
 > > Sent: 3/11/2005 8:47 AM
 > > Subject: Re: N most-recent news items, a use case motivating a SORT
 > > requirement
 > >
 > >
 > > Dan Connolly wrote:
 > >  >
 > >  > Recent comments traffic suggests a SORT feature/requirement...
 > >  > I asked for a use case, and got...
 > >  >
 > >  > [[[
 > >  > Use case: obtain a given number of most-recent items from a
 > >  > triplestore-based RSS aggregator. (Something along the lines of
 > >  > http://pubsub.com which aggregates data from several million feeds -
 > >  > they use ASN.1 internally btw, though expose XML interfaces).
 > >  > ]]]
 > >  >   --
 > >  >
 > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/
 > >  > 0002.html
 > >  >
 > >  > Anybody like the idea of addressing that use case in SPARQL?
 > >  > Any design ideas or implementation experience with sorting?
 > >  >
 > >  > Hmm... hard to argue that it's a bad idea; I think the discussion
 > > we've
 > >  > had is more of the form "it can wait"; i.e. perhaps it
 > belongs on the
 > >  > issues list as "postponed".
 > >  >
 > >  > Thoughts?
 > >  >
 > >
 > > I think we need to do due diligence on this one.  This is not the first
 > > time it has been requested and we need to give a full response.
 > >
 > >
 > > Putting aside whether we do address the matter in this working
 > group or
 > > postpone, I thought I'd take a pass over what it might involve.
 > >
 > > 1/ Syntax
 > >
 > > Only for SELECT queries.
 > >
 > >
 > > SQL ORDER BY would be a starting inspiration:
 > >
 > >    ORDER BY ?x ?y
 > > or
 > >    ORDER BY ?x ASC ?y DESC
 > >
 > > SQL has comma'ed lists so maybe something a liitle clerer would be:
 > >
 > >    ORDER BY ASC(?x) DESC(?y)
 > >    ORDER BY ?x(ASC) ?y(DESC)
 > >
 > >
 > > 2/ Defining order
 > >
 > > RDF is semistructured and unconstrained as the types of values that can
 > > appear - unlike SQL, we can't guarantee a column in the result table is
 > > always the same type so we need to define what happens.
 > >
 > > The simplest solution is to make sorting unlike things (that
 > is, no "<"
 > > operation between them) are an error, causing query execution
 > abort with
 > > no
 > > results .  That is a consistent but rather savage given that
 > not all RDF
 > > out
 > > there is well written.  Because not every pair needs to be
 > compared in a
 > > sort,
 > > it might give rise to strange corner cases.
 > >
 > > A variation (more a reframing than a different design) is to
 > only define
 > > what
 > > happens in the case of comparable items and leave the rest to
 > > implementations.
 > >
 > >
 > > We could impose an arbitrary ordering amongst unlike things: something
 > > like (off
 > > the top of my head):
 > >
 > > +  undef (NULL) < bNodes < URIs < literals
 > >
 > > +  Literals of unknown type compare on lexical form
 > >     unless the processor has additional knowledge.
 > >
 > > +  numbers and plain strings compare on lexical form.
 > >
 > > Are there other design dimensions?
 > >
 > > 	Andy
 > >
 >
 >
Received on Thursday, 17 March 2005 00:41:21 UTC