RE: New "order by" clause

At 02:03 PM 11/19/2002 -0700, Jim Davies wrote:

>I wrote:
>
>I don't see it stated explicitly, but I suspect that the motivation for
>going to "order by" is to allow easy mapping of XQuery expressions onto an
>underlying relational database.  Would you say that this is the case?
>
>Jonathan responds:
>
>Not only onto relational databases, but also indexes in native XML
>databases. FWIW, I have worked for three different companies
>that do native XML databases, and I care about implementability on both
>native XML and relational stores.
>
>My response:
>
>I, too, am currently working on a native XML database.  In our
>implementation, the sorting facility doesn't rely on indexing, so I guess
>I don't see why there should be a connection.  Won't any XQuery
>implementation need to have a general-purpose sorting facility (for either
>"sort by" or "order by")?  And if that is the case, aren't we just arguing
>about syntax?

Hi Jim,

I don't think so, but before I get into this, let me be clear that several 
things are going on in this exchange. I'm trying to gain a complete 
understanding of your feedback - and the feedback of Bas - so I know what 
you are saying. The Query WG will have to decide how to process this input, 
and I'm not sure what it will decide.

At the same time, I am trying to argue for the current design to see how 
you respond. Opinions expressed here are mine, and don't reflect the view 
of the WG. If I disagree with you, I will still want to make sure your 
opinion is heard.

>Consider this degenerate case:
>
>         (<word>this</word>, <word>that</word>) sort by(.)
>
>If you express this as
>
>         for $x in (<word>this</word>, <word>that</word>)
>         order by ($x)
>         return $x
>
>then an implementation won't be able to use indexing to perform the sort
>anyway.  And if you don't support this particular usage, are you supporting
>the entire XQuery language?

Certainly this would be supported, and it could not rely on an index. The 
goal is not to require indexes, but to be able to exploit them when they 
are present. Basically, it boils down to this: when people did all their 
sorting on the expressions in for clauses, and not on the output of a FLWR 
expression, the result was optimizable. When people sorted on data found 
within elements constructed in the return clause, the result was generally 
not optimizable.
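
To make the distinction concrete, here is a rough sketch. The file 
books.xml and the element names book, title, publisher, entry and label 
are made up for illustration, and I assume each book has at most one 
title and one publisher. In the first query the ordering key is a path 
from the variable bound in the for clause, so an implementation that 
happens to have an index on title could, in principle, use it:

```xquery
(: Hypothetical data: books.xml containing book elements with title children. :)
(: The key $b/title is stored data reachable from the for variable. :)
for $b in document("books.xml")//book
order by $b/title
return <entry>{ $b/title }</entry>
```

In the second, written with the older sort by syntax, the key exists only 
inside the elements built by the return clause, so the constructed 
sequence generally has to be sorted after the fact:

```xquery
(: Same hypothetical data. The key (label) is computed in the return clause, :)
(: so it is not a stored value that an index could cover. :)
(for $b in document("books.xml")//book
 return <entry><label>{ concat($b/publisher, ": ", $b/title) }</label></entry>)
sort by (label)
```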

Sorting does not require indexes, but being able to use indexes to support 
interesting orders is pretty important for efficient implementation, 
whether relational or native XML.

>I wrote:
>
>And is that a good rationale for designing a language feature?
>
>Jonathan responds:
>Efficient implementability in the range of environments where
>we expect our language to be deployed? Yes, I would say that's a good
>rationale.
>
>My response:
>
>See the above example.  Not all expressions can be ordered by using indexes.

True - but many can, and we should make it easy for implementations to do so.

>I wrote:
>I found "sortby" (or "sort by") to be intuitively pretty simple, and
>more general than "order by".  The latter complicates simple
>queries; I can't say
>
>         document("mystuff.xml")//name sortby(.)
>
>Jonathan responds:
>
>Right, you would say:
>
>for $n in document("mystuff.xml")//name
>order by $n
>return $n
>
>My response:
>
>I agree that it's possible.  What I don't agree with is that this is a
>"concise and easily understood" way to express this.  It is more
>familiar to a SQL programmer, perhaps, if that's a design goal.

Familiarity to SQL programmers is not a goal for me. XQuery is not SQL.

>Jonathan writes:
>
>This was discussed at length, and we examined quite a few queries along
>these lines. One proposal was to retain both sort by () and the order by
>clause, defining the one in terms of the other. When the vote came, people
>seemed to prefer having only one way to sort.
>
>Let me make sure I understand your position - would you
>prefer having two, or would you prefer that we remove 'order by' altogether?
>
>My response:
>
>I would prefer the removal of "order by".  I think it's less general than
>"sort by", more verbose and harder to understand in simple cases, and
>doesn't add any functionality (unless I've overlooked something, you can
>always apply a separate "sort by" to each "for" expression in nested FLWR
>statements to get the same result as a multi-level "order by"; or you can
>sort the output of the FLWR, which you can't with "order by").

As far as I know, there's no difference in generality: any query that can 
be expressed with sort by () can be expressed with the order by clause, and 
vice versa. The purpose really is efficient implementability in a variety 
of environments.
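
For example, the case you mention, sorting the output of a FLWR, can be 
written by binding that output in an outer for clause. This is only a 
sketch; books.xml and the element names book, title, entry and label are 
made up for illustration, and I assume each book has at most one title:

```xquery
(: Hypothetical data: books.xml containing book elements with title children. :)
(: The inner FLWR builds the elements; the outer one orders them by a key :)
(: taken from the constructed elements themselves. :)
for $e in (
  for $b in document("books.xml")//book
  return <entry><label>{ string($b/title) }</label></entry>
)
order by $e/label
return $e
```

The inner FLWR plays the role of the operand of sort by, and the key 
$e/label corresponds to what would have been written as sort by (label).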

>It is possible that my opinion is colored by the fact that my company has
>a released product that implements "sortby" (not "sort by", that wasn't in
>the August draft), and that I don't want to have to re-work it (or,
>especially, to rewrite the manuals :-).
>
>But I'm not convinced that making XQuery more SQL-like really serves any
>higher design purpose; this language is fundamentally based around
>sequences, not tuples created by joins, and sorting a sequence seems very
>natural.  Forcing the programmer to iterate over the sequence, when all he
>wants to do is to sort it, does not.

Making XQuery more SQL-like isn't an interesting goal. I agree that XQuery 
is about sequences - but iteration and order are closely related, and I 
don't find it wrong to combine them the way we have in the 'order by' clause.

Jonathan

Received on Tuesday, 19 November 2002 16:21:54 UTC