- From: Babich, Alan <ABabich@filenet.com>
- Date: Fri, 17 Jul 1998 16:15:29 -0700
- To: "'DASL'" <www-webdav-dasl@w3.org>
This is item 10 in Jim's issues list. (1) Just thinking, off the top of your head, you would think you have two possibilities to choose from -- put nulls first in the collation, or put nulls last. That is because you can't think of any logical reason to put null's between, say, strings that start with "A", and strings that start with "B", or between any other two consecutive characters. (You are obviously forced to pick to consecutive characters, because of the ambiguity of where strings starting with an intervening character would go in the ordering.) Anyway, at least I can't. So, you have to boiled it down to two possibilities right off the bat. Now, consider the following two strings: AA AAA This is the order you expect to see the strings in a phone book -- the acronym for Alcoholics Anonymous comes before the acronym for American Automobile Association, right? Well, what is the third character of the first string? It is null, of course. So you can view the comparison as the comparison between two strings of length three. That implies null comes before "A". Similarly, "B" comes before "BB", etc., etc. So, now you have thought of an excellent intuitive and logical reason to decide between putting nulls first or last. You have logically concluded that nulls should be first. (2) Guess what? That is the way the SQL standard does it. In other words, nulls collate before any actual value for all datatypes. (For strings, nulls come before zero length strings too.) This simple and well accepted practice completely specifies the situation. Now you are certain you have a winner. You not only derived the answer from first principles, you agree with the most widely accepted practice and the most widely accepted relevant standard. (3) Making the collating sequence definite is important. For one thing, many people are used to SQL, and they expect it. For another, consider DASL middleware (i.e., software that performs queries across two or more disjoint collections). The middleware will ask each collection to order the results, and perform a multi-list merge on their answer sets to form the final answer set. If the middleware can't depend upon nulls being first in all the answer sets, it can't do a merge -- It has to do a doggone sort. This has potentially serious performance implications. Merging ordered lists requires no secondary storage and is accomplished quickly -- you just scan each list sequentially in parallel, compare the current element in each list, and output the smallest of the current elements and advance the current pointer for that list. Sorting requires N * log2(N) time, and, usually, temporary storage as well. Furthermore, writing a sort routine is a needless barrier of entry to writing the middleware in the first place. (4) Furthermore, there is no downside to adopting the SQL standard approach. (5) Putting nulls first is simple: It is independent of any internationalization issues. (6) Putting nulls first is simple: It is independent of data type. (7) Putting nulls first is easy to implement. If you use SQL in your server, you get it free. (8) I am not aware of any competing standards for putting nulls other than first. I would be surprised if there were any. (Of course, I admit I don't have infinite knowledge of all standards.) Therefore, in view of (1)-(8), I propose we put nulls first in DASL sortby result orderings. Alan Babich
Received on Friday, 17 July 1998 19:18:26 UTC