W3C home > Mailing lists > Public > www-webdav-dasl@w3.org > April to June 2000

RE: Order By

From: Babich, Alan <ABabich@filenet.com>
Date: Fri, 9 Jun 2000 13:11:19 -0700
Message-ID: <C3AF5E329E21D2119C4C00805F6FF58F0398E967@hq-expo2.filenet.com>
To: "'Tapas Nayak'" <tapasnay@Exchange.Microsoft.com>, www-webdav-dasl@w3.org
Thanks for reminding me about merging results across multiple repositories.
That's a very key issue.
 
On the first release of DASL, the DASL spec. does not specify enough to
accomplish merging
the results across multiple repositories. You want to make searching across
multiple
repositories transparent, so you want one metadata space across all the
repositories. 
This leaves only two choices for merging the metadata of the multiple
repositories involved: 
(1) intersect the metadata of all the repositories, or 
(2) take the set union of the metadata of all the repositories. 
The client would have to specify which behavior he wanted. 
DMA has already addressed this issue in excruciating detail, and great
success has 
resulted.  (Trust me, there was a lot of detail involved in specifying it
for DMA.)
In DMA, you can query seamlessly across multiple different repositories that
have 
different metadata. Users absolutely love this.
 
My strong opinion is that, IF THERE EVER IS A DASL, THEN SOMEDAY DASL WILL
QUERY ACROSS
MULTIPLE REPOSITORIES. I feel that is inevitable. (The caps. are just for
visual emphasis in
as ASCII-only e-mail world, not "shouting".)
 
Based on my personal strong belief about that, I believe option (3) (i.e.,
let the implementation
decide) is not a viable choice. You would have to invent a way to advertise
which way nulls
are sorted, and, probably, which way the client wants them sorted. That's
bad enough by itself. 
But when you consider the performance impact (e.g., if one of the
repositories
involved in the query sorts nulls first, and one of the repositories sorts
nulls last), 
then you see that you have to wait for all the results from the one of the
repositories or the other 
(depending upon whether the merged stream of results sorts nulls first or
last) before you can 
return the FIRST result to the end user -- you can NOT return sorted results
to the end user 
incrementally. That absolutely kills the idea of alternative (3). What you
must do to serve 
the end users adequately is to merge the results incrementally as they come
in from the
repositories, and present the results to the end user as soon as possible.
This requires that all 
the repositories sort nulls the same way.
 
Therefore, only alternative (1) and (2) are viable in a world where cross
repository searches are
performed.
 
Alternative (2) is illogical, since it violates the principle of least
surprise. Putting the zero length
strings at the front and the nulls at the end would confuse even me for a at
least a moment.
 
This only leaves us with alternative (1), which is what's currently in the
DASL spec.
 
Alan Babich

 -----Original Message-----
From: Tapas Nayak [mailto:tapasnay@Exchange.Microsoft.com]
Sent: Friday, June 09, 2000 12:56 AM
To: Babich, Alan; Kevin Wiggen; www-webdav-dasl@w3.org
Subject: RE: Order By



It may be restrictive to prescribe one way or the other by 
interpreting nulls with one interpretation. NULLs have 
different interpretations in different semantic contexts. 
While for strings a NULL may be considered as a NULL string, in 
many cases NULL represent values not known or dont care. For an integer 
field, say age, it may be incorrect to consider NULL as 0, for example. 
It is this lack of a unique concrete interpretation of NULL in all
situations 
that left its interpretation to the implementation in SQL spec, I think. 
And probably that is the appropriate thing to do here too. Thus option 3
seems to be 
the appropriate thing. I understand, however, that they will pose a problem
in collation 
of results from more than one data sources, if the participating data
sources chose 
different implementations for the same property. That would indicate
possibly that 
even though it is the same property, different providers have taken slightly
different 
semantics for it. In that case either collator will have to do some
reconciliation 
of these different interpretations or enforce one semantics over the other. 

Regards 
Tapas 
-----Original Message----- 
From: Babich, Alan [ mailto:ABabich@filenet.com <mailto:ABabich@filenet.com>
] 
Sent: Thursday, June 08, 2000 6:43 PM 
To: 'Kevin Wiggen'; www-webdav-dasl@w3.org 
Subject: RE: Order By 


Does anybody else out there have an opinion? 

The choices are: 

(1) Nulls sort first 
(2) Nulls sort last 
(3) Nulls sort first or last at the implementation's choice 

These three choices are compatible with the SQL spec., and 
make logical sense. Only choice (1) is compatible with DMA. 

Consider how zero length strings sort: 
"A" sorts before "AA", because it is a shorter string. 
(This is not only specified in the SQL spec., 
it is the way things are listed in the phone book.) 
Therefore, the zero length string, "", sorts before "A" 
and before all other strings with length greater than zero. 

A null string is similar to a zero length string since 
neither contains any characters, so it seems odd to me 
to put null strings and zero length strings at opposite ends 
of the sort. I would think doing that could confuse some end users 
who are not sophisticated in the subtleties of zero length 
strings versus null strings. It also forces the implementation 
to split the semantic hair of whether zero length strings are 
null or not. I would prefer not to risk confusing users or 
implementers. In other words I prefer choice (1), and I 
claim that choices (2) and (3) have some risk of confusing 
some end users and implementers because of the similarity of 
null strings and zero length strings. 

(BTW, I agree with you, Kevin, that postgresql seems to be 
violating the SQL spec. Nulls must always sort first or last, 
so DESC must put nulls at the opposite end. In postgresql, 
as you have shown, DESC doesn't do that.) 

Alan 

-----Original Message----- 
From: Kevin Wiggen [ mailto:wiggs@xythos.com <mailto:wiggs@xythos.com> ] 
Sent: Thursday, June 08, 2000 4:42 PM 
To: Kevin Wiggen; Babich, Alan; Jim Davis; www-webdav-dasl@w3.org 
Subject: RE: Order By 



OK I will clarify (not that anyone cares) 

Oracle 

order by column1 => nulls sort last 
order by column1 DESC => nulls sort first 

Postgresql 

order by column1 => nulls sort last 
order by column1 DESC => nulls sort last 

I could argue that either one makes sense to me (although postgresql seems 
to be violating the spec).... 

Kevin 


-----Original Message----- 
From: www-webdav-dasl-request@w3.org 
[ mailto:www-webdav-dasl-request@w3.org
<mailto:www-webdav-dasl-request@w3.org> ]On Behalf Of Kevin Wiggen 
Sent: Thursday, June 08, 2000 4:03 PM 
To: Babich, Alan; Jim Davis; www-webdav-dasl@w3.org 
Subject: RE: Order By 




Well Oracle and Postgresql (the only db's I have around, I called a friend 
to get Informix but he's not around), both sort the nulls last.  This makes 
sense to me, as if it doesn't exist, then you probably don't care about it, 
thus it sorts last. 

So I guess I object to the spec.  I would like to see it changed so that 
nulls sort last. 

Kevin 

-----Original Message----- 
From: Babich, Alan [ mailto:ABabich@filenet.com <mailto:ABabich@filenet.com>
] 
Sent: Wednesday, June 07, 2000 8:50 PM 
To: 'Kevin Wiggen'; Jim Davis; www-webdav-dasl@w3.org 
Subject: RE: Order By 


I just looked it up. I thought the SQL spec. required nulls to sort first. 
Here's what the version of the SQL spec. that I have actually says: 

ANSI X3.135-1992 (Database Language SQL) in section 13.1, General Rule 3)b),

page 309, says "Whether a sort key value that is null is considered greater 
or 
less than a non-null value is implementation-defined, but all sort key 
values that 
are null shall either be considered greater than all non-null values or be 
considered 
less than all non-null values." 

Last time I used Oracle (which was a while ago) it sorted nulls first. 
To me, logically, null is even less significant than zero, so it seems right

to me 
that nulls should come first. I seem to remember that DMA query requires 
nulls to sort first. 

I personally think it's better for everybody to do things the same way 
(that's why we have specs., 
isn't it?). 

We made it definite in DASL, and made nulls sort first. 
Any significant objections, or are there just comments but not a true 
objection? 


Alan Babich 

-----Original Message----- 
From: Kevin Wiggen [ mailto:wiggs@xythos.com <mailto:wiggs@xythos.com> ] 
Sent: Wednesday, June 07, 2000 3:26 PM 
To: Jim Davis; www-webdav-dasl@w3.org 
Subject: RE: Order By 



After further review :) 

The fact that nulls sort first????, if the value is null it is probably 
because the property isn't defined and therefore you could care less, but 
these are the ones sorted first???  The spec acts as though this is common 
in ANSI standard SQL, but I have only seen the opposite from every DB I have

ever worked with (which maybe means none of them follow the standard) 

Kevin 


-----Original Message----- 
From: Jim Davis [ mailto:jrd3@alum.mit.edu <mailto:jrd3@alum.mit.edu> ] 
Sent: Wednesday, June 07, 2000 7:47 AM 
To: Kevin Wiggen; www-webdav-dasl@w3.org 
Subject: Re: Order By 


At 10:16 PM 6/6/00 -0700, Kevin Wiggen wrote: 
> 
> 
>Does adding "Order By Content Length" (or any other file specific property)

>automatically add "AND Resource Type = File" to the Where clause of a Dasl 
>Query? 
> 
>Or in English, does adding a file specific property to the orderby clause 
in 
>a dasl query implicitly make all resources that are returned files?  or do 
>the directories and null resources simply sort last? 

The latter.   Nowhere does  the spec say that anything is added 
(implicitly) to the where clause, and I would never agree to that.  But it 
does say  "In the context of the DAV:orderby element, null values are 
considered to collate before any actual (i.e., non null) value, including 
strings of zero length".  (5.6 in the most recent draft) 

Is this a problem either technically or editorially? 

regards 

Jim 
Received on Friday, 9 June 2000 16:14:27 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Sunday, 22 March 2009 03:38:05 GMT