RE: expressing limits on result set from Babich, Alan on 1998-06-08 (www-webdav-dasl@w3.org from April to June 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Mon, 8 Jun 1998 16:27:59 -0700
To: "'Saveen Reddy (Exchange)'" <saveenr@Exchange.Microsoft.com>, www-webdav-dasl@w3.org
Message-ID: <72B1992276A9D111A20E00805FEAC96DCC40D1@cm-expo1.filenet.com>
(A) I think billing is done out of band. I don't think it is in scope
for DASL. It is a server issue, I think, not a client side or protocol
issue.

(B) Apparently I didn't make myself exactly clear, so I'd like
to try to be clear this time. First, some background, then
the proposals.

First of all, I know for a fact that some document management
systems limit the number of result rows you get back in your
answer set to any query, and that you don't have to do anything
to get this behavior. You just get the behavior, period. 
This is a server side policy that is sometimes used to 
protect the resources available for the other
999 users on the system -- it prevents one user from
either accidentally or intentionally swamping the server 
by submitting an expensive query or queries.

However, I strongly suspect that there are systems that don't 
build in such a limit. So, I do NOT think we should require
this policy to be implemented by collections.

Furthermore, I know for a fact that at least one system develops
results incrementally, at least in some cases. What that means
is that the server gets only the next, say, 10 rows each time
you page forward in the answer set. In this case, there is no limit
on the number of rows you can get, except your patience. And yet, 
since only 10 rows are obtained at a time, the server is never
overloaded.
The human think time in between bursts of 10 is plenty of time to
let the other 999 users get work done. 
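
To make the incremental scheme concrete, here is a minimal sketch in
Python. It is only an illustration, not how any particular system does it:
the names pages, all_documents and PAGE_SIZE are hypothetical, and a
generator stands in for the server's cursor. The point it shows is that
rows are read only when the user pages forward.

```python
from itertools import islice
from typing import Iterable, Iterator, List

PAGE_SIZE = 10  # the "say, 10 rows" from the text; an assumed value


def pages(matches: Iterable[int], page_size: int = PAGE_SIZE) -> Iterator[List[int]]:
    """Yield successive pages, pulling rows from `matches` only on demand."""
    it = iter(matches)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def all_documents(total: int, fetched: List[int]) -> Iterator[int]:
    """Hypothetical stand-in for an index scan; records each row actually read."""
    for doc_number in range(1, total + 1):
        fetched.append(doc_number)
        yield doc_number


fetched: List[int] = []
cursor = pages(all_documents(10_000_000, fetched))
first = next(cursor)   # user looks at page 1
second = next(cursor)  # user hits the page key once
# The user now abandons the query; no further rows are ever read.
```

After two pages, only 20 of the 10 million rows have been touched, and
no limit on the total was ever needed.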

On such an incremental scheme, 
my query of "find all documents with document number greater 
than 0" will only find 10 or 20 documents out of the 10 million 
documents on the system, depending on whether I hit the page key to 
get the second page of results or not. (Then I get bored and
terminate the query.) No artificial
limit is imposed on the number of result rows I get back.

The results of that query happen to come out in increasing 
document number order on the system in question, not because a 
sort is being performed, but because there just so happens 
to be a B-tree on document number, and it is natural for the 
DBMS to return rows in that order, since it chose that column 
as the way to drive the query. In contrast, when a sort is specified
for the result set, and an actual sort is required, then
all the results have to be obtained and then sorted on
the server before the server returns any results. This would
be intolerable for my loose query (i.e., "document number > 0")
on a database with 10 million documents.
The main alternative is to let the client retrieve all rows
and then do the sort. This would be even worse in some ways 
for my query on a 10 million document database.
The only other alternative is to not provide arbitrary
sorting of result sets. Therefore, if sorting is involved,
an arbitrary limit may very well be imposed on
the answer set, even in a system that doesn't
impose such a limit on other queries.
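
The difference in cost can be sketched as follows, again in Python and
again only as an illustration. The table contents, the scan helper and
the row counters are all hypothetical. The rows are stored in document
number order, which stands in for the B-tree; asking for the first page
in that order reads 10 rows, while asking for the first page in title
order forces every row to be read before anything can be returned.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (document number, title); a made-up schema


def scan(rows: Iterable[Row], counter: List[int]) -> Iterator[Row]:
    """Yield rows in stored order, counting how many are actually read."""
    for row in rows:
        counter[0] += 1
        yield row


# 1000 rows stored in document number order, with scrambled titles.
rows: List[Row] = [(n, f"title-{(n * 7919) % 1000:03d}") for n in range(1, 1001)]

# Index order: the first page costs only 10 row reads.
read_in_index_order = [0]
first_by_number = list(islice(scan(rows, read_in_index_order), 10))

# Arbitrary sort: every row must be read before the first page exists.
read_for_sort = [0]
first_by_title = sorted(scan(rows, read_for_sort), key=lambda r: r[1])[:10]
```

With 10 million rows instead of 1000, the second case is the one where
a server may decide to cap the answer set.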

The bottom line is this: In order to provide a system
that scales to hundreds of users or more, the system
can never let one user accidentally or intentionally
swamp the server with a query. It has been observed
that humans have limited patience for looking at
result set items. Relatively few humans study as many as
100 result items. Finally, the incremental scheme is
practical in some RDBMS's, but not feasible in others.
Therefore, I have seen limits in the vicinity of 500
result items or so imposed automatically by some systems.

Now the proposals.

(1) What I was proposing is that we think about
providing an OPTIONAL way in the DASL protocol for the user
to give a HINT (not a mandate) about the maximum number 
of result rows that the server should develop for the query.
The server could choose to override its built in
default limit with this value. Or, the server could choose
to ignore the value and do whatever it wants.
This would (on some systems) let serious researchers that have
a need to get a large answer set, AND who have the privileges
to override the default limit (if any), to get a large answer set 
in spite of the server's built in limit. I don't think this hint
is a "need to have in 1.0". I think it is just a "nice to have
in 1.0".
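
One way a server might treat such a hint is sketched below in Python.
This is just one possible policy, not proposed protocol behavior: the
function name effective_limit, its parameters, and the choice to always
honor a hint that is smaller than the default are assumptions made for
the illustration. None means "no limit".

```python
from typing import Optional


def effective_limit(server_default: Optional[int],
                    client_hint: Optional[int],
                    may_override: bool) -> Optional[int]:
    """Return the maximum rows to develop for a query (None = unlimited)."""
    # No usable hint: the server just applies its own policy.
    if client_hint is None or client_hint < 0:
        return server_default
    # Server has no built-in limit: assume it accepts the hint as a cap.
    if server_default is None:
        return client_hint
    # Asking for fewer rows than the default is cheap, so honor it (assumed).
    if client_hint <= server_default:
        return client_hint
    # Asking for more: honored only for users privileged to override.
    return client_hint if may_override else server_default
```

For example, with a default of 500, a hint of 5000 yields 5000 for a
privileged researcher and 500 for everyone else. A server that ignores
hints entirely would simply return server_default in every case.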

(2) In the real world, limits on answer set sizes are
sometimes imposed automatically by the server.
Whether or not we provide a hint about the
limit on the number of rows in the answer set,
I propose we DOCUMENT A WARNING CODE that indicates
to the client that the answer set was truncated
because the answer set size limit was exceeded.
Since the client got all the answer set rows that
it was entitled to, this is not a failure code.
I think this warning code is a "must have" for 1.0.
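
A minimal sketch of the truncation warning, in Python, assuming a
hypothetical QueryResult type with a truncated flag. No actual DASL
status code or wire format is implied; the flag only stands in for
whatever warning code ends up being documented. The server reads one
row past the limit so it can tell "exactly the limit" apart from
"more rows existed".

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Optional

_MISSING = object()  # sentinel so that falsy rows (e.g. 0) are not misread


@dataclass
class QueryResult:
    rows: List[int]
    truncated: bool  # stands in for the proposed warning code


def run_query(matches: Iterable[int], limit: Optional[int]) -> QueryResult:
    """Return at most `limit` rows, flagging whether any were cut off."""
    it = iter(matches)
    if limit is None:
        return QueryResult(list(it), truncated=False)
    rows = list(islice(it, limit))
    # Peek one row beyond the limit to detect truncation.
    truncated = next(it, _MISSING) is not _MISSING
    return QueryResult(rows, truncated)
```

The query still succeeds in both cases; the client simply learns
whether the rows it received are the whole answer set.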

Alan Babich
> -----Original Message-----
> From:	Saveen Reddy (Exchange) [SMTP:saveenr@Exchange.Microsoft.com]
> Sent:	June 08, 1998 1:49 PM
> To:	www-webdav-dasl@w3.org
> Subject:	RE: expressing limits on result set
> 
> Truncating the result set based on some criteria sounds like a good idea.
> Alan pointed out a case in which using such a mechanism allowed a client to
> verify that the system was working at all (client issues a query that
> matches all documents but then truncates to a very small result).
> 
> Which limits do we need to express? Jim pointed out: (1) result set size,
> (2) computational effort. Is billing ($) part of this also? I assume time is
> the unit of measurement for computational effort. Don't know how, or if, we
> should address billing. I would categorize computational effort and billing
> as "nice-to-have" items and result set size as a "must have".
> 
> -Saveen
Received on Monday, 8 June 1998 19:30:21 UTC