- From: Babich, Alan <ABabich@filenet.com>
- Date: Mon, 8 Jun 1998 16:27:59 -0700
- To: "'Saveen Reddy (Exchange)'" <saveenr@Exchange.Microsoft.com>, www-webdav-dasl@w3.org
(A) I think billing is done out of band. I don't think it is in scope for DASL. It is a server issue, I think, not a client side or protocol issue.

(B) Apparently I didn't make myself exactly clear, so I'd like to try to be clear this time. First, some background, then the proposals.

First of all, I know for a fact that some document management systems limit the number of result rows you get back in your answer set to any query, and that you don't have to do anything to get this behavior. You just get the behavior, period. This is a server side policy that is sometimes used to protect the resources available for the other 999 users on the system -- it prevents one user from either accidentally or intentionally swamping the server by submitting an expensive query or queries. However, I strongly suspect that there are systems that don't build in such a limit. So, I do NOT think we should require this policy to be implemented by collections.

Furthermore, I know for a fact that at least one system develops results incrementally, at least in some cases. What that means is that the server gets the next, say, 10 rows, at a time every time you page forward in the answer set. In this case, there is no limit on the number of rows you can get, except your patience. And yet, since only 10 rows are obtained at a time, the server is never overloaded. The human think time in between bursts of 10 is plenty of time to let the other 999 users get work done.

On such an incremental scheme, my query of "find all documents with document number greater than 0" will only find 10 or 20 documents out of the 10 million documents on the system, depending on if I hit the page key to get the second page of results or not. (Then I get bored and terminate the query.) No artificial limit is imposed on the number of result rows I get back.
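The incremental scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexScan` is a hypothetical stand-in for the DBMS index scan, and `pages` is a hypothetical helper, neither taken from any real document management system.

```python
from itertools import islice
from typing import Iterator, List


class IndexScan:
    """Hypothetical stand-in for an index scan answering 'document number > 0'."""

    def __init__(self, total_documents: int) -> None:
        self.total_documents = total_documents
        self.rows_read = 0  # how many rows the server has actually touched

    def __iter__(self) -> Iterator[int]:
        for doc_number in range(1, self.total_documents + 1):
            self.rows_read += 1
            yield doc_number


def pages(rows: Iterator[int], page_size: int = 10) -> Iterator[List[int]]:
    """Yield successive pages; each page is built only when the client asks for it."""
    while True:
        page = list(islice(rows, page_size))
        if not page:
            return
        yield page


scan = IndexScan(total_documents=10_000_000)
cursor = pages(iter(scan))
first_page = next(cursor)   # client views the first 10 rows
second_page = next(cursor)  # client pages forward once
cursor.close()              # client gets bored and terminates the query
```

After the two page requests, `scan.rows_read` is 20 even though 10 million documents match, which is the point of the scheme: the cost to the server tracks what the client actually looks at.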
The results of that query happen to come out in increasing document number order on the system in question, not because a sort is being performed, but because there just so happens to be a B-tree on document number, and it is natural for the DBMS to return rows in that order, since it chose that column as the way to drive the query.

In contrast, when a sort is specified for the result set, and an actual sort is required, then all the results have to be obtained and then sorted on the server before the server returns any results. This would be intolerable for my loose query (i.e., "document number > 0") on a database with 10 million documents. The main alternative is to let the client retrieve all rows and then do the sort. This would be even worse in some ways for my query on a 10 million document database. The only other alternative is to not provide arbitrary sorting of result sets. Therefore, if sorting is involved, an arbitrary limit may very well be imposed on the answer set, even in a system that doesn't impose such a limit on other queries.

The bottom line is this: In order to provide a system that scales to hundreds of users or more, the system can never let one user accidentally or intentionally swamp the server with a query. It has been observed that humans have limited patience for looking at result set items. Relatively few humans study as many as 100 result items. Finally, the incremental scheme is practical in some RDBMSs, but not feasible in others. Therefore, I have seen limits in the vicinity of 500 result items or so imposed automatically by some systems.

Now the proposals.

(1) What I was proposing is that we think about providing an OPTIONAL way in the DASL protocol for the user to give a HINT (not a mandate) about the maximum number of result rows that the server should develop for the query. The server could choose to override its built-in default limit with this value. Or, the server could choose to ignore the value and do whatever it wants.
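One way a server might treat such a hint is sketched below. Everything here is an assumption made for illustration: the function name `effective_limit`, the `honor_hint` policy flag, and the default of 500 (chosen only because limits near 500 are mentioned above). Nothing in it is defined by DASL.

```python
from typing import Optional

SERVER_DEFAULT_LIMIT = 500  # assumed built-in limit, for illustration only


def effective_limit(hint: Optional[int], honor_hint: bool,
                    default_limit: Optional[int] = SERVER_DEFAULT_LIMIT) -> Optional[int]:
    """Return the row limit the server will apply; None means no limit.

    The hint is advisory: the server uses it only when its own policy
    (honor_hint) says so, and otherwise falls back to its default.
    """
    if hint is None or hint <= 0 or not honor_hint:
        return default_limit
    return hint
```

For example, `effective_limit(5000, honor_hint=True)` returns 5000, while `effective_limit(5000, honor_hint=False)` returns the server default of 500. A server with no built-in limit would pass `default_limit=None`.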
This would (on some systems) let serious researchers who need a large answer set, AND who have the privileges to override the default limit (if any), get a large answer set in spite of the server's built-in limit. I don't think this hint is a "need to have in 1.0". I think it is just a "nice to have in 1.0".

(2) In the real world, limits on answer set sizes are sometimes imposed automatically by the server. Whether or not we provide a hint about the limit on the number of rows in the answer set, I propose we DOCUMENT A WARNING CODE that indicates to the client that the answer set was truncated because the answer set size limit was exceeded. Since the client got all the answer set rows that it was entitled to, this is not a failure code. I think this warning code is a "must have" for 1.0.

Alan Babich

> -----Original Message-----
> From: Saveen Reddy (Exchange) [SMTP:saveenr@Exchange.Microsoft.com]
> Sent: June 08, 1998 1:49 PM
> To: www-webdav-dasl@w3.org
> Subject: RE: expressing limits on result set
>
> Truncating the result set based on some criteria sounds like a good idea.
> Alan pointed out a case in which using such a mechanism allowed a client to
> verify that the system was working at all (client issues a query that
> matches all documents but then truncates to a very small result).
>
> Which limits do we need to express? Jim pointed out: (1) result set size,
> (2) computational effort. Is billing ($) part of this also. I assume time is
> the unit of measurement for computational effort. Don't know how to/or if we
> should address billing. I would categorize computational effort and billing
> is a "nice-to-have" items and result set size as a "must have".
>
> -Saveen
Received on Monday, 8 June 1998 19:30:21 UTC