RE: JW5: Partial Results from Jim Whitehead on 1999-08-16 (www-webdav-dasl@w3.org from July to September 1999)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Mon, 16 Aug 1999 16:35:36 -0700
To: "'DASL'" <www-webdav-dasl@w3.org>
Message-ID: <NDBBIKLAGLCOPGKGADOJEEBMCDAA.ejw@ics.uci.edu>
I think it's important for DASL to support paged results, so let me build
upon what you've written, and throw out a mechanism that might address the
three implementation cases you mention:

> (1) Redo the query each time, each time discarding enough
> initial hits to provide the illusion of progressing
> to the next page in the result set of the original query.
> (2) Get all the hits at once and cache them on the server.
> Return the next page of hits from the cache upon client
> demand. (3) Develop the next page of hits upon demand
> from the client.

Looking at the three implementation strategies, (2) and (3) suggest to me
sending back, along with  the search results, a search identifier (a GUID,
perhaps), a re-request timeout value, and the range of results returned in
this message.  If a client wished further results, it would submit another
SEARCH, this time submitting only the the search identifier, and whether the
client wanted the next N (determined by the limit), or the previous N hits.
The timeout value indicates to the client how long it has before it can no
longer resubmit the query (or, perhaps we could establish an arbitrary upper
bounds of, say, 15 minutes).

Implementation (1) would need to cache the queries, but would not cache the
search results.  It would have a simple hash table going from search
identifier to cached search specification.  If a new query were submitted,
it would redo the query, using the cached search specification.  Att he end
of the timeout period, the search specification would be flushed.

Implementation (2) would cache the query results, associating the entire set
of results with the search identifier.  New searches using a search
identifier would return hits from the cached result set.  At the end of the
timeout period, the cache result set would be flushed.

Implementation (3) would keep a mapping between the search process and the
search identifier, and submit subsequent search requests to the search
process.  At the end of the timeout period, the search process goes away.

There are a couple of ways this can be mapped into syntax once there is
agreement on mechanism.

- Jim

Alan Babich writes:
> Once upon a time, we discussed what we called "paged results".
> Somehow, we dropped the ball, and nothing made it into the spec.
> Oops. Here is what we were thinking. This was in an e-mail I sent
> to Dale Lowry:
>
> In section 14.36 of the HTTP/1.1 spec. (page 128),
> they talk about ranges. The only type of range
> they define is byte ranges. You use a range request
> header that begins "Range:" in the GET method.
>
> Our thought was that we should try to fit in with
> existing HTTP stuff before inventing new stuff,
> and that would maximize our changes of success.
> So, we thought we could probably invent a new
> kind of range for DASL answer sets. Obviously,
> byte range is not what we want. We want a
> granularity of a "hit" or "match set entry",
> so we have to invent a new kind of range.
>
> We think that if the server passes back a
> special string as in your proposal, that the
> server could (a) continue the original query as
> opposed to someone else's query or reexecuting
> the query, and (b) continue returning results
> from the point at which the client left off. We didn't
> discuss jumping around in the result set, either
> backwards or forwards. For the basic feature,
> the client just needs to sweep the answer set once.
> Jumping around in the results is more advanced.
>
> We want to leave it open for the server to be implemented
> using any of the following three implementations, as
> well as other possible implementations:
> (1) Redo the query each time, each time discarding enough
> initial hits to provide the illusion of progressing
> to the next page in the result set of the original query.
> (2) Get all the hits at once and cache them on the server.
> Return the next page of hits from the cache upon client
> demand. (3) Develop the next page of hits upon demand
> from the client.
>
> Implementation (1) is the dumbest, and you may get
> annoying consequences. For example, suppose you did
> a query and the results are supposed to come back
> in increasing order of Author. If there are concurrent
> deletions, then it could happen that the client sees
> a sequence of hits that slips backwards in the
> ascending sequence of Author one or more times,
> because some of the previous hits are repeated. This
> is OK as long as you tell the client that the result
> set entity is different each time. (For example, e-tags
> can tell the client that.) The other two implementations
> don't have this problem.
>
> Implementation (2) means the client has to connect to the
> same cache each time, and that the client has the same
> result set entity each time. The cache has to be garbage
> collected if the client dies before sweeping it. The
> implementation will probably impose some arbitrary upper
> bound on the number of hits to avoid wasting server
> resources.
>
> Implementation (3) means the client has to connect to
> the same server process each time (because that process
> has a database or DMS system handle needed to develop
> the next page of hits). The requesthandler process
> has to time out and go away if the client goes away.
> This implementation may provide the least drain
> on server resources.
>
> We thought that we could probably modify your proposal
> in a way such as to accomplish all of the above
> objectives.
>
> Alan Babich
>
Received on Monday, 16 August 1999 19:42:14 UTC