Proposal for Paged Search Results

PROPOSAL FOR PAGED RESULTS
-----------------------------------------------

Paged results processing allows a client to begin displaying results for the user without waiting for the entire query to be completed by the server. The need for paged results becomes significant in situations where query complexity causes the server to expend a large amount of time to calculate all of the results. On some systems this complexity threshold may be crossed with seemingly simple requests such as the query "find all documents created by Fred between 1970 and 1980." Every search engine has its strengths and weaknesses, which will not be known to a non-proprietary client.

ASSUMPTIONS/REQUIREMENTS:

1) The caller specifies the number of hits desired. This number can change each time more hits are requested.
2) Navigation of results can only occur once and traverses from the beginning to the end of the results--i.e., no virtual list handling.
3) The server should return the total number of hits that it knows the query satisfies. This number can change each time the caller gets new hits.
4) The server can time out and throw away any context information about the query. If the server doesn't want to save any context then it can time out immediately.
5) A timed out query should automatically resume if more hits are requested.
6) Sortby and limit XML elements as currently defined in DASL still apply. 
7) We cannot change the DTD for the multistatus and response XML elements.
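
Assumptions 4 and 5 can be illustrated with a minimal sketch of a server-side context store. This is not part of the proposal: the class name ContextStore, its methods, and the recreate callback are all hypothetical, and a real server would choose its own storage and expiry policy. A time-to-live of zero models a server that times out immediately and keeps no context.

```python
import time


class ContextStore:
    """Hypothetical store for per-query context that may expire at any time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # context id -> (expiry time, query state)

    def put(self, context_id, state):
        """Save query state under the given context id."""
        self._entries[context_id] = (self._clock() + self._ttl, state)

    def get_or_recreate(self, context_id, recreate):
        """Return live state, or rebuild it from the original request.

        `recreate` is a hypothetical callback that re-runs the original
        search; this is why the caller must resend the search request.
        """
        entry = self._entries.get(context_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        # Context timed out (or never existed): resume automatically.
        self._entries.pop(context_id, None)
        state = recreate()
        self.put(context_id, state)
        return state
```

The caller never needs to know whether the context was still live or was rebuilt, which is the behavior assumption 5 asks for.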

DESIGN:

If the caller wants paged results returned from the server, the initial query request must include the pagedresults XML element with the limit indicating the number of hits desired. Each time a subsequent request is made for more hits, the caller must provide the original searchrequest in addition to the context XML element and position XML element provided by the previous response. The original search request is provided so the server can recreate the search if it has timed out the previous search request.
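
As a rough illustration of the request flow described above, the following sketch builds either an initial or a subsequent request body. The helper name build_paged_request and its parameters are hypothetical. It also assumes the namespace is declared with an xmlns:D="DAV:" attribute, which is what Python's ElementTree emits, rather than with the processing-instruction form used in the examples in this proposal.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"


def build_paged_request(scope_href, page_size, context=None, position=None):
    """Return a searchrequest body as a string (hypothetical helper).

    Omit context and position for the initial request; pass back the
    values from the previous response, unchanged, for later requests.
    """
    ET.register_namespace("D", DAV)

    def q(name):
        # Qualified tag name in ElementTree's {namespace}local form.
        return "{%s}%s" % (DAV, name)

    root = ET.Element(q("searchrequest"))
    search = ET.SubElement(root, q("simplesearch"))
    prop = ET.SubElement(ET.SubElement(search, q("select")), q("prop"))
    ET.SubElement(prop, q("getcontentlength"))
    scope = ET.SubElement(ET.SubElement(search, q("from")), q("scope"))
    ET.SubElement(scope, q("href")).text = scope_href
    ET.SubElement(scope, q("depth")).text = "infinity"

    paged = ET.SubElement(search, q("pagedresults"))
    limit = ET.SubElement(paged, q("limit"))
    ET.SubElement(limit, q("nresults")).text = str(page_size)
    if context is not None:
        ET.SubElement(paged, q("context")).text = context
    if position is not None:
        ET.SubElement(paged, q("position")).text = position
    return ET.tostring(root, encoding="unicode")
```

Because the select, from, and limit parts are rebuilt on every call, each subsequent request carries the full original search, as the design requires.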

The context identifier should include enough information for the server to identify the context information that is stored on the server to provide the results of the requested query. The position identifier should include enough information for the server to locate the current position in the results set for this query. This identifier should also be able to locate the current position if the query is reissued. It could be a document ID, a hit count, a database record number, etc.

The caller must not change the context or position string that is returned in the query response. These values must remain the same when making the next query request.

Even though paged results introduce context maintained by the server, this doesn't necessarily mean that the system will be less scalable. The server can, for example, partially complete the query, send the partial results back to the caller, finish the query by streaming results to a file on the server for later access, and then release the in-memory context associated with that user and query. Alternatively, the server may choose to reissue the search each time more results are requested, then navigate to the position last returned to the caller before returning results. Any number of approaches may be implemented at the server to address scalability in a way that fits the architecture of that system. Omitting paged results handling from the DASL protocol would hold back systems that are scalable and handle paged results well merely to accommodate less robust systems.
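
The second approach above (reissue the search, then navigate to the last position) can be sketched as follows. This is only an illustration under stated assumptions: run_query stands in for a real search engine, the position is taken to be the last document ID returned, and hits are assumed to come back in a stable sorted order so that the position still makes sense after the query is reissued. All names are hypothetical.

```python
def run_query(documents, predicate):
    """Stand-in for the real search engine: yields hits in a stable order."""
    for doc in sorted(documents):
        if predicate(doc):
            yield doc


def next_page(documents, predicate, page_size, position=None):
    """Reissue the query, skip past `position`, and return one page.

    Returns (hits, new_position, more). `position` is the ID of the last
    document already returned to the caller, or None on the first request.
    """
    hits = []
    more = False
    for doc in run_query(documents, predicate):
        if position is not None and doc <= position:
            continue  # already returned in an earlier page
        if len(hits) == page_size:
            more = True  # at least one further hit exists
            break
        hits.append(doc)
    new_position = hits[-1] if hits else position
    return hits, new_position, more
```

A server built this way holds no state between requests, trading repeated query work for scalability.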

Responses to a paged results request differ from responses to a non-paged results request only in the content of the response--i.e., the response codes returned are the same and have the same meaning for either type of request. If there are more results to be paged to the caller, the server will respond with a 425 (Insufficient Space on Resource) response whose body contains the paged results context information in a pagedresultsresponse XML element. If there are no more results to be returned to the caller, the server will respond with a 207 (Multi-Status) response whose body contains the remaining results. A 207 response implies that context of any form on the server, including results sets, has been freed.
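
From the caller's side, the two response codes drive a simple loop. The sketch below assumes a hypothetical transport callable, send, that issues one request and returns the status code together with the context, position, and hits already extracted from the body; none of these names come from the proposal.

```python
def fetch_all(send, page_size):
    """Drive a paged search to completion and return every hit.

    `send` is a hypothetical callable taking (page_size, context, position)
    and returning (status_code, context, position, hits).
    """
    context = position = None  # the initial request carries neither
    results = []
    while True:
        status, new_context, new_position, hits = send(page_size, context, position)
        results.extend(hits)
        if status == 207:
            # Final page: the server has freed its context.
            return results
        if status != 425:
            raise RuntimeError("unexpected status %d" % status)
        # More results remain: echo back context and position unchanged.
        context, position = new_context, new_position
```

A real client would more likely display each page as it arrives than collect everything, since early display is the point of paging; the loop structure stays the same.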


EXAMPLE INITIAL REQUEST:

<?xml version="1.0"?>
<?xml:namespace ns="DAV:" prefix="D"?>
<?xml:namespace ns="FOO:" prefix="F"?>
<D:searchrequest>
     <D:simplesearch>
          <D:select>
               <D:prop><D:getcontentlength/></D:prop>
          </D:select>
          <D:from>
               <D:scope>
                    <D:href>/container1/</D:href>
                    <D:depth>infinity</D:depth>
               </D:scope>
          </D:from>
          <D:pagedresults>
               <D:limit>
                    <D:nresults>1</D:nresults>
               </D:limit>
          </D:pagedresults>
     </D:simplesearch>
</D:searchrequest>

EXAMPLE SUBSEQUENT REQUEST:

<?xml version="1.0"?>
<?xml:namespace ns="DAV:" prefix="D"?>
<?xml:namespace ns="FOO:" prefix="F"?>
<D:searchrequest>
     <D:simplesearch>
          <D:select>
               <D:prop><D:getcontentlength/></D:prop>
          </D:select>
          <D:from>
               <D:scope>
                    <D:href>/container1/</D:href>
                    <D:depth>infinity</D:depth>
               </D:scope>
          </D:from>
          <D:pagedresults>
               <D:limit>
                    <D:nresults>1</D:nresults>
               </D:limit>
               <D:context>##SearchA##</D:context>
               <D:position>Pos100</D:position>
          </D:pagedresults>
     </D:simplesearch>
</D:searchrequest>


EXAMPLE PARTIAL RESPONSE:

HTTP/1.1 425 Insufficient Space on Resource
Content-Type: text/xml
Content-Length: xxx

<?xml version="1.0"?>
<?xml:namespace ns="DAV:" prefix="D"?>
<?xml:namespace ns="http://ryu.com/propschema" prefix="R"?>
<D:pagedresultsresponse>
     <D:context>##SearchA##</D:context>
     <D:position>Pos101</D:position>
     <D:total>1105</D:total>
     <D:multistatus>
          <D:response>
               <D:href>http://siamiam.com</D:href>
               <D:propstat>
                    <D:prop>
                         <R:location>259 W. Hollywood</R:location>
                         <R:rating><R:stars>4</R:stars></R:rating>
                    </D:prop>
                    <D:status>HTTP/1.1 200 OK</D:status>
               </D:propstat>
          </D:response>
     </D:multistatus>
</D:pagedresultsresponse>

EXAMPLE FINAL RESPONSE:

HTTP/1.1 207 Multi-Status
Content-Type: text/xml
Content-Length: xxx

<?xml version="1.0"?>
<?xml:namespace ns="DAV:" prefix="D"?>
<?xml:namespace ns="http://ryu.com/propschema" prefix="R"?>
<D:multistatus>
     <D:response>
          <D:href>http://siamiam.com</D:href>
          <D:propstat>
               <D:prop>
                    <R:location>259 W. Hollywood</R:location>
                    <R:rating><R:stars>4</R:stars></R:rating>
               </D:prop>
               <D:status>HTTP/1.1 200 OK</D:status>
          </D:propstat>
     </D:response>
</D:multistatus>


DTD CHANGES/REUSE:

<!ELEMENT simplesearch (select, from, where?, sortby?, limit?, pagedresults?)>
<!ELEMENT limit        (nresults)>
<!ELEMENT nresults     (#PCDATA)>


DTD ADDITIONS:

<!ELEMENT pagedresults (limit, context?, position?)>
<!ELEMENT context      (#PCDATA)>
<!ELEMENT position     (#PCDATA)>
<!ELEMENT total        (nresults)>
<!ELEMENT pagedresultsresponse     (context, position, total, multistatus)> 

Name:         pagedresults
Namespace:    DAV
Purpose:      Indicates that the caller wants results returned a portion at a time rather than receiving all results in a single response.

Name:         pagedresultsresponse
Namespace:    DAV
Purpose:      Response to a query for which paged results were requested and more results may remain to be returned.

Name:         limit
Namespace:    DAV
Purpose:      Used as a child of DAV:pagedresults scope to indicate how much effort the server should expend before returning partial results to the caller.
     Used as a child of DAV:simplesearch to indicate how much effort the server should expend for the entire query before terminating the search.

Name:         nresults
Namespace:    DAV
Purpose:      Indicates the number of results returned from the query. 

Name:         context
Namespace:    DAV
Purpose:      A string that the server provides, which allows the server to continue returning results from the same query that was originally issued. If this element is not provided then the query should be started or restarted by the server.

Name:         position
Namespace:    DAV
Purpose:      A string that the server provides, which allows the server to know the position in the results set where it should start to return new results to the caller. If this element is not provided then the query should be started or restarted by the server.

Name:         total
Namespace:    DAV
Purpose:      Indicates the total effort that the server will expend to complete the query. This element is returned from the server.
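
To show how a caller might read the elements defined above out of a partial response, here is a minimal parsing sketch. The function name parse_paged_response is hypothetical, and the sketch assumes the body declares the namespace with an xmlns:D="DAV:" attribute, since Python's ElementTree does not understand the processing-instruction form used in the examples. It reads the text of total whether the number appears directly inside the element or inside a nested nresults element.

```python
import xml.etree.ElementTree as ET

NS = {"D": "DAV:"}


def parse_paged_response(body):
    """Return (context, position, total, hrefs) from a partial response.

    Hypothetical helper; raises ValueError if the root element is not
    DAV:pagedresultsresponse.
    """
    root = ET.fromstring(body)
    if root.tag != "{DAV:}pagedresultsresponse":
        raise ValueError("not a pagedresultsresponse: %s" % root.tag)
    context = root.findtext("D:context", namespaces=NS)
    position = root.findtext("D:position", namespaces=NS)
    total_elem = root.find("D:total", NS)
    if total_elem is None:
        raise ValueError("missing total element")
    # Join all text below <total>, direct or nested.
    total = int("".join(total_elem.itertext()).strip())
    hrefs = [h.text for h in root.findall("D:multistatus/D:response/D:href", NS)]
    return context, position, total, hrefs
```

The context and position strings are returned exactly as received so that the caller can send them back unmodified in the next request.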



ISSUES:

1) How do we want to change QSD with respect to this functionality?


Dale A. Lowry ( dlowry@novell.com )
Novell GroupWise Document Management
801-222-4662

Received on Friday, 31 July 1998 15:45:37 UTC