Caching and callbacks

It seems that the address for discussions in the RFC is not working.

This is a MIME-encapsulated message

--DAD28622.841491670/netmail.austin.ibm.com

The original message was received at Wed, 21 Aug 1996 06:14:17 -0500
from lyle.austin.ibm.com [129.35.176.158]

   ----- The following addresses had delivery problems -----
<http-wg@cuckoo.hpl.hp.com>  (unrecoverable error)

   ----- Transcript of session follows -----
<http-wg@cuckoo.hpl.hp.com>... Deferred: Connection timed out during initial connection with hplms2.hpl.hp.com.
Message could not be delivered for 1 week, 3 days
Message will be deleted from queue

   ----- Original message follows -----

--DAD28622.841491670/netmail.austin.ibm.com
Content-Type: message/rfc822

Return-Path: peterson@austin.ibm.com
Received: from lyle.austin.ibm.com (lyle.austin.ibm.com [129.35.176.158]) by netmail.austin.ibm.com (8.6.12/8.6.11) with SMTP id GAA50368 for <http-wg@cuckoo.hpl.hp.com>; Wed, 21 Aug 1996 06:14:17 -0500
Received: by lyle.austin.ibm.com (AIX 3.2/UCB 5.64/4.03-client-2.5)
          for  at austin.ibm.com; id AA24661; Tue, 20 Aug 1996 17:35:43 -0500
Date: Tue, 20 Aug 1996 17:35:43 -0500
From: peterson@austin.ibm.com (James L. Peterson)
Message-Id: <9608202235.AA24661@lyle.austin.ibm.com>
To: http-wg@cuckoo.hpl.hp.com
Subject: Caching and callbacks

We are trying to implement caching of Web pages in regional proxies.

The current scheme, as I understand it, requires that every reference
to a page be checked to see whether it is out of date.  Since a page
may change at any time (but probably won't), we assume that a
client's reference to the page in the proxy will require the proxy to
check with the server to see if the page is out of date (either with
a HEAD request or a conditional GET with an If-Modified-Since
header).
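
For concreteness, here is a minimal sketch of that revalidation step
(the host name, path, and date are placeholders; the date would come
from the cached page's Last-Modified header):

    import http.client

    # Revalidation sketch: ask the origin server whether the cached
    # copy is still current before serving it.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/page.html", headers={
        "If-Modified-Since": "Tue, 20 Aug 1996 17:35:43 GMT",
    })
    resp = conn.getresponse()
    if resp.status == 304:      # Not Modified: cached copy is still good
        print("serve the cached copy")
    else:                       # 200 OK: body carries the fresh page
        print("page changed; recache", len(resp.read()), "bytes")
    conn.close()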

It appears to me that this is very similar to the NFS file system
design.  Our initial work on the Andrew File System found that, for
both NFS and the initial AFS, the vast majority of messages were
checks to see if something was out of date when it wasn't (for a file
system this was a stat() request).  My memory suggests that 85% to
95% of the messages were stat() requests.

There are two disadvantages here: (1) the network traffic, and (2) the
latency until the client gets the correct information.  If the file
does need to be updated, the round-trip time from the proxy to the
server is unavoidable, but in the vast majority of cases the file
is not changed and the round-trip time is wasted.

The solution for the Andrew File System was to redesign it to be based
on callbacks.  A user of a file caches it locally and registers with
the server for a callback.  If, or when, the file is changed at the
server, the server then sends messages to all registered clients
indicating that their local copy is out-of-date.  The local client can
then either retrieve an up-to-date copy, or simply delete the old copy
from its cache.

We would like to propose that a similar callback scheme be supported
for web pages using HTTP.

A new request, or a modification of the GET request, would be
provided which asks the server for an object and also asks to be
notified if the object changes.  The server may respond with the
object and a notification of "no callback support" (the current
situation) or may accept the callback request.
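
As a sketch only, the exchange from the proxy's side might look like
the following; the "Callback" request and response headers are
invented here for illustration and are not part of HTTP:

    import http.client

    # Hypothetical exchange: the "Callback" header names a URL the
    # server should notify when the object changes.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/page.html", headers={
        "Callback": "http://proxy.example.com/notify",  # hypothetical
    })
    resp = conn.getresponse()
    body = resp.read()
    if resp.getheader("Callback") == "accepted":        # hypothetical
        print("server will notify us when the page changes")
    else:
        print("no callback support; check on every reference instead")
    conn.close()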

The server maintains a list of callbacks for each page.  If it finds
that the page has changed, it notifies each registered requester with
a message containing the URL of the page that changed.

The client can then either update its local cached copy or simply
throw it away.
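
A minimal in-memory sketch of that server-side bookkeeping (all names
are illustrative):

    # One list of interested parties per page; each is notified with
    # the URL of the page when it changes.
    callbacks = {}   # url -> list of notify functions

    def register(url, notify):
        callbacks.setdefault(url, []).append(notify)

    def page_changed(url):
        for notify in callbacks.get(url, []):
            notify(url)   # the message carries the changed page's URL

    register("/page.html", lambda url: print("invalidate cached", url))
    page_changed("/page.html")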

Our objective would be to use this for caching in proxies.  The
client would request a page from the proxy.  If the proxy has the
page, it would return it.  If not, it would request the page from the
server, along with a callback for when the page changes.  Further
requests for the page would return the proxy's cached copy.  If the
page changes, the server would notify the proxy, which would
invalidate its local cached copy.
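
Sketched as proxy-side logic (fetch_from_server stands in for the
GET-with-callback request above; everything here is illustrative):

    cache = {}   # url -> page body held by the proxy

    def fetch_from_server(url):
        # ...issue the GET-with-callback request to the origin...
        return "<html>page body</html>"

    def handle_client_request(url):
        if url not in cache:              # miss: fetch and register
            cache[url] = fetch_from_server(url)
        return cache[url]                 # hit: serve the cached copy

    def handle_change_notification(url):
        cache.pop(url, None)              # invalidate the local copy

    handle_client_request("/page.html")   # fetched and cached
    handle_client_request("/page.html")   # served from the cache
    handle_change_notification("/page.html")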

There are a number of issues.  The server would accept a callback
request for one change only.  When a change occurs, it would step
through its list and notify each requester, removing each from its
callback list.  If a requester wants to be notified of future
changes, it would need to request a callback again (presumably when
it fetched the updated page).  It is expected that a number of the
requesters may no longer have that page in their cache and would
simply ignore the notification of change and not re-register.
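
In terms of the earlier sketch, the one-change semantics would make
notification destructive, with re-registration left to the requester
(again, names are illustrative):

    callbacks = {}   # url -> list of notify functions

    def register(url, notify):
        callbacks.setdefault(url, []).append(notify)

    def page_changed(url):
        for notify in callbacks.pop(url, []):   # remove as we notify
            notify(url)

    def make_requester(name, still_cached):
        def notify(url):
            if still_cached:   # refetch and register for the next change
                register(url, notify)
                print(name, "re-registered for", url)
            else:              # page already evicted: just ignore it
                print(name, "ignored the notification for", url)
        return notify

    register("/page.html", make_requester("proxy-a", still_cached=True))
    register("/page.html", make_requester("proxy-b", still_cached=False))
    page_changed("/page.html")   # both notified; only proxy-a re-registers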

If the list of callbacks at the server becomes too long, it has 
several options: (1) refuse new requests for callback, (2) remove
the oldest request from the list (LRU) and send it a message 
indicating that its callback has been canceled, (3) grow the list
to accept the new callback.
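
Option (2) might look like the following sketch, evicting the oldest
registration first (the capacity and names are illustrative):

    from collections import OrderedDict

    MAX_CALLBACKS = 2           # tiny capacity so the demo shows eviction
    callbacks = OrderedDict()   # requester -> cancellation function

    def register(requester, cancel):
        if len(callbacks) >= MAX_CALLBACKS:
            oldest, oldest_cancel = callbacks.popitem(last=False)
            oldest_cancel(oldest)   # "your callback has been canceled"
        callbacks[requester] = cancel

    for proxy in ("proxy-a", "proxy-b", "proxy-c"):
        register(proxy, lambda r: print("callback canceled for", r))
    # proxy-a was the oldest entry, so it received the cancellation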

The one major flaw that I see is that the server may fail, and the
callback list may then be lost.  In this case the requesters may be
expecting to be notified when, in fact, they have been dropped from
the callback list.  (Just throwing some entries away is another
option for what to do if the callback list gets too long, but we did
not propose it, since it seems to defeat the point of the callback
list.)  Accordingly, it would seem that the requester will have to
check, at intervals, that all three of the following are true: (a)
the server and page still exist, (b) the page has not been modified
since it was cached, and (c) we are still on the callback list.
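
A requester's periodic check might then look like the following
sketch; it reuses the hypothetical Callback header from earlier, so
that re-sending it also re-registers us if we were dropped:

    import http.client

    def revalidate(host, path, cached_date, notify_url):
        conn = http.client.HTTPConnection(host)
        conn.request("GET", path, headers={
            "If-Modified-Since": cached_date,   # checks (a) and (b)
            "Callback": notify_url,             # hypothetical; checks (c)
        })
        resp = conn.getresponse()
        resp.read()
        conn.close()
        if resp.status == 404:
            return "page is gone; drop it from the cache"
        if resp.status == 304:
            return "still current, and the callback is re-registered"
        return "page changed; replace the cached copy"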


jim

--DAD28622.841491670/netmail.austin.ibm.com--

Received on Tuesday, 3 September 1996 09:35:16 UTC