An alternative to explicit revocation? from Jeffrey Mogul on 1996-01-02 (http-caching-historical@w3.org from January 1996)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Tue, 02 Jan 96 15:39:05 PST
To: Ari Luotonen <luotonen@netscape.com>
Cc: http-caching@pa.dec.com (http-caching mailing list)
Message-Id: <9601022339.AA28736@acetes.pa.dec.com>

I have a great deal of interest in the proposals for
explicit revocation (or callbacks, or what have you).  After
all, in a previous life I worked out the details for adding
callback-based caching to NFS (see "Recovery in Spritely NFS",
Computing Systems 7(2):201-262, Spring, 1994, or look at
http://www.research.digital.com/wrl/techreports/abstracts/93.2.html).

But this was not nearly as easy as it might seem.  For example,
the Spritely NFS implementation is about 50% larger than the original
NFS implementation I started with, and there are a few pieces
that I never finished.  Given this experience, and the other
objections raised to callbacks (e.g., firewalls), I do not believe
it is reasonable to try to fit explicit revocation into HTTP/1.1.
Maybe in some later version.

However, one of the concepts that came out of the AFS work (I
believe), called "volume validation", seems like it might go
a long way to improving cache performance for large proxies,
and yet could be implemented without much hassle (I think).

Here's a first cut at a design; please don't hold me to the details,
but I would be interested in comments.

Suppose that the server assigns each resource to one of a number
of sets, which I'll call a "volume."  Volumes do NOT necessarily
map onto storage-hierarchy concepts like disks; they might be
based on file type, for example.

Members of a volume ought to have similar lifetimes.  The server might
assign all of its resources to the same volume, or it might use
a number of volumes to distinguish among (for example) probably
immutable resources, things that change slowly (say, once a week),
things that change often (say, once an hour), and things that
are very dynamic (changing at intervals of seconds or minutes).
If a group of resources is typically changed together, then that
group also forms a natural volume.

When the server returns a response to a cache, it includes the
"usual" cache control info (which, of course, we still have to
argue about) and it also returns these three new header values:

	Volume-ID: <opaque string>
	Volume-version: <opaque string>
	Volume-expiration: <date> (or maybe <offset in seconds>?)

The cache, in addition to keeping the individual resources, also
keeps a cache of this per-volume information.  Each of the individual
resource cache entries includes a pointer to the associated volume
info (which is managed as a cache, and therefore might not always
be present).  Also, the Volume-version: value is stored with each
individual resource entry.

Note that the per-volume information must also include some
sort of unique ID for the server, such as its IP address or
host name.
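
As a rough illustration of the cache-side bookkeeping described above,
here is a minimal Python sketch.  All of the names (VolumeInfo,
ResourceEntry, lookup_volume) are hypothetical; the proposal itself
defines only the three headers.  The "pointer" to the volume info is
modeled as a (server, Volume-ID) key, so it may dangle if the volume
entry has been evicted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key: (server identity such as host name, Volume-ID value).
VolumeKey = Tuple[str, str]

@dataclass
class VolumeInfo:
    version: str     # most recent Volume-version seen for this volume
    expires: float   # volume expiration, in seconds since the epoch

@dataclass
class ResourceEntry:
    body: bytes
    expires: float                    # per-resource expiration time
    volume_key: Optional[VolumeKey]   # "pointer" to the volume info
    volume_version: Optional[str]     # Volume-version stored with the entry

# The per-volume information is itself a cache, so entries may vanish.
volumes: Dict[VolumeKey, VolumeInfo] = {}

def lookup_volume(entry: ResourceEntry) -> Optional[VolumeInfo]:
    """Return the volume info for an entry, or None if absent or evicted."""
    if entry.volume_key is None:
        return None
    return volumes.get(entry.volume_key)
```

Including the server identity in the key keeps two servers that happen
to pick the same Volume-ID string from colliding in the cache.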

Each time the cache receives a response from a server, it can
update the per-volume information from that response.  This
allows the server to keep increasing the expiration date for
a volume.  However, if any resource in the volume is modified,
then the server must change the Volume-version: value to one
it has never used before (so this could be a timestamp or sequence
number).
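
The update step might look something like the following Python sketch.
It assumes the "offset in seconds" variant of Volume-expiration floated
above, and it assumes response headers arrive as a plain dict; the
function name note_response and the storage layout are hypothetical.

```python
import time

# (server, volume_id) -> {"version": str, "expires": float}
volumes = {}

def note_response(server, headers, now=None):
    """Refresh per-volume info from one response.

    Returns the volume key, or None if the response carries no
    complete set of volume headers.
    """
    now = time.time() if now is None else now
    vol_id = headers.get("Volume-ID")
    version = headers.get("Volume-version")
    offset = headers.get("Volume-expiration")  # assumed: seconds from now
    if vol_id is None or version is None or offset is None:
        return None
    key = (server, vol_id)
    # The latest response wins: an unchanged version simply pushes the
    # expiration forward; a changed version replaces the old one.
    volumes[key] = {"version": version, "expires": now + int(offset)}
    return key
```

Because every response refreshes the entry, any single fetch from a
volume extends the lifetime of all cached resources that still carry
the same version.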

When the cache receives a client request, it would normally
check the Expiration information stored with the relevant
individual-resource entry to decide if it has to reload the
response from the server.  However, if the resource is associated
with a volume that the cache knows about, then it can do this:

	if (volume-version stored with resource matches current
	    volume-version stored in per-volume entry)
	then
		if (volume-expiration time has not yet been reached)
			then it's OK to return the cached response
		else if (resource expiration time not yet reached)
			then it's OK to return the cached response
		else
			must do conditional GET from server
	else
		if (resource expiration time not yet reached)
			then it's OK to return the cached response
		else
			must do conditional GET from server

In other words, the per-volume information is used to extend
the expiration time for what might be a large set of cached
resources.
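
The decision above collapses to a short test, sketched here in Python.
The function name and argument layout are hypothetical, and times are
assumed to be seconds since the epoch.  The volume argument is None
when the cache has no per-volume entry for the resource.

```python
def can_serve_cached(resource_expires, resource_volume_version, volume, now):
    """Return True if the cached response may be returned without
    a conditional GET, False if the cache must revalidate.

    volume is None, or a dict with "version" and "expires" keys.
    """
    if (volume is not None
            and volume["version"] == resource_volume_version
            and now < volume["expires"]):
        # The volume is unchanged and still valid, so the resource's
        # lifetime is effectively extended.
        return True
    # Otherwise fall back on the resource's own expiration time.
    return now < resource_expires
```

For example, a resource that expired at time 100 can still be served
at time 150 if its volume version matches and the volume is good
until 200:

```python
vol = {"version": "7", "expires": 200.0}
can_serve_cached(100.0, "7", vol, now=150.0)   # True
can_serve_cached(100.0, "6", vol, now=150.0)   # False: version changed
```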

Of course, this would only pay off when a proxy caches a sufficiently
large number of resources from the same server (and from the same
volume).  But it lets the server assign relatively short expiration
times to the individual resources (which makes revocation less important),
and still prevent a busy proxy from bombarding it with conditional
requests for resources that haven't been modified.

It's entirely optional for the server or the cache to implement
this, and the implementation at the cache side of things seems
to be pretty simple.  On the server side, implementation complexity
will depend on how the server detects whether a member of a volume
is modified.  A small amount of support from the underlying file
system or database might be quite useful, but in many cases I would
guess that a fairly simple scheme would work.  For example, if
all of the items in a catalog were assigned to the same volume,
the server administrator could simply change the volume-version
value whenever the catalog was updated.
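
A server-side version of the catalog example could be as small as the
Python sketch below.  The Volume class, its method names, and the use
of a sequence number for Volume-version are all assumptions made for
illustration; Volume-expiration is again sent as an offset in seconds.

```python
import itertools

class Volume:
    """One server-side volume with a never-reused version value."""

    def __init__(self, volume_id, lifetime_seconds):
        self.volume_id = volume_id
        self.lifetime_seconds = lifetime_seconds
        self._counter = itertools.count(1)      # sequence numbers 1, 2, 3, ...
        self.version = str(next(self._counter))

    def mark_modified(self):
        """Call whenever any member of the volume changes."""
        self.version = str(next(self._counter))

    def headers(self):
        """Volume headers to attach to every response from this volume."""
        return {
            "Volume-ID": self.volume_id,
            "Volume-version": self.version,
            "Volume-expiration": str(self.lifetime_seconds),
        }
```

The administrator's "catalog updated" step then amounts to one call to
mark_modified() on the catalog's volume.  A real server would need to
persist the counter across restarts so that a version is never reused.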

-Jeff
Received on Tuesday, 2 January 1996 23:43:36 UTC