- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Tue, 02 Jan 96 15:39:05 PST
- To: Ari Luotonen <luotonen@netscape.com>
- Cc: http-caching@pa.dec.com (http-caching mailing list)
I have a great deal of interest in the proposals for explicit revocation (or callbacks, or what have you). After all, in a previous life I worked out the details for adding callback-based caching to NFS (see "Recovery in Spritely NFS", Computing Systems 7(2):201-262, Spring, 1994, or look at http://www.research.digital.com/wrl/techreports/abstracts/93.2.html).

But this was not nearly as easy as it might seem. For example, the Spritely NFS implementation is about 50% larger than the original NFS implementation I started with, and there are a few pieces that I never finished.

Given this experience, and the other objections raised to callbacks (e.g., firewalls), I do not believe it is reasonable to try to fit explicit revocation into HTTP/1.1. Maybe in some later version.

However, one of the concepts that came out of the AFS work (I believe), called "volume validation", seems like it might go a long way to improving cache performance for large proxies, and yet could be implemented without much hassle (I think). Here's a first cut at a design; please don't hold me to the details, but I would be interested in comments.

Suppose that the server assigns each resource to one of a number of sets, which I'll call a "volume." Volumes do NOT necessarily map onto storage-hierarchy concepts like disks; they might be based on file type, for example. Members of a volume ought to have similar lifetimes.

The server might assign all of its resources to the same volume, or it might use a number of volumes to distinguish between (for example) probably immutable resources, things that change slowly (say, once a week), things that change often (say, once an hour), and things that are very dynamic (changing at intervals of seconds or minutes). If a group of resources is typically changed together, then that group also forms a natural volume.

When the server returns a response to a cache, it includes the "usual" cache control info (which, of course, we still have to argue about) and it also returns these three new header values:

    Volume-ID: <opaque string>
    Volume-version: <opaque string>
    Volume-expiration: <date> (or maybe <offset in seconds>?)

The cache, in addition to keeping the individual resources, also keeps a cache of this per-volume information. Each of the individual resource cache entries includes a pointer to the associated volume info (which is managed as a cache, and therefore might not always be present). Also, the Volume-version: value is stored with each individual resource entry. Note that the per-volume information must also include some sort of unique ID for the server, such as its IP address or host name.

Each time the cache receives a response from a server, it can update the per-volume information from that response. This allows the server to keep increasing the expiration date for a volume. However, if any resource in the volume is modified, then the server must change the Volume-version: value to one it has never used before (so this could be a timestamp or sequence number).

When the cache receives a client request, it would normally check the Expiration information stored with the relevant individual-resource entry to decide if it has to reload the response from the server.
However, if the resource is associated with a volume that the cache knows about, then it can do this:

    if (volume-version stored with resource matches
        current volume version stored in per-volume entry) then
        if (volume-expiration time has not yet been reached) then
            it's OK to return the cached response
        else if (resource expiration time not yet reached) then
            it's OK to return the cached response
        else
            must do conditional GET from server
    else if (resource expiration time not yet reached) then
        it's OK to return the cached response
    else
        must do conditional GET from server

In other words, the per-volume information is used to extend the expiration time for what might be a large set of cached resources.

Of course, this would only pay off when a proxy caches a sufficiently large number of resources from the same server (and from the same volume). But it lets the server assign relatively short expiration times to the individual resources (which makes revocation less important), and still prevent a busy proxy from bombarding it with conditional requests for resources that haven't been modified.

It's entirely optional for the server or the cache to implement this, and the implementation at the cache side of things seems to be pretty simple.

On the server side, implementation complexity will depend on how the server detects whether a member of a volume is modified. A small amount of support from the underlying file system or database might be quite useful, but in many cases I would guess that a fairly simple scheme would work. For example, if all of the items in a catalog were assigned to the same volume, the server administrator could simply change the volume-version value whenever the catalog was updated.

-Jeff
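
The cache-side check described in this message can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the proposal: the names VolumeInfo, ResourceEntry, update_volume and can_serve_from_cache are hypothetical, times are assumed to be seconds since the epoch, and the per-volume cache is assumed to be keyed by (server, Volume-ID) so that volumes from different servers cannot collide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class VolumeInfo:
    version: str       # most recent Volume-version seen for this volume
    expiration: float  # most recent Volume-expiration (assumed epoch seconds)


@dataclass
class ResourceEntry:
    body: bytes
    expiration: float              # the resource's own expiration time
    server: str                    # host name or IP address of the server
    volume_id: Optional[str]       # Volume-ID, if the server sent one
    volume_version: Optional[str]  # Volume-version stored with this entry


# Per-volume information is itself a cache, so a lookup may miss.
VolumeCache = Dict[Tuple[str, str], VolumeInfo]


def update_volume(volumes: VolumeCache, server: str, volume_id: str,
                  version: str, expiration: float) -> None:
    """Refresh the per-volume info from any response sent by the server."""
    volumes[(server, volume_id)] = VolumeInfo(version, expiration)


def can_serve_from_cache(entry: ResourceEntry, volumes: VolumeCache,
                         now: float) -> bool:
    """True if the cached response may be returned as-is;
    False means a conditional GET to the server is required."""
    if entry.volume_id is not None:
        volume = volumes.get((entry.server, entry.volume_id))
        if (volume is not None
                and volume.version == entry.volume_version
                and now < volume.expiration):
            # Volume unchanged and still valid: extend the resource lifetime.
            return True
    # Otherwise fall back to the resource's own expiration time.
    return now < entry.expiration
```

In this sketch, a later response that carries a new Volume-version for the same volume makes every older entry in that volume fall back to its own, shorter expiration time, which is the behavior the pseudocode describes.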
Received on Tuesday, 2 January 1996 23:43:36 UTC