Re: Explicit revocation

Ari Luotonen:
>
>A feature-request I have bumped into several times just recently, and
>towards which I'm tempted to incline, is kind of what AFS does:
>
>        Have the server (as a server option) choose to tell
>        the proxy that it is ok to return directly from the
>        cache without a check for so-and-so long time.  If
>        during that time the object changes, the *server* will
>        notify the *proxy* about this.

Having explicit revocation is a good thing, but I agree with others
that it is too complex/unexplored to get it into HTTP 1.1 (which is
supposed to be a 'fast track' standard).
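
To make the complexity concrete, here is a minimal in-process sketch of the lease-plus-notification idea quoted above. It is not a proposal for wire syntax: the class names, method names, and the `lease_seconds` parameter are all hypothetical, and a real scheme would also have to deal with lost notifications and unreachable proxies.

```python
class Proxy:
    """Hypothetical proxy cache holding objects under a server-granted lease."""

    def __init__(self):
        self.cache = {}  # url -> (body, lease_expiry)

    def store(self, url, body, lease_expiry):
        self.cache[url] = (body, lease_expiry)

    def revoke(self, url):
        # Called by the origin server when the object changes.
        self.cache.pop(url, None)

    def can_serve_without_check(self, url, now):
        entry = self.cache.get(url)
        return entry is not None and now < entry[1]


class OriginServer:
    """Hypothetical origin server that remembers which proxies hold leases."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.objects = {}        # url -> body
        self.lease_holders = {}  # url -> list of (proxy, lease_expiry)

    def fetch(self, url, proxy, now):
        # Hand out the object together with a promise to notify on change.
        expiry = now + self.lease_seconds
        self.lease_holders.setdefault(url, []).append((proxy, expiry))
        proxy.store(url, self.objects[url], expiry)

    def update(self, url, body, now):
        # The *server* notifies every proxy whose lease is still running.
        self.objects[url] = body
        for proxy, expiry in self.lease_holders.pop(url, []):
            if now < expiry:
                proxy.revoke(url)
```

Note that the server has to keep per-proxy state for every outstanding lease, which is part of why I consider this too unexplored for 1.1.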

[...]
>The theory behind this is that _most_ of the time _most_ objects do
>_not_ change soon after they get retrieved -- that's why today's
>proxies perform so well already (Netscape's proxy saves up to 60% in
>connections and 75% in bandwidth) when properly configured and with
>the critical mass of users using it), even though they rely heavily on
>heuristics, and there's minimal support for them in the protocol.

60% savings in connections and 75% savings in bandwidth????  I am a
bit disturbed by that figure; it is highly atypical.

For 'local client -> non-local server' requests, the proxies that I
know of do not perform that well, and I doubt they ever will (unless
you are talking about proxies with gigabytes of diskspace that serve
at least a small country).

The paper `Caching Proxies: Limitations and Potentials' at
http://www.w3.org/pub/Conferences/WWW4/Papers/155/ concludes:

|1.  We confirmed previous data in the literature that a caching proxy
|    has an upper bound of 30-50% in its hit rate, given an infinite
|    size cache and an eight day cache purge interval.

I have measured a fairly constant 30-40% hit rate and 30-40% bandwidth
saving at our proxy cache.

Are you sure that your figures are for a cache that caches _outgoing_
http requests, i.e. requests made by your local users to WWW servers
not on your local network?

I have seen higher savings figures reported
 - for caches that also cache traffic from local browsers to local
   servers
 - for caches that cache requests from outside users to local servers
   (see for example
   http://www.vuw.ac.nz/~mimi/www/www-caching/caching.html )
but these figures do not matter much if you want to talk about the
impact of caches on the size of (non-local) internet traffic.

I'm sorry to be so negative, but I have serious doubts about caching
schemes in proxies being able to reduce internet traffic by more
than 50%.  Some people have argued that the exponential growth of web
traffic makes it _necessary_ for caching proxies to reach hit rates of
at least 95%, but I see no way for caching technology to provide
such an exponential improvement.

Service authors are continually putting new content on the web.  If
this continual addition of new content did not exist, gigabyte-sized
caches might get to 95% hit rates, but with new content always being
added (and accessed), we can never reach 95%.

[Note that I did not say that proxies cannot reduce web traffic by
more than 50%: using a combination of caching and compression, 75%
could be reached.]
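
As a back-of-the-envelope check of that 75% figure, here is a small sketch. The 2:1 compression ratio is purely an assumption for illustration, and `combined_saving` is a made-up helper, not a measurement from any real proxy.

```python
def combined_saving(hit_rate: float, compression_ratio: float) -> float:
    """Fraction of traffic removed when cache misses are sent compressed.

    hit_rate: fraction of bytes served from the cache (0..1).
    compression_ratio: compressed size / original size for the bytes
        that miss the cache (assumed value, e.g. 0.5 for 2:1).
    """
    remaining = (1.0 - hit_rate) * compression_ratio
    return 1.0 - remaining


# A 50% byte hit rate plus an assumed 2:1 compression of the misses:
print(combined_saving(0.5, 0.5))  # 0.75
```

Caching alone (compression_ratio of 1.0) stays at whatever the hit rate is, which is the 50% ceiling I am arguing for.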

>Or in other words, the fact is that _most_ of the If-modified-since
>checks performed by proxies in fact yield 304.  We're talking about
>over 90%; if configured to perform up-to-date checks for every
>request, that figure comes pretty darn close to 99.9%.

I see no obligation in the protocol to perform up-to-date checks for
every request, so a configuration that gets 99.9% is completely
unnecessary.  Conditional GETs are only required for resources that
have expired.  In fact, I would consider doing up-to-date checks for
every single request, if not forced to do so by an Expires header, to
be extremely rude and wasteful of origin server resources.
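
The rule I have in mind can be sketched as follows. This is only an illustration: `CachedEntry`, `needs_conditional_get`, and the three-hour default for resources without an Expires header are assumptions of mine, not anything the protocol specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Hypothetical cache record; field names are illustrative only."""
    expires_at: Optional[float]  # time from the Expires header, None if absent
    last_checked: float          # when the entry was last fetched or revalidated


def needs_conditional_get(entry: CachedEntry, now: float,
                          no_expires_grace: float = 3 * 3600) -> bool:
    """Return True if the cache should send an If-Modified-Since request."""
    if entry.expires_at is not None:
        # Expires present: revalidate only once the resource has expired.
        return now >= entry.expires_at
    # Expires absent: assumed heuristic, skip checks for a grace period
    # after the last check instead of checking on every request.
    return now - entry.last_checked >= no_expires_grace
```

With this rule an unexpired resource never triggers a conditional GET, and an expired one always does.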

On a related note, I recently discovered that the Netscape client
cache, if configured to `verify document: every time', will indeed do
a conditional GET for every new request on a resource that lacks an
Expires header.  Eek.  I thought that `verify document' applied to
conditional GETs on expired documents only, so I had enabled this
option on my Netscape copy.

I am a bit disturbed by Netscape having this cache configuration
option at all.  If only 10% of Netscape users enable it,
they will cause an enormous increase in the number of conditional GETs
going over the net.

>So hey -- up-to-date checks are wasteful, too, and in practice all the
>service providers and most companies that run a proxy configure it so
>that it does _not_ perform checks during a few hours after the last
>check.

I hope you are talking here about not always performing conditional
GETs on resources that have not expired.  It would be _very bad_ for a
cache not to relay conditional GETs on documents that are expired.

Perhaps we should put a note in the protocol about what the preferred
cache behavior is if an Expires header is _absent_.

I conclude that we should first focus on reducing the number of
_unnecessary_ conditional GETs by giving some guidelines in the 1.1
protocol.  Then, we can talk about replacing some of the necessary
conditional GETs with an explicit revocation scheme in 1.2.

>Ari Luotonen                            ari@netscape.com

Koen.

Received on Friday, 29 December 1995 15:37:58 UTC