- From: Koen Holtman <koen@win.tue.nl>
- Date: Fri, 29 Dec 1995 16:17:46 +0100 (MET)
- To: luotonen@netscape.com (Ari Luotonen)
- Cc: http-caching@pa.dec.com

Ari Luotonen:
>
>A feature-request I have bumped into several times just recently, and
>towards which I'm tempted to incline, is kind of what AFS does:
>
>	Have the server (as a server option) choose to tell
>	the proxy that it is ok to return directly from the
>	cache without a check for so-and-so long time.  If
>	during that time the object changes, the *server* will
>	notify the *proxy* about this.

Having explicit revocation is a good thing, but I agree with others
that it is too complex/unexplored to get into HTTP 1.1 (which is
supposed to be a 'fast track' standard).

[...]

>The theory behind this is that _most_ of the time _most_ objects do
>_not_ change soon after they get retrieved -- that's why today's
>proxies perform so well already (Netscape's proxy saves up to 60% in
>connections and 75% in bandwidth) when properly configured and with
>the critical mass of users using it), even though they rely heavily on
>heuristics, and there's minimal support for them in the protocol.

60% savings in connections and 75% savings in bandwidth????

I am a bit disturbed by that figure; it is highly atypical.  For
'local client -> non-local server' requests, the proxies that I know of
do not perform that well, and I doubt they ever will.  (Unless you are
talking about proxies with gigabytes of disk space that serve at least
a small country.)

The paper `Caching Proxies: Limitations and Potentials' at
http://www.w3.org/pub/Conferences/WWW4/Papers/155/ concludes:

|1. We confirmed previous data in the literature that a caching proxy
|   has an upper bound of 30-50% in its hit rate, given an infinite
|   size cache and an eight day cache purge interval.

I have measured a fairly constant 30-40% hit rate and 30-40% bandwidth
saving at our proxy cache.

Are you sure that your figures are for a cache that caches _outgoing_
HTTP requests, i.e. requests made by your local users to WWW servers
not on your local network?

I have seen higher savings figures reported
 - for caches that also cache traffic from local browsers to local
   servers
 - for caches that cache requests from outside users to local servers
   (see for example
   http://www.vuw.ac.nz/~mimi/www/www-caching/caching.html )
but these figures do not matter much if you want to talk about the
impact of caches on the size of (non-local) internet traffic.

I'm sorry to be so negative, but I have serious doubts about caching
schemes in proxies being able to reduce internet traffic by more than
50%.

Some people have argued that the exponential growth of web traffic
makes it _necessary_ for caching proxies to reach hit rates of at least
95%, but I see no way in which caching technology could provide such an
exponential improvement.  Service authors are continually putting new
content on the web.  If this continual addition of new content did not
exist, gigabyte-sized caches might get to 95% hit rates, but with new
content always being added (and accessed), we can never reach 95%.

[Note that I did not say that proxies cannot reduce web traffic by more
than 50%: using a combination of caching and compression, 75% could be
reached.]

>Or in other words, the fact is that _most_ of the If-modified-since
>checks performed by proxies in fact yield 304.  We're talking about
>over 90%; if configured to perform up-to-date checks for every
>request, that figure comes pretty darn close to 99.9%.

I see no obligation in the protocol to perform up-to-date checks for
every request, so a configuration that gets 99.9% is completely
unnecessary.  Conditional GETs are only required for resources that
have expired.

In fact, I would consider doing up-to-date checks for every single
request, if not forced to do so by an Expires header, to be extremely
rude and wasteful of origin server resources.
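
The rule above (a conditional GET is only required once a resource has
expired) can be sketched as follows.  This is a minimal illustration,
not taken from any proxy implementation: the function name
must_revalidate and its three-way return value are assumptions, and the
case of an absent or unparseable Expires header is deliberately left to
local cache policy because the protocol gives no rule for it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def must_revalidate(expires: Optional[str], now: datetime) -> Optional[bool]:
    """Decide whether a cached entry needs a conditional GET.

    Hypothetical helper, for illustration only.
    Returns True  if the Expires date has passed (conditional GET required),
            False if the entry is still fresh (serve it from the cache),
            None  if there is no usable Expires header (left to cache policy).
    """
    if expires is None:
        return None
    try:
        expiry = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        # Assumption: an unparseable date is treated like a missing header.
        return None
    if expiry.tzinfo is None:
        # Assumption: a date without zone information is read as GMT.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry <= now


if __name__ == "__main__":
    now = datetime(1995, 12, 29, 15, 0, 0, tzinfo=timezone.utc)
    for header in ("Fri, 29 Dec 1995 16:00:00 GMT",
                   "Fri, 29 Dec 1995 14:00:00 GMT",
                   None):
        print(header, "->", must_revalidate(header, now))
```

A cache built this way would send If-Modified-Since only when the
function returns True; what to do on None is exactly the open question
about resources without an Expires header.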

On a related note, I recently discovered that the Netscape client
cache, if configured to `verify document: every time', will indeed do a
conditional GET for every new request on a resource that lacks an
Expires header.  Eek.  I thought that `verify document' applied to
conditional GETs on expired documents only, so I had enabled this
option in my copy of Netscape.

I am a bit disturbed by Netscape having this cache configuration option
at all.  If only 10% of Netscape users enable it, they will cause an
enormous increase in the number of conditional GETs going over the net.

>So hey -- up-to-date checks are wasteful, too, and in practice all the
>service providers and most companies that run a proxy configure it so
>that it does _not_ perform checks during a few hours after the last
>check.

I hope you are talking here about not always performing conditional
GETs on resources that have not expired.  It would be _very bad_ for a
cache not to relay conditional GETs on documents that have expired.

Perhaps we should put a note in the protocol about what the preferred
cache behavior is if an Expires header is _absent_.

I conclude that we should first focus on reducing the number of
_unnecessary_ conditional GETs by giving some guidelines in the 1.1
protocol.  Then, we can talk about replacing some of the necessary
conditional GETs with an explicit revocation scheme in 1.2.

>Ari Luotonen	ari@netscape.com

Koen.

Received on Friday, 29 December 1995 15:37:58 UTC