- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 19 Aug 96 11:53:14 MDT
- To: Andrew Daviel <andrew@vancouver-webpages.com>
- Cc: HTTP Working Group <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>, ircache list <ircache@nlanr.net>
> When should one make objects uncacheable?

Are you asking this question from the point of view of the origin
server, or of a proxy? If your question is "when should a proxy
decide, on its own initiative, that something is cachable?", then the
only safe answer is "never." But you seem to be asking mostly about
what an origin server should do, e.g.:

> I would think that objects which change slowly over time should be
> given Expires values commensurate with the rate of change; for
> example, a Webcam watching clouds go by might be given a lifetime
> of 10 minutes. This would allow proxy caches to usefully save a
> reasonably up-to-date image for popular views.

That's basically reasonable, but perhaps not quite definitive. One way
to look at this is that the origin server should set the lifetime so
that the expected cost of getting an incorrect response from a cache
(that is, the probability of a wrong answer multiplied by some sort of
"cost" of that answer) is lower than the expected cost of the extra
cache misses. These costs aren't necessarily in the same units
(milliseconds vs. lawsuits), so the comparison isn't always easy to
make. But "commensurate with the [expected] rate of change" works
nicely for some things (such as most webcam pictures) and not at all
for others (such as, for example, a security-camera picture).
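To make that trade-off concrete, here is a toy calculation; the
random-change model, the costs, and the numbers are all invented, not
measurements of any real server:

    import math

    # Toy model: assume the object changes at random times at a known
    # average rate, so a cached copy of age T is stale with
    # probability 1 - exp(-T / mean_change_interval).
    def largest_safe_lifetime(mean_change_interval, cost_stale, cost_miss):
        """Longest lifetime T for which the expected cost of a stale
        answer, p(stale) * cost_stale, stays below the cost of the
        extra cache miss that a shorter lifetime would force."""
        if cost_miss >= cost_stale:
            return float("inf")  # staleness is cheaper than a miss
        return -mean_change_interval * math.log(1 - cost_miss / cost_stale)

    # Clouds webcam: frames change every ~10 minutes, and a slightly
    # stale frame is cheap, so any lifetime passes the test.
    print(largest_safe_lifetime(600, cost_stale=1, cost_miss=5))    # inf
    # Security camera: a stale frame might be ruinously expensive
    # ("lawsuits"), so the safe lifetime collapses to milliseconds.
    print(largest_safe_lifetime(600, cost_stale=1e6, cost_miss=5))  # ~0.003

In practice the webcam's lifetime would still be capped at its
roughly 10-minute change interval; the point is only that two objects
with identical change rates can come out wildly different once the
cost of a wrong answer enters the picture.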
> A fish database with 1,000 entries might produce cacheable output
> for queries such as "/cgi-bin/query?salmon" or
> "/cgi-bin/query?trout", but not for "/cgi-bin/query?anteater". This
> would apply to any kind of system creating HTML on the fly from
> (invariant) source.

I'm not sure why the results would be any less cachable for the
anteater query, if the HTTP response carries a "200 (OK)" status. The
critical question here is "would I give a different response later
on?", and I wouldn't expect that answer to change.

> Supposing the above premises are reasonable, my question is whether
> the output of more general search engines should be made cacheable?
> It seems to me that certain queries are fairly popular, and that
> some benefit might be had from caching the responses. On the other
> hand, the number of possible URLs expands exponentially with the
> length of the query string, so that caching every response would be
> unreasonable, filling up caches with never-to-be-repeated requests.

I did a simple study a few months ago with about one day's worth of
AltaVista queries. (This was back when we were doing something like 4
million hits per day; we're now doing about four times as many, so the
statistics may have changed.) I found that even if you could cache the
result of every query for an entire 24-hour period, the best-case
cache hit rate would be around 15%. Not really worth the effort, I
think.

Note that the operators of AltaVista know how often they update the
database. They could easily send a max-age value in their responses
that would allow caching without harming transparency (provided that
proxies didn't view this as a license to choose their own expiration
times!).

I don't think it makes sense to decide whether something is cachable
based on the number of hits; resources are cachable or not by their
essential nature, not by how popular they are. It's up to the cache
(proxy or otherwise) to decide whether it wants to store a cachable
response in its finite memory, and when to remove it. But that is a
separate decision.

> Currently, Apache 1.1.1 will cache anything with a Last-Modified
> header, while Squid 1.0.x will not (as shipped) cache anything with
> a query term.

My intuition is that Apache's policy will lead to misbehavior in some
cases, whether or not there is a query term.
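Schematically, the two policies amount to something like this (the
rules are as just described; the example URL and header are invented):

    # Apache 1.1.1's policy, as described above: cache anything that
    # carries a Last-Modified header, query term or not.
    def apache_1_1_1_would_cache(headers):
        return "last-modified" in {k.lower() for k in headers}

    # Squid 1.0.x as shipped, as described above: never cache a URL
    # containing a query term, whatever the headers say.
    def squid_1_0_x_would_cache(url):
        return "?" not in url

    # A page generated on the fly, with no query term but with a
    # Last-Modified header: Apache's rule caches it even though the
    # origin would answer differently on the next request.
    url = "http://example.com/reports/current"
    headers = {"Last-Modified": "Mon, 19 Aug 1996 11:53:14 GMT"}
    print(apache_1_1_1_would_cache(headers))  # True
    print(squid_1_0_x_would_cache(url))       # True -- no "?" to catch it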
-Jeff

Received on Monday, 19 August 1996 12:04:36 UTC