- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 19 Aug 96 11:53:14 MDT
- To: Andrew Daviel <andrew@vancouver-webpages.com>
- Cc: HTTP Working Group <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>, ircache list <ircache@nlanr.net>
> When should one make objects uncacheable?

Are you asking this question from the point of view of the origin
server, or of a proxy? If your question is "when should a proxy
decide, on its own initiative, that something is cachable?", then the
only safe answer is "never." But you seem to be asking mostly about
what an origin server should do, e.g.:

> I would think that objects which change slowly over time should be
> given Expires values commensurate with the rate of change; for
> example, a Webcam watching clouds go by might be given a lifetime
> of 10 minutes. This would allow proxy caches to usefully save a
> reasonably up-to-date image for popular views.

That's basically reasonable, but perhaps not quite definitive. One way
to look at this is that the origin server should set the lifetime so
that the expected cost of getting an incorrect response from a cache
(that is, the probability of a wrong answer multiplied by some sort of
"cost" of that answer) is lower than the expected cost of the extra
cache misses. These costs aren't necessarily in the same units
(milliseconds vs. lawsuits), so the comparison isn't always easy to
make. But "commensurate with the [expected] rate of change" works
nicely for some things (such as most webcam pictures) and not at all
for others (such as, for example, a security-camera picture).
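To make that trade-off concrete, here is a toy calculation; the
random-change model, the costs, and the numbers are all invented, not
measurements of any real server:

    import math

    # Toy model: assume the object changes at random times at a known
    # average rate, so a cached copy of age T is stale with
    # probability 1 - exp(-T / mean_change_interval).
    def largest_safe_lifetime(mean_change_interval, cost_stale, cost_miss):
        """Longest lifetime T for which the expected cost of a stale
        answer, p(stale) * cost_stale, stays below the cost of the
        extra cache miss that a shorter lifetime would force."""
        if cost_miss >= cost_stale:
            return float("inf")  # staleness is cheaper than a miss
        return -mean_change_interval * math.log(1 - cost_miss / cost_stale)

    # Clouds webcam: frames change every ~10 minutes, and a slightly
    # stale frame is cheap, so any lifetime passes the test.
    print(largest_safe_lifetime(600, cost_stale=1, cost_miss=5))    # inf
    # Security camera: a stale frame might be ruinously expensive
    # ("lawsuits"), so the safe lifetime collapses to milliseconds.
    print(largest_safe_lifetime(600, cost_stale=1e6, cost_miss=5))  # ~0.003

In practice the webcam's lifetime would still be capped at its
roughly 10-minute change interval; the point is only that two objects
with identical change rates can come out wildly different once the
cost of a wrong answer enters the picture.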
> A fish database with 1,000 entries might produce cacheable output
> for queries such as "/cgi-bin/query?salmon" or
> "/cgi-bin/query?trout", but not for "/cgi-bin/query?anteater". This
> would apply to any kind of system creating HTML on the fly from
> (invariant) source.

I'm not sure why the results would be any less cachable for the
anteater query, if the HTTP response carries a "200 (OK)" status. The
critical question here is "would I give a different response later
on?", and I wouldn't expect that answer to change.

> Supposing the above premises are reasonable, my question is whether
> the output of more general search engines should be made cacheable?
> It seems to me that certain queries are fairly popular, and that
> some benefit might be had from caching the responses. On the other
> hand, the number of possible URLs expands exponentially with the
> length of the query string, so that caching every response would be
> unreasonable, filling up caches with never-to-be-repeated requests.

I did a simple study a few months ago with about one day's worth of
AltaVista queries. (This was back when we were doing something like 4
million hits per day; we're now doing about four times as many, so the
statistics may have changed.) I found that even if you could cache the
result of every query for an entire 24-hour period, the best-case
cache hit rate would be around 15%. Not really worth the effort, I
think.

Note that the operators of AltaVista know how often they update the
database. They could easily send a max-age value in their responses
that would allow caching without harming transparency (provided that
proxies didn't view this as a license to choose their own expiration
times!).

I don't think it makes sense to decide whether something is cachable
based on the number of hits; resources are cachable or not by their
essential nature, not by how popular they are. It's up to the cache
(proxy or otherwise) to decide whether it wants to store a cachable
response in its finite memory, and when to remove it. But that is a
separate decision.

> Currently, Apache 1.1.1 will cache anything with a Last-Modified
> header, while Squid 1.0.x will not (as shipped) cache anything with
> a query term.

My intuition is that Apache's policy will lead to misbehavior in some
cases, whether or not there is a query term.
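Schematically, the two policies amount to something like this (the
rules are as just described; the example URL and header are invented):

    # Apache 1.1.1's policy, as described above: cache anything that
    # carries a Last-Modified header, query term or not.
    def apache_1_1_1_would_cache(headers):
        return "last-modified" in {k.lower() for k in headers}

    # Squid 1.0.x as shipped, as described above: never cache a URL
    # containing a query term, whatever the headers say.
    def squid_1_0_x_would_cache(url):
        return "?" not in url

    # A page generated on the fly, with no query term but with a
    # Last-Modified header: Apache's rule caches it even though the
    # origin would answer differently on the next request.
    url = "http://example.com/reports/current"
    headers = {"Last-Modified": "Mon, 19 Aug 1996 11:53:14 GMT"}
    print(apache_1_1_1_would_cache(headers))  # True
    print(squid_1_0_x_would_cache(url))       # True -- no "?" to catch it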
-Jeff

Received on Monday, 19 August 1996 12:04:36 UTC