When to make objects uncacheable?

(Not so much an HTTP 1.1 question, as a general HTTP one)

When should one make objects uncacheable?

Some cases are clear-cut: an object that doesn't change should be
cacheable, and if one knows the expiry date one can give it using the
Expires header. An object that changes rapidly, such as a snapshot of
CPU usage, should be made uncacheable by giving an Expires value equal
to the current time (or an illegal value, which caches are expected to
treat as already expired).

I would think that objects which change slowly over time should be
given Expires values commensurate with the rate of change, for example
a Webcam watching clouds go by might be given a lifetime of 10 minutes.
This would allow proxy caches to usefully save a reasonably up-to-date image
for popular views.
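
To make the three cases above concrete, here is a minimal sketch in Python
of how the Expires value might be computed. The function name and the
lifetimes other than the 10-minute Webcam figure are my own assumptions for
illustration, not taken from any particular server.

```python
import time
from email.utils import formatdate

def expires_header(lifetime_seconds, now=None):
    """Return an HTTP-date Expires value `lifetime_seconds` after `now`.

    A lifetime of 0 gives the current time, which a cache treats as
    already expired, so the object is effectively uncacheable.
    """
    if now is None:
        now = time.time()
    return formatdate(now + lifetime_seconds, usegmt=True)

# Hypothetical lifetimes for the kinds of object discussed above.
LIFETIMES = {
    "cpu-usage snapshot": 0,            # changes constantly: expire at once
    "webcam image": 10 * 60,            # clouds going by: ten minutes
    "static page": 30 * 24 * 3600,      # assumed: no change for a month
}

for name, lifetime in LIFETIMES.items():
    print(f"{name}: Expires: {expires_header(lifetime)}")
```

The point of the sketch is only that the lifetime is chosen per object, to
match its rate of change, rather than being a single server-wide setting.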

If a document is produced by database lookup instead of from a filesystem,
it seems to me that one should generate Last-Modified times and make the 
object cacheable when there is a database hit. A fish database with 1,000
entries might produce cacheable output for queries such as
"/cgi-bin/query?salmon" or "/cgi-bin/query?trout", but not
for "/cgi-bin/query?anteater". This would apply to any kind of
system creating HTML on the fly from (invariant) source.
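
A rough sketch of the fish-database idea, assuming a hypothetical in-memory
table that records when each entry was last changed (the table contents,
the timestamps and the function name are invented for illustration). A hit
gets a truthful Last-Modified header. A miss is marked as already expired,
which is one possible way of keeping it out of caches.

```python
from email.utils import formatdate

# Hypothetical database: term -> (record text, last change in Unix seconds).
FISH_DB = {
    "salmon": ("salmon record", 800000000),
    "trout": ("trout record", 810000000),
}

def query_headers(term, now):
    """Return caching-related headers for /cgi-bin/query?<term>."""
    record = FISH_DB.get(term)
    if record is None:
        # Miss: Expires equal to the current time means "already expired".
        return {"Expires": formatdate(now, usegmt=True)}
    _text, modified = record
    # Hit: let caches store the response and judge its freshness themselves.
    return {"Last-Modified": formatdate(modified, usegmt=True)}
```

With this table, `query_headers("salmon", now)` carries only Last-Modified,
while `query_headers("anteater", now)` carries only an Expires value equal
to the request time.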

Supposing the above premises are reasonable, my question is whether
the output of more general search engines should be made cacheable.
It seems to me that certain queries are fairly popular, and that
some benefit might be had from caching the responses. On the other
hand, the number of possible URLs expands exponentially with the
length of the query string, so that caching every response would be
unreasonable, filling up caches with never-to-be-repeated requests.

I envisage an algorithm to generate a document lifetime based on the
number of hits and search terms. If my database is updated once a day,
and I have 5 hits for a single search term, it seems reasonable to
assign an Expires date of tomorrow. Alternatively I can generate a
(correct) Last-Modified header for that search term and allow the cache
server to compute an expiry date using its own algorithm. If I have 400
hits from 7 search terms, or no hits, I would give an Expires value of 0,
that is, already expired. Another possibility is to create a database of
actual queries, and use that to generate expiry times, which would allow
caching responses to often-asked requests for non-existent data.
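
The sort of algorithm I have in mind could be sketched as below. Only the
two end points come from the examples above (5 hits on 1 term gets a day,
400 hits on 7 terms or no hits gets nothing). The cut-off values
`max_hits`, `max_terms` and `min_repeats` are arbitrary assumptions that
would need tuning against real traffic.

```python
from collections import Counter

DAY = 24 * 3600  # the database is assumed to be updated once a day

def lifetime_from_result(num_hits, num_terms, max_hits=50, max_terms=3):
    """Lifetime in seconds judged from the shape of the query and result."""
    if num_hits == 0:
        return 0                      # nothing found: do not cache
    if num_hits > max_hits or num_terms > max_terms:
        return 0                      # broad or elaborate query: unlikely to recur
    return DAY                        # small, simple result: good until next update

def lifetime_from_log(query, log, min_repeats=10):
    """Alternative: lifetime judged from how often this query has been asked."""
    log[query] += 1
    return DAY if log[query] >= min_repeats else 0

query_log = Counter()                 # hypothetical record of actual queries
```

The second function ignores the number of hits altogether, which is what
lets a popular query for non-existent data become cacheable once it has
been asked often enough.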

Comments?

Currently, Apache 1.1.1 will cache anything with a Last-Modified
header, while Squid 1.0.x will not (as shipped) cache anything with
a query term.


Andrew Daviel

andrew@vancouver-webpages.com 
http://vancouver-webpages.com  : home of searchBC

Received on Friday, 16 August 1996 11:01:13 UTC