Re: caching of cgi-generated pages from Patrick McManus on 1999-07-28 (www-talk@w3.org from July to August 1999)

From: Patrick McManus <mcmanus@appliedtheory.com>
Date: Wed, 28 Jul 1999 15:12:45 -0400 (EDT)
To: dmitry@ostankino.ucsd.edu (Dmitry Beransky)
Cc: www-talk@w3.org
Message-Id: <199907281912.PAA24438@justice.atc-bos.com>

In a previous episode Dmitry Beransky said...
:: 
:: Which specification states (recommends) that browsers should not cache 
:: cgi-generated pages?

no spec says that.. and no spec should. indeed browsers have no
deterministic way of knowing by what process a page was created.

each response can carry a header (Expires, or Cache-Control: max-age
in HTTP/1.1) that specifies expiration information for the included
entity...

proxy caches have a habit of inventing that information in the case of
pages that don't come labeled from the server.. HTTP permits this
guessing but would much rather see explicit expiration times from the
server, so for documents that aren't well labeled it's kind of a
necessary evil. because this is non-deterministic they use a bunch of
rules to come up with the lifetime of the page.. among these is that
resources that have cgi-bin or a ? in the URL, or that end in .cgi,
are probably uncacheable.. but that's just a guess on their part...
and a server that accurately labels the response should certainly not
be subject to those kinds of games.
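
To make the guessing concrete, here is a minimal sketch of that kind
of URL rule. The function name looks_dynamic is made up for
illustration and is not taken from any real proxy; actual caches use
their own (and more elaborate) rule sets.

```python
from urllib.parse import urlsplit


def looks_dynamic(url: str) -> bool:
    """Guess from the URL alone whether a page is probably dynamic.

    Hypothetical heuristic mirroring the three rules in the text:
    "cgi-bin" in the path, a "?" in the URL, or a path ending in ".cgi".
    """
    path = urlsplit(url).path.lower()
    return "cgi-bin" in path or "?" in url or path.endswith(".cgi")


if __name__ == "__main__":
    for url in (
        "http://www.pi-calculator.com/digit.cgi?digitnumber=173",
        "http://www.example.com/index.html",
    ):
        print(url, looks_dynamic(url))
```

Note that a rule like this only makes sense as a fallback for
responses that carry no expiration information of their own.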

For instance, consider the URL

http://www.pi-calculator.com/digit.cgi?digitnumber=173

that gives you the 173rd digit of pi.. That might very well be
dynamically generated content as there isn't enough disk space to
store all the possible answers, but the answer to digitnumber=173
is never going to change, so it's very cacheable and a smart server
will label it as such..
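
As a rough sketch of what "labeling it as such" could look like, the
following builds expiration headers for an answer that never changes.
The helper name and the one-year lifetime are assumptions made for
illustration (HTTP/1.1 advises servers not to send Expires dates more
than a year out); this is not how any particular server does it.

```python
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; assumed lifetime for the sketch


def long_lived_headers(now=None):
    """Return headers marking a response as fresh for one year.

    `now` is a Unix timestamp; it defaults to the current time.
    """
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"max-age={ONE_YEAR}",
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
    }


if __name__ == "__main__":
    for name, value in long_lived_headers().items():
        print(f"{name}: {value}")
```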

also consider something like

http://www.big-news-organization.com/latestnews.cgi

which probably wants to use cache-control: no-cache (or max-age=0
together with must-revalidate) in its response.. That does NOT make
the response totally uncacheable (uncacheable would be a response that
you can't keep in your cache and ever reuse); it simply means that the
client is required to check with www.big-news-organization.com before
reusing the cached page.. and indeed, the latest news doesn't change
every instant so you may be able to reuse the response, incurring only
the cost of querying the server instead of the cost of transferring
the page... and yet the user will always see the latest info
available. Another way to think about this is that the response is
instantly 'stale' instead of being uncacheable..
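
The check-before-reuse step can be sketched without any networking.
Everything here is illustrative: reuse_or_refetch and fake_origin are
made-up names, the cache entry is a plain dict, and an ETag is assumed
as the validator (If-Modified-Since would work along the same lines).

```python
def reuse_or_refetch(cached, fetch):
    """Revalidate a stale cache entry and return the body to show.

    cached: dict with "etag" and "body" keys.
    fetch:  callable taking request headers and returning
            (status, response_headers, body).
    """
    status, headers, body = fetch({"If-None-Match": cached["etag"]})
    if status == 304:
        # Not modified: only the small query was paid for.
        return cached["body"]
    # Changed: store the new copy and its validator.
    cached["etag"] = headers.get("ETag")
    cached["body"] = body
    return body


def fake_origin(current_etag, current_body):
    """Stand-in for the origin server, for demonstration only."""
    def fetch(request_headers):
        if request_headers.get("If-None-Match") == current_etag:
            return 304, {"ETag": current_etag}, b""
        return 200, {"ETag": current_etag}, current_body
    return fetch


if __name__ == "__main__":
    entry = {"etag": '"v1"', "body": b"old news"}
    print(reuse_or_refetch(entry, fake_origin('"v1"', b"old news")))
    print(reuse_or_refetch(entry, fake_origin('"v2"', b"fresh news")))
```

In the first call the server answers 304 and the cached body is
reused; in the second the page has changed, so the full response is
transferred and replaces the cached copy.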

Truly uncacheable things are identified by either no-store or private,
which are designed to keep things out of storage as a privacy matter
(so that they don't end up on backup tapes or stuff like that).. the
former doesn't let the message be written to any cache, and the latter
doesn't let it be written to any shared cache, but a private (browser)
cache would be ok.
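
A small sketch of that storage decision, assuming a hypothetical
may_store helper that looks only at these two directives (a real cache
has to weigh many other things, such as the request method and the
response status):

```python
def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide whether a cache may write a response to storage.

    cache_control: value of the Cache-Control response header.
    shared_cache:  True for a proxy, False for a browser cache.
    """
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return False  # no cache at all may keep it
    if "private" in directives and shared_cache:
        return False  # only a private (browser) cache may keep it
    return True


if __name__ == "__main__":
    print(may_store("no-store", shared_cache=False))  # False
    print(may_store("private", shared_cache=True))    # False
    print(may_store("private", shared_cache=False))   # True
    print(may_store("no-cache", shared_cache=True))   # True
```

The last case is the point made earlier: no-cache still allows the
response to be stored, it just has to be checked before reuse.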

-P

Received on Wednesday, 28 July 1999 15:12:57 UTC