Re: Issues-list item "CACHING-CGI"

Wow, things are confused.  Part of this is my fault, for writing
something that wasn't as precise as it should have been.

Most of the confusion is on the part of people who do not distinguish
between "what a CGI script should send" and "what a cache should do
when a CGI script doesn't send what it should have sent".

OF COURSE, CGI-generated responses (and all other origin server
responses, too) should include either an explicit "don't cache
this" marking (e.g., "Cache-control: max-age=0") or an explicit
expiration time.  Of course, of course, of course.

That wasn't the question.  The question (which I stated too
informally) was "what happens when the response is NOT clearly
marked as to cachability?"  This is a *different* question.

Current practice seems to be split.  Ari says that the CERN
and Netscape proxies never cache a response without a Last-Modified
header. (With HTTP/1.1, this rule would presumably change to
"without either a Last-Modified or Etag header.")  However,
the practice in the Squid world seems to be different.  I'm
not sure I fully understand the Squid code, but the version
I looked at seems to allow caching of a response without
a Last-Modified header.

I was tasked by the working group meeting last week to address the
specific issue of CGI, not the larger issue of whether a response
without Last-Modified should be cached.  Based on my belief that the
HTTP/1.1 spec should not discourage caching unnecessarily (reflecting
what Roy wrote earlier, that the Web "depends on accurate caching to
reduce network costs"), I constructed my proposed Note to reflect the
looser approach used by Squid.

So let's take these issues separately.  If someone wants to
propose a specification change (or a new Note for the spec)
that says "do not cache responses without a Last-Modified header",
that's fine with me, although it would be a good idea to
combine this proposal with evidence (from a real-life proxy)
that this doesn't significantly reduce caching in today's Internet.

Back to the CACHING-CGI issue.  My original proposal was sloppy
in that it didn't make a distinction between "cache and reuse
without revalidation" and "cache but must revalidate".  And I
forgot to include "htbin" as being more or less equivalent
to "cgi-bin" (and yes, there are still lots of htbin URLs in
active use).  Also, I violated the informal rule that Notes should
not use terms like "SHOULD".

Here's a revised version, to replace the second paragraph
in section 13.9:

	Some HTTP/1.0 cache operators have found that it is dangerous
	to cache and reuse without revalidation responses to requests
	for URLs that include any of the strings "cgi-bin", "htbin", or
	"?", because applications have traditionally used these URLs in
	conjunction with operations with significant side effects for
	GET or HEAD methods.  However, if such a response includes an
	explicit, future, expiration time, then this implies that the
	response may be cached and reused without revalidation until it
	expires.  If such a response includes a Last-Modified or Etag
	header, this implies that the response may be reused after
	revalidation (or without revalidation if explicitly fresh).

	A cache MUST NOT assign a heuristic expiration time to a
	response for a URL that includes any of the strings "htbin",
	"cgi-bin", or "?" in its rel_path part.  If such a response
	does not carry an explicit expiration time, it must be treated
	as if it expires immediately.
	
This does two things: (1) it clarifies that a cache can indeed
follow its usual caching rules for "?" and "cgi-bin" responses, if
they are explicitly marked to allow caching, and (2) it tightens
up the rules on assigning heuristic expiration times for such
responses, because of the known risks of this specific situation.
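
To make the intended decision procedure concrete, here is a rough
sketch of how a cache might apply the revised Note.  It is only an
illustration, not code from any real proxy.  The function names, the
four result labels, and the lower-cased header dictionary are all
made up for this example, and it simplifies by measuring max-age
from "now" rather than from the response's Date and Age.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical helper: strips "scheme://authority" so that only the
# rel_path part (path, params, query) is inspected.
_AUTHORITY = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]*")
RISKY_MARKERS = ("cgi-bin", "htbin", "?")


def rel_path(url):
    """Return the URL without scheme, authority, or fragment."""
    return _AUTHORITY.sub("", url, count=1).split("#", 1)[0]


def is_risky(url):
    """True if the rel_path contains any marker named in the Note."""
    path = rel_path(url)
    return any(marker in path for marker in RISKY_MARKERS)


def explicit_expiry(headers, now):
    """Return an explicit expiration time, or None if there is none.

    Assumes `headers` maps lower-cased header names to string values.
    Simplification: max-age is counted from `now`.
    """
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return now + timedelta(seconds=int(value))
    expires = headers.get("expires")
    if expires is None:
        return None
    try:
        when = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return now  # unparsable date: treat as already expired
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when


def cache_policy(url, headers, now):
    """Classify a response under the rules sketched in the Note."""
    expiry = explicit_expiry(headers, now)
    has_validator = "last-modified" in headers or "etag" in headers
    if expiry is not None and expiry > now:
        # Explicit, future expiration: reuse without revalidation.
        return "reuse-until-expiry"
    if expiry is None and not is_risky(url):
        # Ordinary URL, no explicit marking: heuristics are allowed.
        return "heuristic-allowed"
    # Either explicitly expired, or a risky URL with no explicit
    # expiration, which is treated as expiring immediately.
    return "revalidate-first" if has_validator else "refetch"
```

With this sketch, an unmarked "cgi-bin" response yields "refetch",
the same response with a Last-Modified header yields
"revalidate-first", and one with "Cache-control: max-age=60" yields
"reuse-until-expiry".  A host name that merely contains "cgi-bin" is
not treated as risky, since only the rel_path part is examined.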

-Jeff

Received on Wednesday, 16 April 1997 14:43:59 UTC