Issues-list item "CACHING-CGI"

The following got dropped on the floor, but for good cleanliness
reasons, I'm sending it to the list in case someone has some problem with
declaring it closed, as it was marked otherwise in the issues list.

                                - Jim Gettys

Forwarded message 1

  • From: Jeffrey Mogul <mogul@pa.dec.com>
  • Date: Mon, 28 Apr 97 18:37:42 MDT
  • Subject: Issues-list item "CACHING-CGI"
  • To: jg@w3.org
  • Cc: mogul@pa.dec.com
  • Cc: masinter@parc.xerox.com
  • Message-Id: <9704290137.AA03043@acetes.pa.dec.com>
I sent the appended message to the list 12 days ago.  While a few other
messages with the same Subject: line appeared on the list after
I posted that message, none were in direct response to it.

Therefore, I suggest that you treat this issue as CLOSED,
with the resolution being the two-paragraph update to section
13.9 proposed in this message.

-Jeff
------- Forwarded Message

Return-Path: http-wg-request@cuckoo.hpl.hp.com
Received: by acetes.pa.dec.com; id AA18655; Wed, 16 Apr 97 14:47:13 -0700
Received: by pobox1.pa.dec.com; id AA08115; Wed, 16 Apr 97 14:47:12 -0700
Received: from palrel1.hp.com by mail2.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)id AA19122; Wed, 16 Apr 1997 14:42:56 -0700
Received: from cuckoo.hpl.hp.com (cuckoo.hpl.hp.com [15.144.62.116]) by palrel1.hp.com with ESMTP (8.7.5/8.7.3) id OAA02734; Wed, 16 Apr 1997 14:41:50 -0700 (PDT)
Received: (from procmail@localhost) by cuckoo.hpl.hp.com (8.7.1/8.7.1) id RAA17737; Wed, 16 Apr 1997 17:41:29 -0400 (EDT)
Resent-Date: Wed, 16 Apr 1997 17:41:29 -0400 (EDT)
Message-Id: <9704162132.AA18610@acetes.pa.dec.com>
To: http-wg@cuckoo.hpl.hp.com
From: Jeffrey Mogul <mogul@nospam.org>
Subject: Re: Issues-list item "CACHING-CGI" 
In-Reply-To: Your message of "Tue, 15 Apr 97 17:12:15 MDT."             <9704160012.AA15744@acetes.pa.dec.com> 
Date: Wed, 16 Apr 97 14:32:40 MDT
Sender: mogul@pa.dec.com
Resent-Message-Id: <"nRG-O3.0.1L4.4UKLp"@cuckoo>
Resent-From: http-wg@cuckoo.hpl.hp.com
X-Mailing-List: <http-wg@cuckoo.hpl.hp.com> archive/latest/3081
X-Loop: http-wg@cuckoo.hpl.hp.com
Precedence: list
Resent-Sender: http-wg-request@cuckoo.hpl.hp.com

Wow, things are confused.  Part of this is my fault, for writing
something that wasn't as precise as it should have been.

Most of the confusion is on the part of people who do not distinguish
between "what a CGI script should send" and "what a cache should do
when a CGI script doesn't send what it should have sent".

OF COURSE, CGI-generated responses (and all other origin server
responses, too) should include either an explicit "don't cache
this" marking (e.g., "Cache-control: max-age=0") or an explicit
expiration time.  Of course, of course, of course.

That wasn't the question.  The question (which I stated too
informally) was "what happens when the response is NOT clearly
marked as to cachability?"  This is a *different* question.

Current practice seems to be split.  Ari says that the CERN
and Netscape proxies never cache a response without a Last-Modified
header. (With HTTP/1.1, this rule would presumably change to
"without either a Last-Modified or Etag header.")  However,
the practice in the Squid world seems to be different.  I'm
not sure I fully understand the Squid code, but the version
I looked at seems to allow caching of a response without
a Last-Modified header.
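
As an illustration only, the stricter rule described above (as Ari reports
it, not taken from any proxy's source) could be sketched as follows. The
function name and the http11 flag are hypothetical; the flag selects the
presumed HTTP/1.1 variant that also accepts an Etag header.

```python
from typing import Mapping


def strict_rule_allows_caching(headers: Mapping[str, str],
                               http11: bool = False) -> bool:
    """Return True if the response carries a validator the strict rule accepts.

    Hypothetical sketch: HTTP/1.0 behaviour requires Last-Modified; the
    presumed HTTP/1.1 variant accepts either Last-Modified or Etag.
    """
    # Header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in headers}
    validators = {"last-modified", "etag"} if http11 else {"last-modified"}
    return bool(names & validators)
```

A looser cache, such as the Squid version mentioned above appears to be,
would not apply this test before storing a response.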

I was tasked by the working group meeting last week to address the
specific issue of CGI, not the larger issue of whether a response
without Last-Modified should be cached.  Based on my belief that the
HTTP/1.1 spec should not discourage caching unnecessarily (reflecting
what Roy wrote earlier, that the Web "depends on accurate caching to
reduce network costs"), I constructed my proposed Note to reflect the
looser approach used by Squid.

So let's take these issues separately.  If someone wants to
propose a specification change (or a new Note for the spec)
that says "do not cache responses without a Last-Modified header",
that's fine with me, although it would be a good idea to
combine this proposal with evidence (from a real-life proxy)
that this doesn't significantly reduce caching in today's Internet.

Back to the CACHING-CGI issue.  My original proposal was sloppy
in that it didn't make a distinction between "cache and reuse
without revalidation" and "cache but must revalidate".  And I
forgot to include "htbin" as being more or less equivalent
to "cgi-bin" (and yes, there are still lots of htbin URLs in
active use).  Also, I violated the informal rule that Notes should
not use terms like "SHOULD".

Here's a revised version, to replace the second paragraph
in section 13.9:

	Some HTTP/1.0 cache operators have found that it is dangerous
	to cache and reuse without revalidation responses to requests
	for URLs that include any of the strings "cgi-bin", "htbin", or
	"?", because applications have traditionally used these URLs in
	conjunction with operations with significant side effects for
	GET or HEAD methods.  However, if such a response includes an
	explicit, future, expiration time, then this implies that the
	response may be cached and reused without revalidation until it
	expires.  If such a response includes a Last-Modified or Etag
	header, this implies that the response may be reused after
	revalidation (or without revalidation if explicitly fresh).

	A cache MUST NOT assign a heuristic expiration time to a
	response for a URL that includes the strings "htbin", "cgi-bin", or
	"?" in its rel_path part.  If such a response does not 
	carry an explicit expiration time, it must be treated as
	if it expires immediately.
	
This does two things: (1) it clarifies that a cache can indeed
follow its usual caching rules for "?" and "cgi-bin" responses, if
they are explicitly marked to allow caching, and (2) it tightens
up the rules on assigning heuristic expiration times for such
responses, because of the known risks of this specific situation.
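
A minimal sketch of how a cache might apply the proposed rule, for
illustration only. The function names are hypothetical, and the heuristic
shown (10% of the time since Last-Modified) is an assumption standing in
for whatever heuristic a given cache actually uses; the proposed text does
not prescribe one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlsplit

# Strings named in the proposed text; "?" is checked separately below.
RISKY_MARKERS = ("cgi-bin", "htbin")


def is_risky_url(url: str) -> bool:
    """True if the URL contains "cgi-bin", "htbin", or "?" after the host part."""
    without_fragment = url.split("#", 1)[0]
    path = urlsplit(without_fragment).path
    return "?" in without_fragment or any(m in path for m in RISKY_MARKERS)


def freshness_lifetime(url: str,
                       date: datetime,
                       expires: Optional[datetime],
                       last_modified: Optional[datetime]) -> timedelta:
    """How long the response may be reused without revalidation."""
    zero = timedelta(0)
    if expires is not None:
        # An explicit expiration time is honoured, even for risky URLs.
        return max(expires - date, zero)
    if is_risky_url(url):
        # No heuristic expiration for these URLs: treat as expiring immediately.
        return zero
    if last_modified is not None:
        # Assumed heuristic, not part of the proposal: 10% of the age.
        return max((date - last_modified) / 10, zero)
    return zero


if __name__ == "__main__":
    now = datetime(1997, 4, 16, 21, 32, tzinfo=timezone.utc)
    modified = now - timedelta(days=10)
    print(freshness_lifetime("http://example.com/page.html", now, None, modified))
    print(freshness_lifetime("http://example.com/cgi-bin/search", now, None, modified))
```

A zero lifetime here means "revalidate before reuse" when the response
carries a Last-Modified or Etag header; whether to store the response at
all is a separate decision.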

-Jeff


------- End of Forwarded Message

Received on Tuesday, 29 July 1997 13:22:57 UTC