- From: Jeffrey Mogul <mogul@nospam.org>
- Date: Wed, 16 Apr 97 14:32:40 MDT
- To: http-wg@cuckoo.hpl.hp.com
Wow, things are confused. Part of this is my fault, for writing something that wasn't as precise as it should have been. Most of the confusion, though, comes from people who do not distinguish between "what a CGI script should send" and "what a cache should do when a CGI script doesn't send what it should have sent".

OF COURSE, CGI-generated responses (and all other origin server responses, too) should include either an explicit "don't cache this" marking (e.g., "Cache-control: max-age=0") or an explicit expiration time. Of course, of course, of course. That wasn't the question. The question (which I stated too informally) was "what happens when the response is NOT clearly marked as to cacheability?" This is a *different* question.

Current practice seems to be split. Ari says that the CERN and Netscape proxies never cache a response without a Last-Modified header. (With HTTP/1.1, this rule would presumably change to "without either a Last-Modified or ETag header.") However, practice in the Squid world seems to be different. I'm not sure I fully understand the Squid code, but the version I looked at seems to allow caching of a response without a Last-Modified header.

I was tasked by the working group meeting last week to address the specific issue of CGI, not the larger issue of whether a response without Last-Modified should be cached. Based on my belief that the HTTP/1.1 spec should not discourage caching unnecessarily (reflecting what Roy wrote earlier, that the Web "depends on accurate caching to reduce network costs"), I wrote my proposed Note to reflect the looser approach used by Squid.

So let's take these issues separately. If someone wants to propose a specification change (or a new Note for the spec) that says "do not cache responses without a Last-Modified header", that's fine with me, although it would be a good idea to accompany this proposal with evidence (from a real-life proxy) that this doesn't significantly reduce caching in today's Internet.
Back to the CACHING-CGI issue. My original proposal was sloppy in that it didn't distinguish between "cache and reuse without revalidation" and "cache but must revalidate". I also forgot to include "htbin" as more or less equivalent to "cgi-bin" (and yes, there are still lots of htbin URLs in active use). Finally, I violated the informal rule that Notes should not use terms like "SHOULD".

Here's a revised version, to replace the second paragraph in section 13.9:

> Some HTTP/1.0 cache operators have found that it is dangerous to cache, and reuse without revalidation, responses to requests for URLs that include any of the strings "cgi-bin", "htbin", or "?", because applications have traditionally used such URLs for GET or HEAD operations with significant side effects. However, if such a response includes an explicit, future expiration time, this implies that the response may be cached and reused without revalidation until it expires. If such a response includes a Last-Modified or ETag header, this implies that the response may be reused after revalidation (or without revalidation if explicitly fresh).
>
> A cache MUST NOT assign a heuristic expiration time to a response for a URL that includes the strings "htbin", "cgi-bin", or "?" in its rel_path part. If such a response does not carry an explicit expiration time, it must be treated as if it expires immediately.

This does two things: (1) it clarifies that a cache can indeed follow its usual caching rules for "?" and "cgi-bin" responses, if they are explicitly marked to allow caching, and (2) it tightens up the rules on assigning heuristic expiration times for such responses, because of the known risks of this specific situation.

-Jeff
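As a rough illustration of how a cache might apply the revised Note, here is a minimal Python sketch. It is not part of the proposal: the `CachedResponse` type, its field names, the `freshness_lifetime` and `reuse_decision` helpers, and the 10%-of-time-since-Last-Modified heuristic used for ordinary URLs are all hypothetical. It assumes the caller has already parsed Cache-Control max-age, Expires, Date, and Last-Modified into seconds and has extracted the rel_path (abs_path plus any "?query") from the request URL.

```python
from dataclasses import dataclass
from typing import Optional

# Strings the proposed Note singles out in a URL's rel_path part (matched case-sensitively here).
CGI_MARKERS = ("cgi-bin", "htbin", "?")


@dataclass
class CachedResponse:
    """Hypothetical, pre-parsed view of a cached response (all times in seconds)."""
    rel_path: str                               # abs_path plus optional "?query"
    max_age: Optional[int] = None               # from Cache-Control: max-age
    expires_in: Optional[int] = None            # Expires minus Date; negative if already past
    since_last_modified: Optional[int] = None   # Date minus Last-Modified; None if absent
    has_etag: bool = False                      # True if an ETag header was present


def is_cgi_like(rel_path: str) -> bool:
    """True if the rel_path contains any of the strings named in the proposal."""
    return any(marker in rel_path for marker in CGI_MARKERS)


def freshness_lifetime(resp: CachedResponse, heuristic_fraction: float = 0.1) -> int:
    """Seconds for which the response may be reused without revalidation."""
    # An explicit expiration time applies to every URL, CGI-like or not.
    if resp.max_age is not None:
        return max(resp.max_age, 0)
    if resp.expires_in is not None:
        return max(resp.expires_in, 0)
    # Proposed rule: no heuristic expiration for CGI-like URLs, so they expire immediately.
    if is_cgi_like(resp.rel_path):
        return 0
    # Ordinary URLs: an assumed heuristic, a fraction of the time since Last-Modified.
    if resp.since_last_modified is not None:
        return int(resp.since_last_modified * heuristic_fraction)
    # This sketch has no basis for a heuristic without Last-Modified.
    return 0


def reuse_decision(resp: CachedResponse, age: int) -> str:
    """Return "reuse", "revalidate", or "refetch" for a cached entry of the given age."""
    if age < freshness_lifetime(resp):
        return "reuse"
    # A validator (Last-Modified or ETag) permits a conditional request instead of a refetch.
    if resp.since_last_modified is not None or resp.has_etag:
        return "revalidate"
    return "refetch"


if __name__ == "__main__":
    # A CGI response with only Last-Modified: no heuristic lifetime, so revalidate.
    search = CachedResponse("/cgi-bin/search?q=http", since_last_modified=86400)
    print(reuse_decision(search, age=0))    # revalidate
    # A CGI response explicitly marked fresh for 300 seconds may be reused until then.
    marked = CachedResponse("/htbin/report", max_age=300)
    print(reuse_decision(marked, age=60))   # reuse
```

The design choice worth noting is that the explicit-expiration checks come before the CGI check, which mirrors the Note's point that a properly marked "cgi-bin" or "?" response can follow the usual caching rules; only the heuristic path is closed off.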
Received on Wednesday, 16 April 1997 14:43:59 UTC