- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 22 Jan 96 15:27:59 PST
- To: http-caching@pa.dec.com
Those of you who read my first draft proposal for modifications to the HTTP caching design may remember that I proposed using a new "Fresh-until:" header to convey cache expiration times. I proposed this for several reasons:

(1) We came up with the terms "fresh" vs. "stale" to describe the states that cache entries could be in, and I was trying to keep these concepts in mind.

(2) Expires: has potential problems with clock skew.

(3) The description of Expires: in draft-ietf-http-v11-spec-01.txt says "Applications must not cache this entity beyond the date given," whereas we seem to have some agreement that both "fresh" and "stale" entities can be stored in a cache (but a "stale" entity must be validated with the origin server before being used as a response).

(4) I think there is a distinction to be made between cachability-expiration (i.e., "fresh-until time") and document expiration. The latter might be something like the 6-month expiration date on an Internet draft, and/or might be used to help Web robots decide when to revisit a page. And it might be reasonable to present this information to the actual user, whereas the Expires: header is probably not that interesting to the user (unless something goes wrong with caching).

(5) My survey of Expires: headers found by Altavista's crawler this past fall showed that a lot had serious date-syntax errors.

(6) While there is an obvious Expires: encoding for "already expired" (i.e., never treat a cached copy as if fresh), there are no obvious encodings for "never expires" or "expiration date undefined". The lack of an Expires: header is usually taken to mean the latter (see the CERN proxy description). One could encode the former with a date far in the future, but I suspect that some implementations will run into trouble with this kind of encoding in just under 4 years from this month.

In proposing "Fresh-until:", I was hoping to solve all of these issues in one swoop.
It also provided a way to detect pre-1.1 origin servers without having to use version numbers, which made it easier for me to think about separating the cases (of 1.1 servers and earlier ones).

But there are numerous good reasons to try to stick with the existing syntax (if it can be done without introducing new problems), and in particular it would mean that the existing (pre-1.1) proxies might be able to do a reasonably good job of managing their caches.

So I now think that all six of the (potential) problems that I listed above can be solved while still using the Expires: header. Here is a summary of how I think it can be done, issue by issue:

(1) conceptual distinction

This can be handled by carefully wording things in the spec. I.e., it doesn't really matter what header names we use, as long as we define the concepts clearly and clearly define how the protocol elements relate to those concepts.

(2) clock skew

Koen Holtman has proposed a simple and (with a few minor modifications) compatible mechanism to avoid clock skew problems in most cases. This involves caches tracking how long they have held onto an unrefreshed response, and then transmitting this information in an "Age:" header. One then computes the difference between the origin server's Date: and Expires: headers, then sums all of the Ages and subtracts that sum from the expiration difference to decide if the fresh-until time has passed.

(3) specification wording regarding "must not cache"

I believe that we can safely reword section 10.19 of Roy's 1.1 draft from:

    Applications must not cache this entity beyond the date given

to:

    Applications must not return this entity from their cache after the date given, without first validating it with the origin server.

This changes the spec to make it clear when a conditional GET (or perhaps other methods) is supposed to be used. And I believe that although this is not strictly the interpretation used by 1.0 servers, it is not actually unsafe.
Or does anyone know of a server implementation that is willing to return "304 Not Modified" in response to a GET with If-Modified-Since on a resource that should not be used at all after its Expires: date?

(4) distinction between cache freshness and document expiration

Since the 1.0 and 1.1 specs never really introduced this distinction, or provided a way to express "document expiration", I think the best solution would be to add a new header for this purpose.

(5) syntax errors

Roy's 1.1 spec already includes a way for caches to deal with syntactically incorrect Expires: headers: treat them as "Expires: <already>". This doesn't solve the problem of 1.0 servers that may be sending syntactically correct but semantically bogus Expires: values, but I don't think there is a way to solve the problem of stupid implementors or administrators. Since my bias is towards safety rather than performance, I'm happy with a design that turns off caching if there is any doubt about the Expires: date syntax.

(6) encodings for "never" and "undefined"

Several people have argued convincingly that "no Expires: header" should mean "undefined" (that is, the cache can use its own heuristics), in keeping with current (although sparsely documented) practice.

As for "never", I think the problem is "what is the largest HTTP date value that we must insist that any cache can properly parse?" The spec allows RFC 850 dates, which break after 1999. I think the HTTP/1.1 spec should take a stronger stand on this, and insist that servers never generate RFC-850-style dates (but clients must continue to accept them). Even so, I suspect that there may be some implementations out there that have trouble with dates on or after 01 Jan 2000, and there are certainly a lot of implementations that use UNIX-style dates (32-bit seconds since 01 Jan 1970), which cannot handle dates after sometime in the year 2038.
This means that it's probably a bad idea for the spec to say something like "Expires: 12-31-9999" means "never expires"; some implementations (especially existing ones) are bound to represent this incorrectly.

My "Fresh-until:" header proposal solved this by using a special keyword for "never expires", but while this is perhaps possible for HTTP/1.1, it would cause HTTP/1.0 clients and proxies to treat "Expires: never" as "syntax error, assume Expires: already", which is exactly what we don't want. Which is to say that while it may be safe, it would really hurt cache performance for such documents.

After mulling this over for a while, the best solution that I can come up with is for the spec to say something like this:

    HTTP/1.1 servers SHOULD represent "never expires" as an Expires: date approximately [one] year from the time the response is generated. HTTP/1.1 servers should not send Expires: dates more than [one] year in the future.

    All Expires: dates SHOULD be in RFC-822 form, and MUST NOT be in RFC-850 form.

    HTTP/1.1 clients (and caches) SHOULD be able to parse HTTP-date values at least one year in the future (from the time that a response is received).

    HTTP/1.1 clients and caches should assume that an RFC-850 date which appears to be more than [50] years in the future is in fact in the past (this helps solve the "year 2000" problem).

    All HTTP/1.1 implementations MUST be capable of correctly parsing any HTTP-date value that is no less than [25] years in the future from the date that the implementation is sold or otherwise distributed.

    An HTTP/1.1 implementation may internally represent a parsed Expires: date as earlier than the proper value, but MUST NOT internally represent a parsed Expires: date as later than the proper value.

The values in brackets are somewhat arbitrary, and could be adjusted. I think it's reasonable to represent "expires never" as "expires one year from now", since this should not materially affect the server loads.
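To make the arithmetic concrete, here is a minimal Python sketch of three of the ideas above: the Age-based freshness check from issue (2), the "never expires means one year from now" convention, and the rule that a parsed Expires: date may be stored as earlier but not later than its proper value. All function names are hypothetical, Age values are assumed to be whole seconds, treating a remaining lifetime of exactly zero as stale is an assumption, and the 32-bit clamp is just one illustration of an implementation with a limited date range.

```python
from datetime import datetime, timedelta, timezone

# Largest value of a signed 32-bit UNIX timestamp (falls in the year 2038).
MAX_TIME_T = 2**31 - 1


def remaining_freshness(date, expires, ages_seconds):
    """(Expires - Date) minus the sum of all Age values seen along the path."""
    lifetime = expires - date
    return lifetime - timedelta(seconds=sum(ages_seconds))


def is_fresh(date, expires, ages_seconds):
    """Fresh only while some lifetime remains (zero counts as stale: an assumption)."""
    return remaining_freshness(date, expires, ages_seconds) > timedelta(0)


def never_expires(now):
    """Server side: encode "never expires" as roughly one year from now."""
    return now + timedelta(days=365)


def to_time_t(expires):
    """Store Expires as a 32-bit timestamp, rounding down (never up) if it overflows."""
    return min(int(expires.timestamp()), MAX_TIME_T)


# Origin server said: valid for one hour after Date.
date = datetime(1996, 1, 22, 23, 0, 0, tzinfo=timezone.utc)
expires = date + timedelta(hours=1)

print(is_fresh(date, expires, [600, 1200]))   # True: 30 of 60 minutes used
print(is_fresh(date, expires, [1800, 1800]))  # False: the whole hour is used
```

Note that the check uses only the difference between the origin server's own Date: and Expires: values plus locally measured ages, so the cache's wall clock never enters the comparison. That is the property that sidesteps clock skew.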
Summary of my current thinking:

(a) Retain Expires: with some minor changes in wording and language to deal with the year-2000 problem.

(b) Add "Age:" headers to help with clock skew (and to provide a reasonable interpretation for "Cache-control: max-age").

(c) Possibly add something like "Resource-expiration:" to allow distinct representation of that concept.

-Jeff
Received on Monday, 22 January 1996 23:41:28 UTC