thoughts about transmitting cache expiration times

Those of you who read my first draft proposal for modifications
to the HTTP caching design may remember that I proposed using a
new "Fresh-until:" header to convey cache expiration times.
I proposed this for several reasons:

	(1) We came up with the terms "fresh" vs. "stale"
	to describe the states that cache entries could be
	in, and I was trying to keep these concepts in
	mind.
	
	(2) Expires: has potential problems with clock skew.
	
	(3) The description of Expires: in draft-ietf-http-v11-spec-01.txt
	says "Applications must not cache this entity beyond the date
	given," whereas we seem to have some agreement that both
	"fresh" and "stale" entities can be stored in a cache (but
	a "stale" entity must be validated with the origin server
	before being used as a response).
	
	(4) I think there is a distinction to be made between
	cachability-expiration (i.e., "fresh-until time") and
	document expiration.  The latter might be something
	like the 6-month expiration date on an Internet draft,
	and/or might be used to help Web robots decide when
	to revisit a page.  And it might be reasonable to
	present this information to the actual user, whereas
	the Expires: header is probably not that interesting
	to the user (unless something goes wrong with caching).

	(5) My survey of Expires: headers found by Altavista's
	crawler this past fall showed that many had serious
	date-syntax errors.

	(6) While there is an obvious Expires: encoding for
	"already expired" (i.e., never treat a cached copy
	as if fresh), there are no obvious encodings for
	"never expires" or "expiration date undefined".  The
	lack of an Expires: header is usually taken to mean
	the latter (see the CERN proxy description).  One
	could encode the former with a date far in the future,
	but I suspect that some implementations will run into
	trouble with this kind of encoding in just under 4 years
	from this month.

In proposing "Fresh-until:", I was hoping to solve all of these
issues in one swoop.  It also provided a way to detect pre-1.1
origin servers without having to use version numbers, which made
it easier for me to think about separating the cases (of 1.1
servers and earlier ones).

But there are numerous good reasons to try to stick with the existing
syntax (if it can be done without introducing new problems), and
in particular it would mean that the existing (pre-1.1) proxies
might be able to do a reasonably good job of managing their caches.

So I now think that all six of the (potential) problems that I
listed above can be solved while still using the Expires: header.
Here is a summary of how I think it can be done, issue by issue:

(1) conceptual distinction
	This can be handled by carefully wording things in the
	spec.  I.e., it doesn't really matter what header names
	we use, as long as we define the concepts clearly and
	clearly define how the protocol elements relate to
	those concepts.

(2) clock skew
	Koen Holtman has proposed a simple and (with a few
	minor modifications) compatible mechanism to avoid
	clock skew problems in most cases.  This involves
	caches tracking how long they have held onto an
	unrefreshed response, and then transmitting this
	information in an "Age:" header.  One then computes
	the difference between the origin server's Date:
	and Expires: headers, then sums all of the Ages
	and subtracts that sum from the expiration difference
	to decide if the fresh-until time has passed.
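
A minimal sketch of the calculation described above, in Python. It
assumes that Age: values are whole seconds and that the cache adds its
own holding time to the list before checking; the function name and
parameters are illustrative only and not part of any proposal.

```python
from email.utils import parsedate_to_datetime
from typing import Iterable


def is_fresh(date_hdr: str, expires_hdr: str, ages: Iterable[int]) -> bool:
    """Return True if the fresh-until time has not yet passed.

    date_hdr and expires_hdr are the origin server's Date: and Expires:
    values. ages holds every Age: value (assumed to be in seconds)
    accumulated along the path, including the local cache's own.
    """
    # Both timestamps come from the origin server's clock, so their
    # difference is a lifetime that does not depend on anyone else's clock.
    lifetime = parsedate_to_datetime(expires_hdr) - parsedate_to_datetime(date_hdr)
    remaining = lifetime.total_seconds() - sum(ages)
    return remaining > 0
```

Because Date: and Expires: are both read from the same clock, any skew
between the origin server and the cache drops out of the subtraction.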

(3) specification wording regarding "must not cache"
	I believe that we can safely reword section 10.19 of
	Roy's 1.1 draft from:
	    Applications must not cache this entity beyond the date given
	to:
	    Applications must not return this entity from their cache
	    after the date given, without first validating it with
	    the origin server.
	This changes the spec to make it clear when a conditional
	GET (or perhaps other methods) is supposed to be used.  And
	I believe that although this is not strictly the interpretation
	used by 1.0 servers, it is not actually unsafe.
	
	Or does anyone know of a server implementation that is
	willing to return "304 Not Modified" in response to a
	GET If-modified-since on a resource that should not be
	used at all after its Expires: date?
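
The reworded rule could be sketched as follows. This is only an
illustration of the intended cache behavior: CacheEntry, Validator, and
respond_from_cache are hypothetical names, and the validate callback
stands in for whatever conditional GET the cache would actually send.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CacheEntry:
    body: bytes
    last_modified: str  # value the cache would send in If-Modified-Since
    fresh: bool         # True while the Expires: date has not passed


# Hypothetical callback: performs a conditional GET against the origin
# server and returns (status code, body); the body is empty for a 304.
Validator = Callable[[str], Tuple[int, bytes]]


def respond_from_cache(entry: CacheEntry, validate: Validator) -> bytes:
    """Serve a fresh entry directly; validate a stale one before use."""
    if entry.fresh:
        return entry.body
    status, new_body = validate(entry.last_modified)
    if status == 304:
        # Origin server confirms the stored copy is still good.
        return entry.body
    # Anything else replaces the stored copy.
    entry.body = new_body
    return new_body
```

Note that the stale entry stays in the cache in both cases; it is only
barred from being returned without validation.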

(4) distinction between cache freshness and document expiration
	Since the 1.0 and 1.1 specs never really introduced this
	distinction, or provided a way to express "document
	expiration", I think the best solution would be to add
	a new header for this purpose.

(5) syntax errors
	Roy's 1.1 spec already includes a way for caches to deal
	with syntactically incorrect Expires: headers ... treat
	them as "Expires: <already>".  This doesn't solve the
	problem of 1.0 servers that may be sending syntactically
	correct but semantically bogus Expires: values, but I
	don't think there is a way to solve the problem of
	stupid implementors or administrators.  Since my bias
	is towards safety rather than performance, I'm happy
	with a design that turns off caching if there is any
	doubt about the Expires: date syntax.
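
A sketch of this "fail safe" parsing in Python. ALREADY_EXPIRED and
parse_expires are illustrative names. Note that the standard-library
parser used here is more lenient than a strict HTTP-date grammar would
be, and rejecting values with no usable time zone is an added
assumption, so treat this as an approximation of the rule.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Any instant safely in the past stands in for "Expires: <already>".
ALREADY_EXPIRED = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_expires(value: str) -> datetime:
    """Parse an Expires: value, treating any doubt as already expired."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable input; older Python versions raise TypeError here,
        # newer ones raise ValueError.
        return ALREADY_EXPIRED
    if parsed.tzinfo is None:
        # Assumption: a date without a usable zone is also "in doubt".
        return ALREADY_EXPIRED
    return parsed
```

This catches only syntactic problems; a well-formed but semantically
bogus date passes through unchanged.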

(6) encodings for "never" and "undefined"
	Several people have argued convincingly that "no Expires:
	header" should mean "undefined" (that is, the cache can
	use its own heuristics), in keeping with current (although
	sparsely documented) practice.
	
	As for "never", I think the problem is "what is the
	largest HTTP date value that we must insist that any
	cache can properly parse?"  The spec allows RFC 850
	dates, which break after 1999.  I think the HTTP/1.1
	spec should take a stronger stand on this, and insist
	that servers never generate RFC-850-style dates (but
	clients must continue to accept them).
	
	Even so, I suspect that there may be some implementations
	out there that have trouble with dates on or after 01 Jan 2000,
	and there are certainly a lot of implementations that use
	UNIX-style dates (32-bit seconds since 01 Jan 1970) which
	cannot handle dates after sometime in the year 2038.
	This means that it's probably a bad idea for the spec to say 
	something like "Expires: 12-31-9999" means "never expires";
	some implementations (especially existing ones) are bound
	to represent this incorrectly.
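
To make the 2038 limit concrete, here is a small Python sketch. It
assumes the counter is a signed 32-bit integer; the names are
illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1
MIN_INT32 = -(2**31)

# Last instant a signed 32-bit seconds counter can hold:
# 19 Jan 2038, 03:14:07 UTC.
LAST_REPRESENTABLE = EPOCH + timedelta(seconds=MAX_INT32)


def fits_in_int32_seconds(when: datetime) -> bool:
    """True if `when` can be stored as signed 32-bit seconds since 1970."""
    seconds = (when - EPOCH).total_seconds()
    return MIN_INT32 <= seconds <= MAX_INT32
```

A date in the year 9999 fails this check, so an implementation that
stores expiration times this way has no correct representation for it.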
	
	My "Fresh-until:" header proposal solved this by using
	a special keyword for "never expires", but while this
	is perhaps possible for HTTP/1.1, it would cause HTTP/1.0
	clients and proxies to treat "Expires: never" as "syntax
	error, assume Expires: already", which is exactly what
	we don't want.  Which is to say that while it may be
	safe, it would really hurt cache performance for such
	documents.
	
	After mulling this over for a while, the best solution
	that I can come up with is for the spec to say something
	like this:

		HTTP/1.1 servers SHOULD represent "never expires"
		as an Expires: date approximately [one] year from
		the time the response is generated.  HTTP/1.1
		servers should not send Expires: dates more than
		[one] year in the future.  All Expires: dates
		SHOULD be in RFC-822 form, and MUST NOT be in
		RFC-850 form.
		
		HTTP/1.1 clients (and caches) SHOULD be able to
		parse HTTP-date values at least one year in
		the future (from the time that a response is
		received).  HTTP/1.1 clients and caches should
		assume that an RFC-850 date which appears to
		be more than [50] years in the future is in fact
		in the past (this helps solve the "year 2000"
		problem).  All HTTP/1.1 implementations MUST
		be capable of correctly parsing any HTTP-date
		value that is no more than [25] years in the
		future from the date that the implementation
		is sold or otherwise distributed.  An HTTP/1.1
		implementation may internally represent a parsed
		Expires: date as earlier than the proper value,
		but MUST NOT internally represent a parsed
		Expires: date as later than the proper value.

	The values in brackets are somewhat arbitrary, and could
	be adjusted.  I think it's reasonable to represent "expires
	never" as "expires one year from now", since this should
	not materially affect the server loads.
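
The three rules in the proposed text can be sketched in Python as
follows. All names are illustrative. The sketch assumes that "one year"
means 365 days, that a two-digit RFC-850 year is first placed in the
current century before the [50]-year rule is applied, and that the
implementation's latest representable time is the signed 32-bit limit.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR = timedelta(days=365)   # assumed meaning of "[one] year"
FUTURE_WINDOW_YEARS = 50         # the bracketed [50] in the proposal
# Assumed internal limit: signed 32-bit seconds since 01 Jan 1970.
MAX_REPRESENTABLE = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)


def never_expires_header(now: datetime) -> str:
    """Server side: encode "never expires" as about one year from now.

    `now` must be an aware datetime in UTC.
    """
    return format_datetime(now + ONE_YEAR, usegmt=True)


def resolve_two_digit_year(yy: int, now: datetime) -> int:
    """Client side: interpret a two-digit RFC-850 year."""
    year = (now.year - now.year % 100) + yy
    if year > now.year + FUTURE_WINDOW_YEARS:
        # More than 50 years ahead: assume it is really in the past.
        year -= 100
    return year


def clamp_expiry(parsed: datetime) -> datetime:
    """Store a date as earlier than its true value, never later."""
    return min(parsed, MAX_REPRESENTABLE)
```

For example, a cache running in 2001 that sees an RFC-850 year of "99"
would read it as 1999 rather than 2099, and so treat the entry as
already expired.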

Summary of my current thinking:
    (a) Retain Expires: with some minor changes in wording and
    	language to deal with the year-2000 problem.
    (b) Add "Age:" headers to help with clock skew (and to
    	provide a reasonable interpretation for "Cache-control: max-age").
    (c) Possibly add something like "Resource-expiration:" to
    	allow distinct representation of that concept.

-Jeff

Received on Monday, 22 January 1996 23:41:28 UTC