- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 22 Jan 96 15:27:59 PST
- To: http-caching@pa.dec.com
Those of you who read my first draft proposal for modifications
to the HTTP caching design may remember that I proposed using a
new "Fresh-until:" header to convey cache expiration times.
I proposed this for several reasons:
(1) We came up with the terms "fresh" vs. "stale"
to describe the states that cache entries could be
in, and I was trying to keep these concepts in
mind.
(2) Expires: has potential problems with clock skew.
(3) The description of Expires: in draft-ietf-http-v11-spec-01.txt
says "Applications must not cache this entity beyond the date
given," whereas we seem to have some agreement that both
"fresh" and "stale" entities can be stored in a cache (but
a "stale" entity must be validated with the origin server
before being used as a response).
(4) I think there is a distinction to be made between
cachability-expiration (i.e., "fresh-until time") and
document expiration. The latter might be something
like the 6-month expiration date on an Internet draft,
and/or might be used to help Web robots decide when
to revisit a page. And it might be reasonable to
present this information to the actual user, whereas
the Expires: header is probably not that interesting
to the user (unless something goes wrong with caching).
(5) My survey of Expires: headers found by Altavista's
crawler this past fall showed that a lot had serious
date-syntax errors.
(6) While there is an obvious Expires: encoding for
"already expired" (i.e., never treat a cached copy
as if fresh), there are no obvious encodings for
"never expires" or "expiration date undefined". The
lack of an Expires: header is usually taken to mean
the latter (see the CERN proxy description). One
could encode the former with a date far in the future,
but I suspect that some implementations will run into
trouble with this kind of encoding in just under 4 years
from this month.
In proposing "Fresh-until:", I was hoping to solve all of these
issues in one swoop. It also provided a way to detect pre-1.1
origin servers without having to use version numbers, which made
it easier for me to think about separating the cases (of 1.1
servers and earlier ones).
But there are numerous good reasons to try to stick with the existing
syntax (if it can be done without introducing new problems), and
in particular it would mean that the existing (pre-1.1) proxies
might be able to do a reasonably good job of managing their caches.
So I now think that all six of the (potential) problems that I
listed above can be solved while still using the Expires: header.
Here is a summary of how I think it can be done, issue by issue:
(1) conceptual distinction
This can be handled by carefully wording things in the
spec. I.e., it doesn't really matter what header names
we use, as long as we define the concepts clearly and
clearly define how the protocol elements relate to
those concepts.
(2) clock skew
Koen Holtman has proposed a simple and (with a few
minor modifications) compatible mechanism to avoid
clock skew problems in most cases. This involves
caches tracking how long they have held onto an
unrefreshed response, and then transmitting this
information in an "Age:" header. One then computes
the difference between the origin server's Date:
and Expires: headers, then sums all of the Ages
and subtracts that sum from the expiration difference
to decide if the fresh-until time has passed.
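As a rough illustration of the arithmetic just described, here is a minimal Python sketch. The function name `is_fresh`, the use of seconds as the unit for Age values, and the idea of passing the Ages as a list are assumptions made for this example, not part of Koen's proposal as stated.

```python
from email.utils import parsedate_to_datetime


def is_fresh(date_hdr, expires_hdr, ages):
    """Return True if the fresh-until time has not yet passed.

    date_hdr, expires_hdr: the origin server's Date: and Expires: values.
    ages: seconds each cache on the path held the response (assumed unit).
    """
    date = parsedate_to_datetime(date_hdr)
    expires = parsedate_to_datetime(expires_hdr)
    # Both timestamps come from the origin server's clock, so skew
    # between the server and any cache cancels out of the difference.
    lifetime = (expires - date).total_seconds()
    return lifetime - sum(ages) > 0
```

Note that the local clock is never compared against the server's Expires: value; only elapsed intervals are used, which is what sidesteps the skew problem.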
(3) specification wording regarding "must not cache"
I believe that we can safely reword section 10.19 of
Roy's 1.1 draft from:
    Applications must not cache this entity beyond the date given
to:
    Applications must not return this entity from their cache
    after the date given, without first validating it with
    the origin server.
This changes the spec to make it clear when a conditional
GET (or perhaps other methods) is supposed to be used. And
I believe that although this is not strictly the interpretation
used by 1.0 servers, it is not actually unsafe.
Or does anyone know of a server implementation that is
willing to return "304 Not Modified" in response to a
GET If-modified-since on a resource that should not be
used at all after its Expires: date?
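To make the reworded rule concrete, here is a hedged Python sketch of what a cache might do under it. The `entry` dictionary layout and the `validate` callable (standing in for a conditional GET with If-modified-since) are hypothetical names invented for this example.

```python
def serve_from_cache(entry, now, validate):
    """Return a body for the request, validating a stale entry first.

    entry: dict with 'body', 'expires', 'last_modified' (hypothetical layout).
    validate: hypothetical callable that performs a conditional GET using
        the given last-modified value and returns (status, body).
    """
    if now <= entry["expires"]:
        # Still fresh: may be returned without contacting the origin.
        return entry["body"]
    # Stale: the entry may stay stored, but must be validated before use.
    status, body = validate(entry["last_modified"])
    if status == 304:
        return entry["body"]
    # Anything else replaces the stored body with the new response.
    entry["body"] = body
    return body
```

A real cache would also update the stored Expires: value from the validation response; that is left out here to keep the sketch short.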
(4) distinction between cache freshness and document expiration
Since the 1.0 and 1.1 specs never really introduced this
distinction, or provided a way to express "document
expiration", I think the best solution would be to add
a new header for this purpose.
(5) syntax errors
Roy's 1.1 spec already includes a way for caches to deal
with syntactically incorrect Expires: headers ... treat
them as "Expires: <already>". This doesn't solve the
problem of 1.0 servers that may be sending syntactically
correct but semantically bogus Expires: values, but I
don't think there is a way to solve the problem of
stupid implementors or administrators. Since my bias
is towards safety rather than performance, I'm happy
with a design that turns off caching if there is any
doubt about the Expires: date syntax.
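A small Python sketch of this "when in doubt, treat it as already expired" policy follows. The names `parse_expires` and `ALREADY_EXPIRED` are made up for the example, and Python's fairly lenient `parsedate_to_datetime` stands in for a strict HTTP-date syntax check, which a real cache would want.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Sentinel meaning "never treat the cached copy as fresh" (assumed choice).
ALREADY_EXPIRED = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_expires(value):
    """Parse an Expires: value; any syntax problem means 'already expired'."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable date: favor safety over performance.
        return ALREADY_EXPIRED
    if parsed.tzinfo is None:
        # A zone of "-0000" yields a naive datetime; HTTP dates are GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

For instance, `parse_expires("never")` falls through to `ALREADY_EXPIRED`, which is safe but disables caching for that response.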
(6) encodings for "never" and "undefined"
Several people have argued convincingly that "no Expires:
header" should mean "undefined" (that is, the cache can
use its own heuristics), in keeping with current (although
sparsely documented) practice.
As for "never", I think the problem is "what is the
largest HTTP date value that we must insist that any
cache can properly parse?" The spec allows RFC 850
dates, which break after 1999. I think the HTTP/1.1
spec should take a stronger stand on this, and insist
that servers never generate RFC-850-style dates (but
clients must continue to accept them).
Even so, I suspect that there may be some implementations
out there that have trouble with dates on or after 01 Jan 2000,
and there are certainly a lot of implementations that use
UNIX-style dates (32-bit seconds since 01 Jan 1970) which
cannot handle dates after sometime in the year 2038.
This means that it's probably a bad idea for the spec to say
something like "Expires: 12-31-9999" means "never expires";
some implementations (especially existing ones) are bound
to represent this wrong.
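For reference, the 2038 limit can be checked with a few lines of Python. This assumes the 32-bit counter is signed, which is the common case but is not stated above.

```python
from datetime import datetime, timezone

MAX_TIME_T_32 = 2**31 - 1  # largest value of a signed 32-bit seconds count

# Last instant such a counter can represent.
last_representable = datetime.fromtimestamp(MAX_TIME_T_32, tz=timezone.utc)
print(last_representable.isoformat())  # 2038-01-19T03:14:07+00:00

# A "never expires" date in the year 9999 is far outside that range.
far_future = datetime(9999, 12, 31, tzinfo=timezone.utc)
print(far_future.timestamp() > MAX_TIME_T_32)  # True
```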
My "Fresh-until:" header proposal solved this by using
a special keyword for "never expires", but while this
is perhaps possible for HTTP/1.1, it would cause HTTP/1.0
clients and proxies to treat "Expires: never" as "syntax
error, assume Expires: already", which is exactly what
we don't want. Which is to say that while it may be
safe, it would really hurt cache performance for such
documents.
After mulling this over for a while, the best solution
that I can come up with is for the spec to say something
like this:
    HTTP/1.1 servers SHOULD represent "never expires"
    as an Expires: date approximately [one] year from
    the time the response is generated. HTTP/1.1
    servers should not send Expires: dates more than
    [one] year in the future. All Expires: dates
    SHOULD be in RFC-822 form, and MUST NOT be in
    RFC-850 form.
    HTTP/1.1 clients (and caches) SHOULD be able to
    parse HTTP-date values at least one year in
    the future (from the time that a response is
    received). HTTP/1.1 clients and caches should
    assume that an RFC-850 date which appears to
    be more than [50] years in the future is in fact
    in the past (this helps solve the "year 2000"
    problem). All HTTP/1.1 implementations MUST
    be capable of correctly parsing any HTTP-date
    value that is no less than [25] years in the
    future from the date that the implementation
    is sold or otherwise distributed. An HTTP/1.1
    implementation may internally represent a parsed
    Expires: date as earlier than the proper value,
    but MUST NOT internally represent a parsed
    Expires: date as later than the proper value.
The values in brackets are somewhat arbitrary, and could
be adjusted. I think it's reasonable to represent "expires
never" as "expires one year from now", since this should
not materially affect the server loads.
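Here is one way the proposed rules might look in Python; treat it as a sketch under stated assumptions rather than a reference implementation. All function names are invented for the example, "one year" is approximated as 365 days, and a two-digit RFC-850 year is assumed to "appear" in the current century before the [50]-year rule is applied.

```python
from datetime import datetime, timedelta, timezone

ONE_YEAR = timedelta(days=365)  # the bracketed [one] year, approximated
WINDOW_YEARS = 50               # the bracketed [50] years


def never_expires(now):
    """Server: encode 'never expires' as about one year from now."""
    return now + ONE_YEAR


def cap_expires(expires, now):
    """Server: do not send an Expires: date more than one year ahead."""
    return min(expires, now + ONE_YEAR)


def rfc850_year(two_digit_year, now):
    """Client: expand a two-digit RFC-850 year.

    Assumption: first read the year in the current century; if that is
    more than WINDOW_YEARS in the future, take it to be in the past.
    """
    year = now.year - now.year % 100 + two_digit_year
    if year - now.year > WINDOW_YEARS:
        year -= 100
    return year


def clamp_expires(expires, latest_representable):
    """Client: if the date cannot be stored, err earlier, never later."""
    return min(expires, latest_representable)
```

With these definitions, a client running in 2001 reads an RFC-850 year of "99" as 1999 rather than 2099, and a cache limited to 32-bit timestamps stores a year-2040 date as its 2038 maximum, which is earlier than the proper value and therefore safe.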
Summary of my current thinking:
(a) Retain Expires: with some minor changes in wording and
language to deal with the year-2000 problem.
(b) Add "Age:" headers to help with clock skew (and to
provide a reasonable interpretation for "Cache-control: max-age").
(c) Possibly add something like "Resource-expiration:" to
allow distinct representation of that concept.
-Jeff
Received on Monday, 22 January 1996 23:41:28 UTC