Re: HTTP Caching Design

Some comments on your comments on my proposal, somewhat reordered
and elided.

Executive summary: we don't really disagree on as many points
as Roy thinks we do, but there are a few real differences
of opinion.

    Note that the current HTTP/1.1 design implicitly encourages
    caching.  [...] In contrast, the design that Jeff Mogul has
    proposed in <http://ftp.digital.com/%7emogul/cachedraft.txt>
    implicitly *discourages* caching.

It is certainly not my intention to discourage caching.  I most
certainly want to encourage it as much as possible, and I think
that one crucial way to do this is to ensure that caching does
not cause confusion; i.e., I want to reduce the likelihood that
aggressive caching could result in incorrect behavior.  This
is a key point, and one that I should have made explicit.

In particular, the approach that I am pushing is to provide
sufficient explicit information via the HTTP protocol to make
it possible for caches to present the right version of a Web
page.  If caches are forced to make too many decisions based
on heuristic inferences, they are going to either be too strict
or too lenient.  If, on the other hand, we give the caches
the necessary information, they can maximize the efficiency
of the protocol without providing unexpected wrong answers.

I suppose that one could infer that my insistence on making
caching information explicit is a discouragement of caching.
Such an inference is entirely wrong, and if my proposal leaves
any room to make that inference, I will do what I can to correct
the language.

    It does so because HTTP caching is a necessity for the common good
    of all users of the Web, whether or not they are aware of it.

This is a statement we seem to agree on, but we have different
definitions for some of the words.  In particular, it is not sufficient
to assert that HTTP caching is good, because we haven't specifically
agreed on what constitutes "good".

    BTW, "cachable" in HTTP means that the response may be reused as
    the response for an equivalent future request -- it does not mean
    just "may be stored".

I think we agree entirely on this, and in fact I've tried to make
this extremely explicit in section 2.5, "Cache validation and the
``immutable cache'' concept".

I do feel that the word "cachable" is somewhat hard to define, and I've
tried to avoid it in my proposal.  This is because it leads to attempts
to make a distinction purely between "cachable" and "not cachable",
which is actually a secondary distinction derived from primary ones
that may change over time.  I'm indebted to one of you (whose name I've
forgotten) for contributing the "stale/fresh" terms, and to Koen
Holtman for pushing me to add "firsthand" to this list (although he
suggested "original", which has a slightly different connotation).

    In addition, all server developers must create a new mechanism for
    allowing user's to specify the "freshness" of each and every
    resource, even though the vast majority of such resources don't
    have any implied notion of "freshness" and cannot conceive of the
    actual needs of caches within the organizations of possible
    recipients.

I'll assume that the subject of the verb "conceive" here is not a
resource but the human who runs the server.  We can argue about this,
but I want to make sure that *we* have a clear understanding of these
notions when we discuss proposals.  And in that respect, it's quite
important to realize that "fresh" applies NOT to a resource but to a
cached copy of a resource, and only makes sense in relation to a
specific point in time.  Therefore, it makes no sense to infer that I
would require servers to provide a way to explicitly specify the
freshness of each and every resource, any more than the concept of an
expiration date implies the analogous requirement.  We can argue about
what the default should be in the absence of a specification, but I
don't think we disagree that a default is reasonable in many (or most)
cases.
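
To illustrate the point that "fresh" describes a cached copy at a
specific point in time, here is a minimal sketch in Python.  The names
CachedCopy and is_fresh are hypothetical and do not come from either
proposal; the only assumption is that each stored copy carries a
fresh-until time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CachedCopy:
    """One cache's stored copy of a resource (hypothetical structure)."""
    url: str
    body: bytes
    fresh_until: datetime  # after this instant the copy is stale


def is_fresh(copy: CachedCopy, now: datetime) -> bool:
    """Freshness is a function of a cached copy AND a point in time."""
    return now < copy.fresh_until


# The same copy is fresh at one instant and stale at a later one.
stored = datetime(1996, 1, 8, 12, 0, tzinfo=timezone.utc)
copy = CachedCopy("http://example.com/", b"...", stored + timedelta(hours=1))
```

Note that nothing here is attached to the resource itself: two caches
holding copies of the same resource can give different answers at the
same moment.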

    [In Roy's proposal], any content provider that "doesn't care" about
    the cachability of a resource will be given the default behavior
    which is good for caching.  At the same time, any content provider
    that "does care" about the cachability of a resource is provided a
    mechanism to express their needs.  [...] [In Jeff's proposal], the
    provider must do extra work even when they "don't care".  [...]

    The result will be either no-cache by default, or bogus "freshness"
    criteria applied to every resource by default.  Both cases will
    result in excessive prevention of reasonable caching.

I think you are running into the explicit/implicit distinction without
realizing it.  If I understand you correctly, you are saying that in
your approach, resources are by default cachable "forever" without
checking with the origin server, "forever" being modified by heuristics
applied by the cache.  You also allow the server to override this
behavior using "Cache-control: max-age".

In my approach, the server could either say
	Fresh-until: forever
or
	Fresh-until: <specific time>
which to me seems to be entirely a matter of syntax, with one exception.
I admit that I have not included a way for the server to say something
like:
	Fresh-until: I don't care, a long time, but use your judgement
(in other words, "forever is probably too long").  If you believe that
(1) caches ought to be making such decisions, and (2) they ought not
to be doing so without explicit permission from the server, then I
have no objections to adding some sort of cache control directive
that allows the server to say whether or not this is the case.  In
fact, I don't really care (in this case) whether the default is
"yes" or "no", just as long as we agree that there is a choice to
be made.

While Koen Holtman asserts that this:
  An HTTP/1.1 server SHOULD provide a fresh-until time with every
  cachable entity, but if it does not, the cache must assume a value of
  zero.

"is a very bad thing" (why?), in the next breath he agrees that there
ought to be some finite value here.  I proposed "zero", he proposed
"7 days", and I think we can both agree that the other one's proposal
is too extreme :-).  I'm happy to discuss the particular value, or
recommended range of values, but I think it would be a mistake to leave
it completely unconstrained.
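
As a rough sketch of what is being debated here (in Python, with a
hypothetical function name and parameters), the disagreement is only
about the value of default_lifetime, not about whether a default
exists.  The sketch assumes the default is measured from the time the
response was received; neither proposal is quoted as saying so.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_fresh_until(
    received: datetime,
    server_fresh_until: Optional[datetime],
    default_lifetime: timedelta = timedelta(0),
) -> datetime:
    """Return the instant until which a cached copy counts as fresh.

    An explicit value from the server always wins.  Without one, the
    cache falls back to a finite default: zero as in the quoted text,
    or e.g. timedelta(days=7) as in the alternative suggestion.
    """
    if server_fresh_until is not None:
        return server_fresh_until
    return received + default_lifetime
```

With the default of zero, a copy without an explicit fresh-until time is
stale as soon as it is stored, so every reuse requires a check with the
origin server.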

Having at one point said that the length of the freshness period
"is determined by the cache according to its own set of heuristics/needs",
Roy later writes (re: my proposal):
    Furthermore, because this represents a change from the
    HTTP/1.0 defaults, the cache mechanism is required to employ
    separate behavior depending on the server version.
The only way to interpret these two statements without contradiction
is to believe that Roy is not objecting to a cache's use of a heuristic
based on server version, but rather to putting a specific heuristic
into the specification.  And since I left this unspecified so far,
I'd be willing to consider a proposal that it remain unspecified.
I did also include a way for caches to mark these values as
heuristically derived when they are passed along to other caches,
because of my aim to provide explicit information in the protocol.
That is, I don't want another cache having to guess whether the value
it got came from the origin server or was produced by a possibly
buggy heuristic.
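
A minimal sketch of that marking, in Python.  The Freshness record, its
heuristic flag, and the function name are hypothetical illustrations,
not the syntax used in the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Freshness:
    fresh_until: datetime
    heuristic: bool  # True if a cache guessed this value


def freshness_to_forward(
    from_upstream: Optional[Freshness],
    received: datetime,
    guessed_lifetime: timedelta,
) -> Freshness:
    """Choose the freshness information to pass to a downstream cache."""
    if from_upstream is not None:
        # Pass the value along unchanged, including its heuristic flag,
        # so its provenance is never lost on the way downstream.
        return from_upstream
    # Nothing was supplied upstream: this cache guesses, and says so.
    return Freshness(received + guessed_lifetime, heuristic=True)
```

A downstream cache can then treat a flagged value with more suspicion
than one that came from the origin server.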

-Jeff

Received on Monday, 8 January 1996 22:28:47 UTC