Re: First draft of a list of goals from Jeffrey Mogul on 1995-12-28 (http-caching-historical@w3.org from December 1995)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Thu, 28 Dec 95 13:26:06 PST
To: "David W. Morris" <dwm@shell.portal.com>
Cc: HTTP Caching Subgroup <http-caching@pa.dec.com>
Message-Id: <9512282126.AA06566@acetes.pa.dec.com>

    I don't know that the protocol needs to care, but from my
    experience with many forms of documents over the years I believe we
    should acknowledge that the most important issue with respect to
    'staleness' of a document is the impact on the receiver of
    incorrect content.

    As servers get more sophisticated in their document management
    model, expiration will become a more meaningful concept. Some
    documents must be current. For many documents, expiration is not a
    hard date but rather a general notion, something like "we know the
    minimum review cycle for an updated personnel standard is X days."
    Hence, at any given point in time the smart server could report an
    expiration of NOW+X days unless the document is marked as under
    review.

Just to clarify things: we have been using the term "expiration"
to refer to two somewhat different things: the expiration of a
document (or other object), and the expiration of cached copies
of a document/object.

For example, a server may know for sure that a document expires on June
1, 1999.  But it may want to limit the unvalidated lifetime of a cached
copy handed out at any given point before then to 12 hours, on the off
chance that the person who wrote that document accidentally included a
libelous comment and may want to withdraw it sooner.  (I'm
anthropomorphizing "server" to include its hardware, its software,
and its meatware [human administrators].)

I've been thinking all along about the latter meaning (cached-copy
expiration), not the former.  To me it makes sense that the
Expires: date handed out by a server should be the minimum of
the two kinds of "expiration", if both are specified.  Document
expirations are likely to be fixed dates; cached-copy expirations
are likely to be offsets from the generation of a response.
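A minimal sketch of the "minimum of the two kinds of expiration" rule, using the June 1, 1999 / 12-hour example above. The function name and parameters are hypothetical, chosen only for illustration; it assumes times are UTC and that a missing value means "not specified."

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(response_time, document_expires=None, max_copy_lifetime=None):
    """Hypothetical helper: pick the earlier of the two kinds of expiration.

    document_expires  -- fixed date on which the document itself expires
    max_copy_lifetime -- timedelta limiting the unvalidated lifetime of a
                         cached copy, measured from response generation
    Returns an HTTP-date string, or None if neither is specified.
    """
    candidates = []
    if document_expires is not None:
        candidates.append(document_expires)  # fixed date
    if max_copy_lifetime is not None:
        candidates.append(response_time + max_copy_lifetime)  # offset
    if not candidates:
        return None
    # usegmt=True requires an aware datetime in timezone.utc
    return format_datetime(min(candidates), usegmt=True)


doc_expires = datetime(1999, 6, 1, tzinfo=timezone.utc)
twelve_hours = timedelta(hours=12)

# Long before June 1999, the 12-hour cached-copy limit wins:
print(expires_header(datetime(1995, 12, 28, 21, 26, 6, tzinfo=timezone.utc),
                     doc_expires, twelve_hours))
# Within 12 hours of the document's own expiration, the fixed date wins:
print(expires_header(datetime(1999, 5, 31, 20, 0, tzinfo=timezone.utc),
                     doc_expires, twelve_hours))
```

The first call yields "Fri, 29 Dec 1995 09:26:06 GMT" and the second "Tue, 01 Jun 1999 00:00:00 GMT", so the same server policy produces a different Expires: value depending on when the response is generated.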

    Basically, from the protocol perspective, a well-formed expiration
    model should expect expiration to change without any other change
    to the document.

True.  This implies that an interaction with the server that results
in a cache update (even one as simple as marking the copy "still
valid") should return a new Expires: header, so that if the expiration
time has been revised to be earlier, this is seen by the cache.
Cached-copy expiration times are dynamic values, not static ones.
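To illustrate treating cached-copy expiration as a dynamic value, here is a small sketch of a cache entry that always adopts the Expires: time returned by a validation exchange, even when it is earlier than the stored one. The class and function names are hypothetical, and the "not modified" flag stands in for whatever "still valid" reply the server sends.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CacheEntry:
    body: bytes
    expires: datetime

    def is_fresh(self, now: datetime) -> bool:
        return now < self.expires


def apply_validation_response(entry: CacheEntry, not_modified: bool,
                              new_expires: datetime,
                              new_body: Optional[bytes] = None) -> None:
    """Hypothetical update rule after asking the server about a cached copy.

    Whether the copy is "still valid" or replaced with new content, the
    Expires value from the server overwrites the stored one, so an
    expiration time revised to be earlier is seen by the cache.
    """
    if not not_modified and new_body is not None:
        entry.body = new_body
    entry.expires = new_expires


t0 = datetime(1995, 12, 28, 12, 0, tzinfo=timezone.utc)
entry = CacheEntry(b"report", expires=t0 + timedelta(hours=24))

# The server says the copy is still valid, but now expires in 1 hour.
apply_validation_response(entry, not_modified=True,
                          new_expires=t0 + timedelta(hours=1))
print(entry.is_fresh(t0 + timedelta(hours=2)))
```

Because the earlier time replaces the original 24-hour value, the final check reports the copy as no longer fresh two hours later.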

    If we look beyond HTTP 1.1 into the future, we must recognize that
    HTTP caching (client, proxy, mirror) is a pretty primitive form of
    distributed database. There has been research and development for
    years in that problem domain, and not all distributed database
    models insist on exact copies. As I look forward, I would expect
    that caching systems would notify the 'owner' of intent to cache.
    In that world, expirations can be safely set for long intervals
    because the 'owner' can notify caches of changes. The cache can
    then decide to simply purge the data, pre-fetch frequently
    referenced data, etc.

I think you are touching on the problem of "revocation."  This
seems to require that the origin-server is aware of all of the places
where a cached copy might exist.  It's not sufficient for the
server to simply know about the last-hop proxy, since another cached
copy could also exist closer to the client, and the client might
have switched proxies by the time that revocation is needed.
And it also requires some sort of call-back mechanism, which in
turn may require algorithms for dealing with crash recovery and
transient network partitions.  All of which makes it highly unlikely
that we could address these in the context of HTTP 1.1, I think.
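A toy model, under stated assumptions, of why knowing only the last-hop proxy is not enough for revocation. Everything here is hypothetical: the origin records which caches fetched directly from it and "calls back" to invalidate them, but a cache does not track the caches it hands copies to. Crash recovery and network partitions are ignored entirely.

```python
class Origin:
    """Hypothetical origin server that remembers its direct requesters."""

    def __init__(self):
        self.holders = {}  # url -> set of caches that fetched from us

    def fetch(self, url, requester):
        self.holders.setdefault(url, set()).add(requester)
        return "v1 of " + url

    def revoke(self, url):
        # Call back every cache we know about; nobody else is reached.
        for cache in self.holders.pop(url, set()):
            cache.invalidate(url)


class Cache:
    """Hypothetical cache (proxy or client) with a single upstream."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # an Origin or another Cache
        self.store = {}

    def fetch(self, url, requester=None):
        # requester is ignored: this cache does not record downstream holders
        if url not in self.store:
            self.store[url] = self.upstream.fetch(url, self)
        return self.store[url]

    def invalidate(self, url):
        self.store.pop(url, None)


origin = Origin()
proxy = Cache("proxy", origin)
browser = Cache("browser", proxy)

browser.fetch("/doc")   # copies now exist in both proxy and browser
origin.revoke("/doc")   # the origin only knows about the proxy
print("/doc" in proxy.store, "/doc" in browser.store)
```

This prints `False True`: the copy closer to the client survives the revocation. Fixing that would require every cache to track and forward call-backs, which is exactly the extra machinery described above.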

-Jeff
Received on Thursday, 28 December 1995 21:32:40 UTC