A modest proposal, made more specific from Jeffrey Mogul on 1995-08-19 (ietf-http-wg@w3.org from July to September 1995)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Fri, 18 Aug 95 18:00:46 MDT
To: Larry Masinter <masinter@parc.xerox.com>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9508190100.AA10245@acetes.pa.dec.com>

Larry Masinter writes:
    And, secondly, I believe it is reasonable at some point to expect HTTP
    clients and servers and the proxies in between to have correct clocks.
    We're talking about Internet applications now, for machines that are
    well connected on the net.
and suggests that HTTP servers offer a time service.

Ugh.  While it is perhaps reasonable to expect HTTP servers and
clients to have correct clocks, for some definition of "correct"
(talk to Professor Einstein about this), I do not think it is
*necessary* to rely on this when designing the protocol.  Moreover,
I would not confuse "reasonable to expect" with "safe to expect".

Shel Kaphan suggests that clients use the "Date" information to run
a sort of NTP-like clock synchronization algorithm, keeping
track of each server's relative clock performance.

Double ugh.  This is a lot of hard work, and it's already done
by NTP.  And it's not all that useful.

We're still confused about two uses of timestamps: (1) cache validation
and (2) expiration.

I think Shel is close to the truth when he writes
    It is not possible to reliably compare times from two different,
    unsynchronized, clocks. So, don't do it.
But let me try to clarify things a bit more.

(1) For the purpose of cache validation ("is the cached copy I have
valid or not"), it should not be necessary to do ANYTHING besides
a strict equality comparison.  The cached copy is either the same
as the server's copy, or it isn't.  If we are using last-modified
times as the validation ID, then either they are equal or they
aren't.  I fail to see any value in allowing a server to return
304 (not modified) if its idea of the modification time is prior
to (that is, not exactly equal to) the client's (or proxy's) stored
modification time.
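
As an illustration of point (1), here is a minimal sketch of validation
by strict equality.  The function names are hypothetical, and the
validator is treated as an uninterpreted string, so no date parsing or
ordering is involved.

```python
def is_cached_copy_valid(cached_validator: str, server_validator: str) -> bool:
    """Strict equality only: the validator is never parsed or ordered."""
    return cached_validator == server_validator


def status_for_conditional_get(client_last_modified: str,
                               server_last_modified: str) -> int:
    # 304 only on an exact match; an earlier OR later server time means
    # the cached copy is not known to be the same, so send a full 200.
    if is_cached_copy_valid(client_last_modified, server_last_modified):
        return 304
    return 200
```

Under this sketch, a server whose modification time is earlier than the
one the client holds still answers 200, which is the behavior argued
for above.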

[I'll note that all the allowed date formats have one-second
resolution.  This could be a problem in the future, if things are
changing faster than once a second, and browsers can do something
useful with this (but this is speculative, I admit).]

(2) For expiration checking, it is obviously necessary to do inequality
comparisons ("is expiration time > now?").  In this case, though,
we don't need to be 100% accurate, since in almost all cases I can
think of, when someone assigns an expiration date, it's at best a
guess, anyway.  Doing expiration checking does not require carefully
synchronized clocks; it only requires sufficient sanity checking that
badly out-of-whack clocks are not believed.  I think the algorithm
I suggested in my previous message should work fine, noting Roy's
comment that the Date header provides the necessary sanity-checking
info.  (And so HTTP 1.1 should make Date mandatory if Expires is
sent, I think.)
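
The algorithm from the previous message is not reproduced here, so the
following is only one possible reading of point (2), not that
algorithm.  It assumes the cache takes Expires minus Date as a lifetime
measured entirely on the server's clock, rejects implausible values,
and then counts elapsed time on its own clock.  The function names and
the one-year bound are assumptions made for illustration.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Assumed sanity bound; the message does not specify one.
MAX_PLAUSIBLE_LIFETIME = timedelta(days=365)


def freshness_lifetime(date_header, expires_header):
    """Return Expires - Date (both from the server), or None if not sane."""
    if date_header is None or expires_header is None:
        return None  # without Date there is nothing to sanity-check against
    try:
        lifetime = (parsedate_to_datetime(expires_header)
                    - parsedate_to_datetime(date_header))
    except (TypeError, ValueError):
        return None  # unparseable or mismatched values are not believed
    if lifetime <= timedelta(0) or lifetime > MAX_PLAUSIBLE_LIFETIME:
        return None  # badly out-of-whack values are not believed
    return lifetime


def is_fresh(date_header, expires_header, received_at, now):
    """received_at and now are seconds on the cache's own clock."""
    lifetime = freshness_lifetime(date_header, expires_header)
    if lifetime is None:
        return False
    age = now - received_at
    return 0 <= age < lifetime.total_seconds()
```

The two server timestamps are only ever compared with each other, and
the two local readings only with each other, so no comparison crosses
clocks.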

For both cache-validation and expiration, I believe that the algorithms
used by clients and proxies should be identical.  That is, I see no
reason why a client should cache something that a proxy isn't allowed
to (except as explicitly instructed by the "Caching-allowed:" header
or whatever we're calling that).  And there can't be any reason to
allow a proxy to cache something that a client is not allowed to.

So far, what I've suggested in this message does not change the
syntax of the protocol; I'm suggesting changes in what servers,
clients, and proxies do with the headers we already have.  This
means that (so far) everything I've suggested in this message
should interoperate with all reasonable implementations.  (Clearly,
a server sending entirely random values for last-modified, for example,
isn't entirely reasonable.)

People may suspect that I don't entirely like the use of last-modified
dates as cache validators.  It may be that a server implementor would
prefer to use a separately managed and opaque unique-ID, such as a
generation number (as done by NFS) or an MD5 checksum.  This relieves
the server of having to be cautious about file modification dates, and
also solves the problem of insufficient precision in the HTTP timestamp
formats.

I do not believe that file length is useful as a cache validator.  As
many people have pointed out, it's hard to define "length" and it's
not a safe validator, since the file contents may change without changing
the length.  On the other hand, if server implementors are naive enough
to use file length as their opaque identifiers, I'm not going to stop them
(but I won't run their servers!).

So I would like to suggest, for HTTP 1.1, a FULLY COMPATIBLE protocol
change that should solve this problem.  Add a new header returned by
a server (perhaps via a proxy):

	Cache-Validator = "Cache-Validator" ":" opaqueID
	opaqueID       = *( unreserved | reserved )

And a new header sent by clients:

	If-Validator-Valid = "If-Validator-Valid" ":" opaqueID

Clients and proxies are not allowed to do anything with the opaqueID
except return it to the server that it came from by sending it in an
"If-Validator-Valid" header.

A server that receives both "If-Validator-Valid" and "If-Modified-Since"
should ignore the latter.  Otherwise, the spec for "If-Validator-Valid"
should look pretty much like the spec for "If-Modified-Since", except
of course for simpler rules about comparisons.
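
A minimal sketch of the server-side precedence rule might look like
this.  The function name is hypothetical, header lookup is assumed to
use exact-case keys in a plain dict, and the If-Modified-Since branch
uses the strict-equality comparison argued for in point (1) rather
than any existing server's behavior.

```python
def conditional_status(request_headers: dict,
                       current_validator: str,
                       current_last_modified: str) -> int:
    """Return 304 if the client's copy is still valid, else 200."""
    if "If-Validator-Valid" in request_headers:
        # If-Modified-Since, if also present, is deliberately ignored.
        if request_headers["If-Validator-Valid"] == current_validator:
            return 304
        return 200
    if "If-Modified-Since" in request_headers:
        # Fallback for old clients (strict equality is an assumption here).
        if request_headers["If-Modified-Since"] == current_last_modified:
            return 304
        return 200
    return 200  # unconditional request
```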

This allows full interoperability with older implementations.  Old
servers won't send "Cache-Validator" headers; old clients won't
send "If-Validator-Valid".  New clients will send only "If-Modified-Since"
to old servers, since they won't have a validator in this case.
Old clients may receive "Cache-Validator" headers, but they are
already required to ignore unknown headers.  Proxies (old and new)
will pass these new headers in either direction; new proxies may even
use them.
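
The client side of this interoperability story could be sketched as
follows.  The function name is hypothetical, and sending both headers
when both are known is one possible choice, not something the proposal
requires; a new server would ignore If-Modified-Since in that case.

```python
def revalidation_headers(cached_response_headers: dict) -> dict:
    """Build the conditional headers for revalidating a cached response."""
    headers = {}
    validator = cached_response_headers.get("Cache-Validator")
    if validator is not None:
        # Returned verbatim; the client never interprets the opaqueID.
        headers["If-Validator-Valid"] = validator
    last_modified = cached_response_headers.get("Last-Modified")
    if last_modified is not None:
        # The only conditional header available against an old server.
        headers["If-Modified-Since"] = last_modified
    return headers
```

A response from an old server carries no "Cache-Validator", so only
"If-Modified-Since" is produced for it.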

A server is free to use a timestamp as its opaqueID.  It may even
use timestamps for some files, checksums for others, etc.  It might
make sense to use one for files, and the other for CGI output.
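
For instance, a server might derive its opaqueIDs along these lines.
Both helper names are hypothetical, and the hex and decimal encodings
are assumptions chosen so that the result stays within plain
unreserved characters.

```python
import hashlib


def validator_from_content(body: bytes) -> str:
    """Checksum-based opaqueID, e.g. for CGI output with no file time."""
    return hashlib.md5(body).hexdigest()


def validator_from_mtime(mtime_seconds: int) -> str:
    """Timestamp-based opaqueID, e.g. for ordinary files."""
    return str(mtime_seconds)
```

Because clients only echo the value back, the server can mix both
schemes freely.  The checksum variant also distinguishes two bodies of
the same length, which a length-based validator cannot.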

To summarize: I think HTTP 1.1 should aim for the simplest possible
correct, interoperable cache-control protocol.  I claim that
what I've proposed fits the bill:
	(1) Servers can use explicit "Caching-allowed:" headers
	to force non-caching behavior when necessary.
	(2) Servers and clients/proxies exchange opaqueIDs as cache
	validators.  This leaves no confusion about interpretation.
	Choice of how unique opaqueIDs are generated is left entirely
	up to the server implementor.
	(3) Cache expiration is done using a simple "Expires+Date"
	pair that allows the client/proxy to do a sanity-check,
	without requiring synchronized clocks.

-Jeff
Received on Friday, 18 August 1995 18:07:14 UTC