Cache validators

When a cache (either in a user agent or in a proxy) has a cache entry
and wants to find out if the origin server (or an inbound cache) still
considers that entry to be a semantically appropriate response for a
given request, it makes a conditional request for the resource.

In HTTP/1.0, the only way to do a conditional request is to include
an If-Modified-Since: header in the request.

For HTTP/1.1, I have proposed allowing the server to send an
opaque validator header in a response, and requiring HTTP/1.1
clients to provide this validator in any conditional request.

I have also proposed that these two validation methods (opaque
validators, and If-Modified-Since:) be the only mechanisms for
cache validation in conditional requests.

Roy has proposed a more elaborate mechanism, consisting of
(as he puts it)

    IF, If-ID, Unless-Modified-Since, Cache-control: max-age, etc.

In other words, Roy wants the client to be able to make the request
conditional on a wide range of predicates.

There may be no actual conflict in our positions, if we can agree
on some conceptual points.  In particular, I think we are again
having problems with terminology, specifically "conditional request".

We really have two different kinds of conditionals to consider:

	(1) cache-validation conditions: "does what I currently
	have in my cache match what you would currently send
	me in response to this request sufficiently well to
	maintain semantic transparency from your point of view?"

	(2) utility conditions: "will what you send me be useful
	to me according to some criteria (because if it is not,
	I don't want you to send it)?"

An example of (2) might be "is what I have asked for more than
1000000 bytes, because if it is, I don't want to wait for it,
so you shouldn't send it".  I think this is an orthogonal concept
to cache validation, and we ought to be thinking about it separately.

I strongly believe that cache validation ought to be done with
a single, *server-specified* condition.  This means opaque validators,
period.  Since we need to interoperate with existing systems, we
must also support If-modified-since.

Conservative cache validation is necessary to preserve semantic
transparency.  That is, if we don't interoperate on this aspect
of the protocol, then we lose semantic transparency.

I also believe that there is a good case to be made for supporting
utility conditions, especially in view of the limited bandwidths
and high latencies of many networks.  This is sort of like the
(bogus) story of the $250 cookie recipe that keeps popping up:
a user ought to be able to avoid being astonished by the cost of
something she or he requests.  Roy's Unless: header seems like
a good way to express these utility conditions, since they are
client-centric, not server-centric.  I assume that when Roy
mentions an "If:" header, he means that
	If: <predicate>
is identical in meaning to
	Unless: {not <predicate>}
and so that header can also express utility conditions.
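
A minimal sketch of that assumed equivalence, in Python. The names
here (Predicate, negate, blocked_by_if, blocked_by_unless) are
hypothetical and only illustrate the reading above; they are not
taken from Roy's proposal.

```python
from typing import Callable, Dict

# A predicate looks at response metadata (e.g. Content-Length).
Predicate = Callable[[Dict[str, int]], bool]


def negate(pred: Predicate) -> Predicate:
    """Build the predicate {not <pred>}."""
    return lambda meta: not pred(meta)


def blocked_by_unless(pred: Predicate, meta: Dict[str, int]) -> bool:
    """Unless: <pred> withholds the response when pred is true."""
    return pred(meta)


def blocked_by_if(pred: Predicate, meta: Dict[str, int]) -> bool:
    """If: <pred> withholds the response when pred is false."""
    return not pred(meta)


def small_enough(meta: Dict[str, int]) -> bool:
    """Example utility predicate: body is at most 10000 bytes."""
    return meta["Content-Length"] <= 10000
```

Under this reading, blocked_by_if(small_enough, meta) and
blocked_by_unless(negate(small_enough), meta) give the same answer
for any metadata, which is all the equivalence claims.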

Roy might argue at this point that I have it all wrong.  This
would reflect our basic disagreement about server-centric
vs. client-centric control of cache transparency.  Or Roy
might say, "aha, now we are getting somewhere" because our
prior disagreement was based on a confusion between controlling
cache validation and controlling orthogonal utility preferences.
We'll see. :-)

Roy also believes that Content-ID should be used as a cache
validator.  At first I was mystified about this, because Content-ID
doesn't appear anywhere in any HTTP specification draft, but
I did find draft-levinson-cid-02.txt, "Content-ID and Message-ID
Uniform Resource Locators", and I suppose this is what he must mean.
If not, I would appreciate a clarification.

That Content-ID proposal defines
	cidurl     = "cid" ":" addr-spec
   where "addr-spec" is defined in [RFC822].
I.e.,
	addr-spec   =  local-part "@" domain

If this is what Roy means, I fail to see how it has any value at
all as a cache validator.

"Cache-control: max-age" (in a request) as well as other cache-control
directives that I have proposed (fresh-min, stale-max) are not
exactly cache validation conditions.  Rather, they effectively
allow the user to specify whether or not a cache entry should be
validated.  So these are not really conditions on the request
between the client and the cache; rather, they are instructions
from the client to the cache saying when conditional requests
should be done from the cache to some inbound cache or origin
server.
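
As a rough sketch of that distinction, the function below shows a
cache deciding whether to validate an entry before using it.  It
assumes that max-age in a request means "I will accept an entry no
older than this many seconds"; the function and parameter names are
hypothetical.  fresh-min and stale-max are not modeled because their
semantics are not spelled out in this message.

```python
def cache_should_validate(entry_age_seconds: int,
                          request_max_age: int) -> bool:
    """Decide whether the cache must go inbound before answering.

    This is an instruction from the client to the cache, not a
    condition that the origin server evaluates.
    """
    # Entry is older than the client is willing to accept, so the
    # cache should make a conditional request to an inbound cache
    # or the origin server.
    return entry_age_seconds > request_max_age
```

For example, an entry that is 120 seconds old would be validated
when the request carries max-age=60, and served directly when it
carries max-age=300.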

How do the various conditions fit together?  To me, it makes
sense to treat them conjunctively.  That is, if a request
includes both a cache validator (using either If-Modified-Since:
or If-Invalid:) and a utility condition (such as
"Unless: {gt {Content-Length 10000}}"), then the server should
apply both conditions with an implied "AND".

For example, this request:
	GET /home.html HTTP/1.1
	If-Invalid: xyzzy
	Unless: {gt {Content-Length 10000}}
means
	if (current-validator == xyzzy)
	    return "304 Not Modified"
	else if (content-length > 10000)
	    return "412 Unless true"
	else
	    return "200 OK" + entire body

It makes sense to evaluate the cache-validation condition
before any utility conditions, because why bother evaluating
the utility conditions if the client already has the response
in its cache?
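
The same ordering can be sketched in Python.  This is only an
illustration: evaluate_request and its parameters are hypothetical
names, and If-Invalid: and Unless: are proposals under discussion,
not part of any existing specification.

```python
from typing import Optional


def evaluate_request(current_validator: str,
                     content_length: int,
                     if_invalid: Optional[str] = None,
                     unless_length_gt: Optional[int] = None) -> str:
    """Return the status a server might send for a conditional GET."""
    # Cache-validation condition first: if the client's validator
    # still matches, its cached copy is good and nothing else matters.
    if if_invalid is not None and if_invalid == current_validator:
        return "304 Not Modified"
    # Utility condition second: the client asked not to receive the
    # body when it exceeds the given length.
    if unless_length_gt is not None and content_length > unless_length_gt:
        return "412 Unless true"
    # Both conditions passed (implied AND): send the entire body.
    return "200 OK"
```

With the request shown above, a server whose current validator is
still xyzzy returns 304 regardless of the body size; otherwise a
50000-byte body yields 412 and a 500-byte body yields 200.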

There's a possible case to be made for treating If-Modified-Since:
as either a cache validation condition or a utility condition based
on context.  For example, one perhaps could send
	GET /home.html HTTP/1.1
	If-Invalid: xyzzy
	If-Modified-Since: Thu, 15 Feb 1996 00:00:00 GMT
which would appear to mean "send me a new copy of the entire
response if my cache is invalid AND the resource value changed
on or after Feb. 15".  However, this is problematic because
if the cache entry is invalid but the resource changed on
Feb. 14, should the server return "304 not modified"?  That status
would be false, since the resource has in fact changed, and it could
mislead an intervening cache that might then use this response for
some other client.

I would express this request instead as
	GET /home.html HTTP/1.1
	If-Invalid: xyzzy
	Unless: {gt {Last-Modified "Thu, 15 Feb 1996 00:00:00 GMT"}}
because then the server can either return "304 not modified"
if the cache's copy is still semantically equivalent, or "412
Unless true" if the value is modified but the utility condition
doesn't hold.  An intervening cache would not be allowed to
infer anything about the validity of a previously cached response
from a "412 Unless true" (in particular, a cache MUST NOT update
a heuristically derived expiration time from this response!)

Then, the meaning I would ascribe to:
	GET /home.html HTTP/1.1
	If-Invalid: xyzzy
	If-Modified-Since: Thu, 15 Feb 1996 00:00:00 GMT
	Cache-control: reload
would be "send me the full response for home.html if either
my cached copy is invalid (based on the opaque validator) OR
if it has been modified on or after Feb. 15."  This makes
it possible for a client to stick both validators into a request
and have the right thing happen with either an HTTP/1.1 origin server
(that wants to see the opaque validator, or else it would not have
sent one originally) or an HTTP/1.0 cache (that would not understand
the opaque validator).
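
A minimal sketch of how the two kinds of recipient might treat such
a request, assuming that a recipient which does not understand
opaque validators simply ignores If-Invalid:.  The function name and
parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def send_full_response(understands_validators: bool,
                       current_validator: Optional[str],
                       last_modified: datetime,
                       if_invalid: str,
                       if_modified_since: datetime) -> bool:
    """True if the recipient should return the full response."""
    # An HTTP/1.0 recipient ignores the unknown If-Invalid: header,
    # so only the date test below can trigger a full response.
    validator_mismatch = (understands_validators
                          and current_validator != if_invalid)
    # "Modified on or after" the given date, as described above.
    modified = last_modified >= if_modified_since
    # The two validators are combined with OR.
    return validator_mismatch or modified


FEB_14 = datetime(1996, 2, 14, tzinfo=timezone.utc)
FEB_15 = datetime(1996, 2, 15, tzinfo=timezone.utc)
FEB_16 = datetime(1996, 2, 16, tzinfo=timezone.utc)
```

Here an HTTP/1.1 server whose validator no longer matches xyzzy
sends the full response even if the resource last changed on
Feb. 14, while an HTTP/1.0 cache falls back on the date alone.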

I hope that this somewhat lengthy message clarifies things, and
perhaps even provides a way for Roy and me to agree on what
these conditionals actually mean.

-Jeff

Received on Wednesday, 21 February 1996 01:09:29 UTC