- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Tue, 20 Feb 96 16:57:47 PST
- To: http-caching@pa.dec.com
When a cache (either in a user agent or in a proxy) has a cache entry and wants to find out if the origin server (or an inbound cache) still considers that entry to be a semantically appropriate response for a given request, it makes a conditional request for the resource.

In HTTP/1.0, the only way to do a conditional request is to include an If-Modified-Since: header in the request.

For HTTP/1.1, I have proposed allowing the server to send an opaque validator header in a response, and requiring HTTP/1.1 clients to provide this validator in any conditional request. I have also proposed that these two validation methods (opaque validators, and If-Modified-Since:) be the only mechanisms for cache validation in conditional requests.

Roy has proposed a more elaborate mechanism, consisting of (as he puts it) IF, If-ID, Unless-Modified-Since, Cache-control: max-age, etc. In other words, Roy wants the client to be able to make the request conditional on a wide range of predicates.

There may be no actual conflict in our positions, if we can agree on some conceptual points. In particular, I think we are again having problems with terminology, specifically "conditional request".

We really have two different kinds of conditionals to consider:

    (1) cache-validation conditions: "does what I currently have in my cache match what you would currently send me in response to this request sufficiently well to maintain semantic transparency from your point of view?"

    (2) utility conditions: "will what you send me be useful to me according to some criteria (because if it is not, I don't want you to send it)?"

An example of (2) might be "is what I have asked for more than 1000000 bytes, because if it is, I don't want to wait for it, so you shouldn't send it". I think this is an orthogonal concept to cache validation, and we ought to be thinking about it separately.

I strongly believe that cache validation ought to be done with a single, *server-specified* condition.
This means opaque validators, period. Since we need to interoperate with existing systems, we must also support If-Modified-Since:. Conservative cache validation is necessary to preserve semantic transparency. That is, if we don't interoperate on this aspect of the protocol, then we lose semantic transparency.

I also believe that there is a good case to be made for supporting utility conditions, especially in view of the limited bandwidths and high latencies of many networks. This is sort of like the (bogus) story of the $250 cookie recipe that keeps popping up: a user ought to be able to avoid being astonished by the cost of something she or he requests.

Roy's Unless: header seems like a good way to express these utility conditions, since they are client-centric, not server-centric. I assume that when Roy mentions an "If:" header, he means that

    If: <predicate>

is identical in meaning to

    Unless: {not <predicate>}

and so that header can also express utility conditions.

Roy might argue at this point that I have it all wrong. This would reflect our basic disagreement about server-centric vs. client-centric control of cache transparency. Or Roy might say, "aha, now we are getting somewhere" because our prior disagreement was based on a confusion between controlling cache validation and controlling orthogonal utility preferences. We'll see. :-)

Roy also believes that Content-ID should be used as a cache validator. At first I was mystified about this, because Content-ID doesn't appear anywhere in any HTTP specification draft, but I did find draft-levinson-cid-02.txt, "Content-ID and Message-ID Uniform Resource Locators", and I suppose this is what he must mean. If not, I would appreciate a clarification.

Suppose Roy means this Content-ID proposal, which defines

    cidurl = "cid" ":" addr-spec

where "addr-spec" is defined in [RFC822], i.e.,

    addr-spec = local-part "@" domain

If so, I fail to see how this stuff has any value at all as a cache validator.

"Cache-control: max-age" (in a request), as well as the other cache-control directives that I have proposed (fresh-min, stale-max), are not exactly cache validation conditions. Rather, they effectively allow the user to specify whether or not a cache entry should be validated. So these are not really conditions on the request between the client and the cache; rather, they are instructions from the client to the cache saying when conditional requests should be done from the cache to some inbound cache or origin server.

How do the various conditions fit together? To me, it makes sense to treat them conjunctively. That is, if a request includes both a cache validator (using either If-Modified-Since: or If-Invalid:) and a utility condition (such as "Unless: {gt {Content-Length 10000}}"), then the server should apply both conditions with an implied "AND".

For example, this request:

    GET /home.html HTTP/1.1
    If-Invalid: xyzzy
    Unless: {gt {Content-Length 10000}}

means

    if (current-validator == xyzzy)
        return "304 Not Modified"
    else if (content-length > 10000)
        return "412 Unless true"
    else
        return "200 OK" + entire body

It makes sense to evaluate the cache-validation condition before any utility conditions, because why bother evaluating the utility conditions if the client already has the response in its cache?

There's a possible case to be made for treating If-Modified-Since: as either a cache validation condition or a utility condition based on context. For example, one perhaps could send

    GET /home.html HTTP/1.1
    If-Invalid: xyzzy
    If-Modified-Since: Thu, 15 Feb 1996 00:00:00 GMT

which would appear to mean "send me a new copy of the entire response if my cache is invalid AND the resource value changed on or after Feb. 15".

However, this is problematic because if the cache entry is invalid but the resource changed on Feb. 14, should the server return "304 Not Modified"?
This is not actually true, and it would confuse an intervening cache, which might then use this response for some other client.

I would express this request instead as

    GET /home.html HTTP/1.1
    If-Invalid: xyzzy
    Unless: {gt {Last-Modified "Thu, 15 Feb 1996 00:00:00 GMT"}}

because then the server can either return "304 Not Modified" if the cache's copy is still semantically equivalent, or "412 Unless true" if the value is modified but the utility condition doesn't hold. An intervening cache would not be allowed to infer anything about the validity of a previously cached response from a "412 Unless true" (in particular, a cache MUST NOT update a heuristically derived expiration time from this response!).

Then, the meaning I would ascribe to:

    GET /home.html HTTP/1.1
    If-Invalid: xyzzy
    If-Modified-Since: Thu, 15 Feb 1996 00:00:00 GMT
    Cache-control: reload

would be "send me the full response for home.html if either my cached copy is invalid (based on the opaque validator) OR if it has been modified on or after Feb. 15." This makes it possible for a client to stick both validators into a request and have the right thing happen with either an HTTP/1.1 origin server (that wants to see the opaque validator, or else it would not have sent one originally) or an HTTP/1.0 cache (that would not understand the opaque validator).

I hope that this somewhat lengthy message clarifies things, and perhaps even provides a way for Roy and me to agree on what these conditionals actually mean.

-Jeff
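
For readers who want to experiment with the evaluation order proposed in this message (cache-validation condition first, then utility condition, joined by an implied "AND"), here is a minimal Python sketch. It is only an illustration under stated assumptions: the names Resource and respond, and the reduction of the Unless: header to a single content-length limit, are hypothetical and do not come from any HTTP draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical server-side view of a resource (illustration only)."""
    validator: str        # the server's current opaque validator
    content_length: int   # size of the current entity body, in bytes


def respond(resource: Resource,
            if_invalid: Optional[str] = None,
            unless_length_gt: Optional[int] = None) -> str:
    """Return the status line the server would send.

    if_invalid stands in for the If-Invalid: header (the opaque validator
    held by the client's cache). unless_length_gt stands in for a utility
    condition of the form Unless: {gt {Content-Length N}}.
    """
    # 1. Cache-validation condition: if the client's validator still
    #    matches, the cached copy is fine and nothing else is evaluated.
    if if_invalid is not None and if_invalid == resource.validator:
        return "304 Not Modified"
    # 2. Utility condition: the client does not want the body if the
    #    Unless predicate is true.
    if unless_length_gt is not None and resource.content_length > unless_length_gt:
        return "412 Unless true"
    # 3. Otherwise send the full response.
    return "200 OK"


if __name__ == "__main__":
    current = Resource(validator="plugh", content_length=50000)
    print(respond(current, if_invalid="xyzzy", unless_length_gt=10000))
```

In this sketch the client's validator (xyzzy) no longer matches and the body exceeds the limit, so the utility condition is what decides the outcome; a matching validator would short-circuit to 304 regardless of size.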
Received on Wednesday, 21 February 1996 01:09:29 UTC