- From: Shel Kaphan <sjk@amazon.com>
- Date: Sat, 6 Jan 1996 12:05:18 -0800
- To: http-caching@pa.dec.com
- Cc: sjk@digital.com
In the discussion about GET and POST, we touched on the issue of cache keys. Hopefully we're close enough to closure on the GET/POST/side-effects discussion to move on, so here's something else that seems fundamental (to me) and needs attention before we call ourselves "done".

To talk about the semantics of HTTP caching, we clearly have to establish guidelines for how objects are identified in caches. If we don't, caches with outwardly different behavior (not just different performance) become possible. So our job is to specify what is necessary to guarantee correct functioning, without overspecifying beyond the functional requirements.

On the other hand, the easiest way to think about these things is sometimes to be concrete. So I will start by saying how I think things should work; when everyone tells me what's wrong with it, we can change it, and the higher-level logic should become clearer as we discuss it.

What should be in a cache key?

- It seems unarguable that the request-URI must be part of the key. I give no justification.
- Some information to aid caching of content-negotiated responses. (I won't go into this further -- it's for the content-negotiation group.)
- For POSTs where the response contains Cache-control: no-side-effects, the request-entity-body must be part of the cache key for any future POST to be servable from the cache. A cache without this could be built, but it could not independently serve POSTs. (How does this interact with requests using other methods that carry entity-bodies on the same URI?)
- For POSTs where the response does not contain Cache-control: no-side-effects, the response cannot be used to answer any POST.
- For a cache entry to be used for a GET, it seems that the following must be true:
  - the request-URI must match the key, and the content-negotiation info must "match", and
  - either there is no request-entity-body in the key, or, if there is one, the Location-URI must exist for the entry and match the request-URI.
- The Location-URI is a useful part of the key, because it distinguishes different content-negotiated versions of the object.
- We can't just respond to GETs from cached responses to different request-URIs that claim to have a matching Location-URI, or else there's a spoofing possibility. To get a certain level of coherency within a cache, it may be necessary to use the Location-URI part of the key in a special way, but we may decide we don't need that level of coherency (see discussion below).
- What about the HTTP method? I do not believe that the HTTP method should be part of the cache key. In fact, I think it can't be.

-------------

What about this Location header business? How could it work?

If we generally allow the Location header in 2xx responses, then it seems it should mean more than just identifying one of several alternatives in content negotiation. It might also be returned for a request on a non-negotiated URI. I think that should mean "I performed the operation you requested, and here is the object I'm giving you".

However, even the use of Location in content-negotiation responses opens up a spoofing possibility. If the key always includes both the request-URI and the Location-URI, and never the Location-URI alone, then that spoofing possibility is eliminated. But why use the Location-URI as part of the key at all, then? For coherency.

The interesting and possibly messy issue is how to get this coherency. If the same Location-URI may be returned in response to different request-URIs, are those entries in the cache entirely separate, or do they interact in some way? Clearly they cannot be "the same" entry.
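The cache-key rules listed above can be sketched in Python. This is a minimal, hypothetical sketch, not a specification: it assumes content-negotiation info can be reduced to an opaque string compared for equality (the real matching rules are left to the content-negotiation group), and the names `CacheKey`, `Entry`, `Cache`, `store`, `lookup_get`, and `lookup_post` are invented for illustration. The method is used only to decide whether the body belongs in the key; it is never itself part of the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheKey:
    request_uri: str
    negotiation: str          # hypothetical: opaque content-negotiation info
    body: Optional[bytes]     # request-entity-body, present only for POSTs
    location: Optional[str]   # Location-URI from the response, if any


@dataclass
class Entry:
    key: CacheKey
    response: str
    no_side_effects: bool     # response carried Cache-control: no-side-effects


class Cache:
    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def store(self, method: str, request_uri: str, negotiation: str,
              body: Optional[bytes], location: Optional[str],
              response: str, no_side_effects: bool = False) -> None:
        # The method only decides whether the body goes into the key; the
        # method itself is not stored. Other body-carrying methods are an
        # open question in the text and are simply not keyed on body here.
        key_body = body if method == "POST" else None
        key = CacheKey(request_uri, negotiation, key_body, location)
        self.entries.append(Entry(key, response, no_side_effects))

    def lookup_get(self, request_uri: str, negotiation: str) -> Optional[str]:
        for e in self.entries:
            k = e.key
            # The request-URI and negotiation info must both match the key.
            if k.request_uri != request_uri or k.negotiation != negotiation:
                continue
            # No body in the key, or the Location-URI names this very URI.
            if k.body is None or k.location == request_uri:
                return e.response
        return None

    def lookup_post(self, request_uri: str, negotiation: str,
                    body: bytes) -> Optional[str]:
        for e in self.entries:
            k = e.key
            # Only no-side-effects responses may answer a POST, and the
            # request-entity-body must match exactly.
            if (e.no_side_effects and k.request_uri == request_uri
                    and k.negotiation == negotiation and k.body == body):
                return e.response
        return None
```

Because lookups always compare the request-URI, an entry stored under key (URI-B, URI-A) is never returned for a GET on URI-A, which is the spoofing case discussed above.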
If I do a GET on URI-A and receive Location URI-A in the response, there is an object in the cache with key (URI-A, URI-A). If someone then does a request on URI-B and receives a response with a spoof-attempt Location of URI-A, that entry's key is (URI-B, URI-A). We certainly would not want later GETs on URI-A to return the second object, so these cache entries must be logically separate.

On the other hand, if there are several valid ways I might receive URI-A in response to different request-URIs, or to different content-negotiation parameters, the different copies of the URI-A object might be evolving over time and might have changed, so a user might receive different versions out of the cache for different requests. You might argue that that is OK, since the server must have set the freshness value "too long" -- i.e., it changed the object before it became stale, so it is OK to return previous, but still fresh, objects. Certainly, with no revocation protocol, this may happen if a client switches caches; however, do we want to allow it within a single cache? Do we want to address this in the protocol document?

I don't know, but here's how one might approach it: suppose that whenever any object with Location URI-A is received from a server, a cache "flushes" every other object with the same Location-URI part of the key, i.e., prevents the cache from returning those other copies again without validation. This would prevent the multiple-version problem, but would reduce the cache's effectiveness.

One simple optimization is NOT to do this flushing for other copies of the object that have the same cache-validator, i.e., that claim to be the same version. This isn't a spoofing hole: even if someone returns a spoof object with a spoof validator, it can never be returned for the intended spoof target's URI.

--Shel
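The flushing scheme and its same-validator optimization described above can be sketched as follows. This is a minimal, hypothetical Python sketch: it keys entries on (request-URI, Location-URI), treats the cache-validator as an opaque string compared for exact equality, ignores content negotiation and freshness, and models "flushing" as a `must_revalidate` flag rather than deletion. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    validator: Optional[str]       # hypothetical opaque cache-validator
    body: str
    must_revalidate: bool = False  # set when "flushed": not servable without validation


class Cache:
    def __init__(self) -> None:
        # Key is (request-URI, Location-URI); the Location-URI is never a key alone.
        self.entries: dict[tuple[str, Optional[str]], Entry] = {}

    def receive(self, request_uri: str, location: Optional[str],
                validator: Optional[str], body: str) -> None:
        """Store a response from the origin server, flushing other copies."""
        if location is not None:
            for (req, loc), other in self.entries.items():
                if loc != location or req == request_uri:
                    continue  # a different object, or the entry being replaced
                if validator is not None and other.validator == validator:
                    continue  # optimization: same claimed version, keep serving it
                other.must_revalidate = True
        self.entries[(request_uri, location)] = Entry(validator, body)

    def serve(self, request_uri: str) -> Optional[str]:
        """Return a cached body for a GET, or None if the origin must be asked."""
        for (req, _loc), entry in self.entries.items():
            if req == request_uri and not entry.must_revalidate:
                return entry.body
        return None
```

A spoof response received on URI-B can, at worst, flush other copies and force a revalidation; since `serve` matches on the request-URI, the spoofed body itself is only ever returned for URI-B.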
Received on Saturday, 6 January 1996 20:23:13 UTC