- From: Jamie Lokier <jamie@shareable.org>
- Date: Wed, 30 Dec 2009 22:52:48 +0000
- To: Mark Nottingham <mnot@mnot.net>
- Cc: Henrik Nordstrom <henrik@henriknordstrom.net>, HTTP Working Group <ietf-http-wg@w3.org>
Mark Nottingham wrote:
> (digging up an old thread; since this discussion, we created <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/117>).
>
> On 11/06/2009, at 8:00 AM, Henrik Nordstrom wrote:
> > [...] I meant to say that caches should invalidate the cached
> > response if the cache sees a later response which indicates the
> > resource have changed, even if the new response itself is not
> > getting cached for some reason. This should be independent on
> > which conditions the response is being seen.
> >
> > The request part is just ways to trigger the conditions where this may
> > be needed. The request is never what causes the invalidation, the
> > response to the request is, in specific conditions.
> >
> > The tricky part is defining "indicates the resource have changed" I
> > guess, [...]
> > Been reading the specs again, and in "RFC2616 13.1.1 Cache Correctness"
> > the condition of receiving a newer response where a previous response is
> > in the cache seems to be specified at only a MAY level, implying that
> > it's fine for the cache to keep the previous response as "current" for
> > as long as it's fresh. I could not find the corresponding section in
> > httpbis-p6. If it's really the intention that caches only MAY replace
> > the previous response with the newer response in the cache storage then
> > there is also no need for the invalidation requirements in the HEAD or
> > caching of negotiated resources sections to be any stronger than a MAY.
>
> In p6 2.2 we have:
>
> Caches MUST use the most recent response (as determined by the Date
> header) when more than one suitable response is stored.
>
> This is derived from 2616 13.1.1's:
>
> A correct cache MUST respond to a request with the most up-to-date
> response held by the cache that is appropriate to the request

Below is not a response to the specific question of "MAY level requirement"; it is a response to the general issue of invalidation when a "later" message indicates that a server resource has changed.

1. What should happen when the Date header is accidentally set far in the future due to misconfiguration or faulty clocks, and then corrected on the server? That could leave a response stored in a cache whose Date is "more recent" than any later response for a long time. Transient misconfigurations can lead to erroneous behaviour; that's not unexpected and is acceptable, but when misconfigurations are corrected it is highly desirable that caches recover in a sensible way, where possible.

2. None of the discussions regarding a "later response" invalidating an older one address what happens when responses are received in parallel through the same cache. For example, suppose the proxy starts receiving response #1; 10 seconds later it starts receiving response #2, which clearly indicates the resource has changed relative to #1; then, 20 seconds later, it finishes receiving response #1. (Substitute "minutes" for "seconds" if it makes the separation clearer; that's realistic for large responses.)

Should response #2 cause response #1 to be dropped from the cache when #2's headers indicate the resource has changed, even though #1 is still being received? What if the requests were sent in the order #2 then #1, but the responses were received as above? Then it would be ambiguous which response corresponds to the most recent state of the server resource, and therefore wrong for _either_ of them to force invalidation of the other stored one. It is better for the cache to store _both_ in that case, if ETags are available.
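A minimal Python sketch of how the two points above interact with the quoted p6 2.2 rule, assuming a single URI and hypothetical names (`ToyCache`, `store`, `select`) that belong to no real cache. Responses with distinct ETags are kept side by side instead of invalidating one another; responses without an ETag are compared by Date (an assumption for illustration); and selection follows "most recent response (as determined by the Date header)" literally, which is what lets one future-dated response keep winning after the server clock is corrected.

```python
from email.utils import parsedate_to_datetime


class ToyCache:
    """Hypothetical single-URI response store; not any real cache's API."""

    def __init__(self):
        # Maps ETag (None when the origin sent none) -> (parsed Date, body).
        self.variants = {}

    def store(self, etag, date_header, body):
        date = parsedate_to_datetime(date_header)
        if etag is None:
            # Assumption: with no validator to tell responses apart, keep
            # whichever one's Date claims to be newer, not the last to arrive.
            current = self.variants.get(None)
            if current is not None and current[0] >= date:
                return
        # With distinct ETags, a later arrival does not invalidate an earlier
        # one: transfer order may not reflect which response matches the
        # newest state of the resource, so both are kept.
        self.variants[etag] = (date, body)

    def select(self):
        # Literal reading of p6 2.2: pick the response with the latest Date.
        if not self.variants:
            return None
        etag, (_, body) = max(self.variants.items(), key=lambda kv: kv[1][0])
        return etag, body


cache = ToyCache()
# Point 1: a server clock set a year ahead stamps this response...
cache.store('"v1"', "Fri, 31 Dec 2010 00:00:00 GMT", "stale body")
# ...and after the clock is fixed, a genuinely newer response arrives.
cache.store('"v2"', "Wed, 30 Dec 2009 22:00:00 GMT", "fresh body")

print(len(cache.variants))  # 2: both kept, neither invalidated the other
print(cache.select())       # ('"v1"', 'stale body'): the future Date still wins
```

Keeping both variants addresses the ordering ambiguity in point 2, but selection by Date alone still has no way to recover from the misconfiguration in point 1 until the future-dated response expires or is evicted.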
In cases where ETags aren't used, Date is probably a better indicator of which response corresponds to the most recent state of the server resource. That is a good reason to specify cache logic in terms of the most recent Date header seen, rather than an imprecisely specified transfer order (imprecise because it does not discuss parallelism or which points in the sequence are being compared).

I think there is no satisfactory answer which offers useful guarantees (in the HTTP cache model), just as there is no way for invalidation-due-to-newer-response to offer useful guarantees to clients, because they may connect via different caching proxies.

Because there are no useful guarantees, there is probably no benefit to making invalidation-due-to-newer-response a MUST-level requirement. SHOULD or MAY might be useful in practice, but I can see no technical benefit in making it a MUST. Clients and servers must have other strategies to avoid problems due to old cached responses, such as not using long cache times, using ETags where it matters, or using content-dependent URIs in referring documents.

-- Jamie
Received on Wednesday, 30 December 2009 22:53:20 UTC