Re: Proposal for i23: no-store invalidation [#117] from Jamie Lokier on 2009-12-30 (ietf-http-wg@w3.org from October to December 2009)

From: Jamie Lokier <jamie@shareable.org>
Date: Wed, 30 Dec 2009 22:52:48 +0000
To: Mark Nottingham <mnot@mnot.net>
Cc: Henrik Nordstrom <henrik@henriknordstrom.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20091230225248.GA27494@shareable.org>

Mark Nottingham wrote:
> (digging up an old thread; since this discussion, we created <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/117>).
> 
> On 11/06/2009, at 8:00 AM, Henrik Nordstrom wrote:
> > [...] I meant to say that caches should invalidate the cached
> > response if the cache sees a later response which indicates the
> > resource have changed, even if the new response itself is not
> > getting cached for some reason. This should be independent on
> > which conditions the response is being seen.
> > 
> > The request part is just ways to trigger the conditions where this may
> > be needed. The request is never what causes the invalidation, the
> > response to the request is, in specific conditions.
> > 
> > The tricky part is defining "indicates the resource have changed" I
> > guess, 
[...]
> > Been reading the specs again, and in "RFC2616 13.1.1 Cache Correctness"
> > the condition of receiving a newer response where a previous response is
> > in the cache seems to be specified at only a MAY level, implying that
> > it's fine for the cache to keep the previous response as "current" for
> > as long as it's fresh. I could not find the corresponding section in
> > httpbis-p6. If it's really the intention that caches only MAY replace
> > the previous response with the newer response in the cache storage then
> > there is also no need for the invalidation requirements in the HEAD or
> > caching of negotiated resources sections to be any stronger than a MAY.
> 
> In p6 2.2 we have:
> 
>    Caches MUST use the most recent response (as determined by the Date
>    header) when more than one suitable response is stored.
> 
> This is derived from 2616 13.1.1's:
> 
>    A correct cache MUST respond to a request with the most up-to-date
>    response held by the cache that is appropriate to the request

Below is not a response to the specific question of "MAY level
requirement"; it is a response to the general issue of invalidation
when a "later" message indicates that a server resource has changed.

1. What should happen when the Date header is accidentally set far in the
   future due to misconfiguration or faulty clocks, and then corrected
   on the server?

   That could result in a cache storing a response whose Date stays
   "more recent" than that of any later response for a long time.

   Transient misconfigurations can lead to erroneous behaviour; that's
   not unexpected and is acceptable, but when misconfigurations are
   corrected it is highly desirable that caches will recover in a
   sensible way, where possible.

2. All the discussions regarding a "later response" invalidating an
   older one do not talk about what happens when responses are
   received in parallel through the same cache.

   For example, suppose the proxy starts receiving response #1, then
   10 seconds later starts receiving response #2, which clearly
   indicates the resource has changed relative to #1, then 20 seconds
   later finishes receiving response #1.  (Substitute "minutes" for
   "seconds" if that makes the separation clearer; that's realistic
   for large responses.)

   Should response #2 cause response #1 to be dropped from the cache
   when #2's headers indicate the resource has changed, even though #1
   is still being received?  What if the requests were sent in the
   order #2 then #1, but the responses were received as above?  It
   would be ambiguous which response corresponded with the most recent
   state of the server resource - and therefore wrong for _either_ of
   them to force invalidation of the other stored one.  It is better
   for the cache to store _both_ in that case, if ETags are available.

   In cases where ETags aren't used, Date is probably a better
   indicator of which response corresponds to the most recent state of
   the server resource, and therefore a good reason for cache logic to
   be specified in terms of most recent Date header seen, rather than
   imprecisely specified transfer order (imprecise because it does not
   discuss parallelism or which points in the sequence are being compared).

   I think there is no satisfactory answer which offers useful
   guarantees (in the HTTP cache model), just as there is no way for
   invalidation-due-to-newer-response to offer useful guarantees to
   clients because they may connect via different caching proxies.

   Because there are no useful guarantees, there is probably no
   benefit to making invalidation-due-to-newer-response a MUST level
   requirement.  SHOULD or MAY might be useful practically, but there
   is no technical benefit I can see for making it a MUST.  Clients
   and servers must have other strategies to avoid problems due to old
   cached responses, such as not using long cache times, using ETags
   where it is important, or using content-dependent URIs in referring
   documents.
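
The two points above can be illustrated with a minimal Python sketch.
It assumes a cache that picks the stored response with the latest Date
header (as in the p6 2.2 text quoted earlier) and that keys stored
responses by ETag. The response dicts, the dates, and the helpers
`most_recent` and `store` are hypothetical and not taken from p6 or
from any real cache implementation.

```python
from email.utils import parsedate_to_datetime


def most_recent(responses):
    """Return the stored response whose Date header is latest."""
    return max(responses, key=lambda r: parsedate_to_datetime(r["Date"]))


def store(cache, response):
    """Keep responses with distinct ETags side by side (point 2),
    instead of letting one invalidate the other."""
    cache[response["ETag"]] = response


# Point 1: hypothetical responses for one URI. The first was sent while
# the server clock was (by assumption) two years fast; the second was
# sent after the clock had been corrected.
skewed = {"Date": "Fri, 30 Dec 2011 22:00:00 GMT", "ETag": '"v1"', "body": "old"}
fixed = {"Date": "Wed, 30 Dec 2009 23:00:00 GMT", "ETag": '"v2"', "body": "new"}

cache = {}
store(cache, skewed)
store(cache, fixed)

# Selecting by Date alone keeps preferring the skewed response, even
# though the corrected one reflects the newer state of the resource.
chosen = most_recent(cache.values())
```

Under these assumptions `chosen` is the skewed response, and it stays
that way until some later response carries a Date beyond the skewed
one or the stored entry is removed by other means. Both entries remain
in `cache`, which is the "store both" behaviour suggested for
parallel responses when ETags are available.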

-- Jamie
Received on Wednesday, 30 December 2009 22:53:20 UTC