Re: NEW ISSUE: Methods and Caching

On Mon, Nov 17, 2008 at 02:32:34AM +0000, Jamie Lokier wrote:
> 
> Robert Siemer wrote:
> > 
> > I think it is pretty clear that the method is part of the cache key;
> > otherwise a completely unrelated new method, which always returns
> > the same response entity (and is thus marked as cacheable), will
> > interfere with the next unrelated GET. That blocks new (and old)
> > methods from defining sensible caching strategies.
> 
> I agreed with Mark and thought the opposite.  I've always imagined the
> cache key is a function of the URL and headers enumerated by 'Vary'
> only.  But...
> 
> It's a good question!  The answer may be that the question is
> ill-formed for HTTP.
> 
> There *isn't* a flat cache key for cached entities, in the logic of
> HTTP.  Think about the 'Vary' header, especially with cached responses
> for the same URL having different 'Vary'.  Efficient but
> non-conservative HTTP cache lookup is tree structured, as it
> progressively refines the set depending on stored 'Vary' of different
> cached responses.  The *set* is then transmitted in If-None-Match, or
> considered for a time-based response.
> 
> Of course you are free to implement a flat cache if you want to be
> conservative and not cache some cacheable things.  Nearly all HTTP
> caches are conservative in this way, because it's easier to implement.
> 
> We all recognise that 'Vary' complicates things, but to _start_ the
> cache lookup, do we start with key = URL only, or key = URL + method?

That's an implementation detail. Flat or non-flat, part of the
"initial" key or added later, the result is the same. What matters is
that the method is part of the key. Applications that want to join the
responses of method1+url1 and method2+url2 should redirect from the
former to the latter (url1 and url2 can be the same).
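
To make this concrete, here is a minimal Python sketch of a conservative
flat cache whose key is (method, URL). The class name FlatCache and the
method FROB are hypothetical, and responses carrying Vary are simply not
stored, which is the kind of conservatism Jamie mentions for flat caches.

```python
from __future__ import annotations


class FlatCache:
    """Hypothetical conservative cache keyed by (method, URL).

    Responses carrying Vary are not stored at all: a flat key cannot
    express per-response Vary, so this sketch just declines to cache them.
    """

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], bytes] = {}

    def store(self, method: str, url: str, body: bytes,
              vary: str | None = None) -> bool:
        if vary:
            return False  # too hard for a flat key: don't cache
        # Method names are case-sensitive in HTTP, so no case folding here.
        self._store[(method, url)] = body
        return True

    def lookup(self, method: str, url: str) -> bytes | None:
        return self._store.get((method, url))


cache = FlatCache()
cache.store("FROB", "http://example.com/r", b"frob result")
# The cacheable FROB response does not leak into a later GET.
assert cache.lookup("GET", "http://example.com/r") is None
```

With the method in the key, the FROB entry can never be served for a GET
of the same URL; joining the two would require the server to redirect.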

> If someone defines a new method like say PATCH, clearly it must
> invalidate the previously cached GET result for the same resource
> somehow.
> 
> If the GET result uses _validated_ caching, i.e. ETags, then it's not
> necessary for other methods to invalidate it automatically at
> intermediaries, because the server will be consulted anyway by
> following GETs.

The original question had nothing to do with invalidation. But in the
scenario you mention, all cached responses are effectively _invalid_
(they can be used only once, at response time), so there is no need to
invalidate them later, automatically or manually. There is no real
distinction between invalid and must-be-validated cache entries.

> 
> On the other hand, if the GET result uses _time-based_ caching,
> i.e. without validation required, then it _is_ necessary that other
> methods like PATCH can invalidate it automatically at intermediaries.
> (But then must it do the same for non-canonical URLs, such as those
> with "///" in them, or %-encoded equivalents?)

Invalidation is a must, but it is not a cure-all: an intermediary might
not see the invalidating request at all. Whether one URL equals another
is defined in the RFCs, and "///" does not equal "/".
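
A rough Python sketch of both points, assuming a cache keyed by
(method, url): URLs are compared along the lines of RFC 2616 section 3.2.3
(case-insensitive scheme and host, missing port equals the default port,
empty path equals "/", %XX escapes of unreserved characters equal the
character), and an invalidating request drops every entry for an
equivalent URL. Decoding only RFC 2396 unreserved characters is a
conservative simplification, and adding PATCH to the PUT/DELETE/POST set
of RFC 2616 section 13.10 is an assumption taken from this thread.

```python
import re
from urllib.parse import urlsplit

# RFC 2396 "unreserved" characters: a %XX escape of one of these is
# treated as equal to the character itself (a conservative subset of
# what RFC 2616 section 3.2.3 allows).
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "abcdefghijklmnopqrstuvwxyz"
                 "0123456789-_.!~*'()")
DEFAULT_PORTS = {"http": 80, "https": 443}
PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

# Assumption: PATCH (the hypothetical method from the thread) is added to
# the PUT/DELETE/POST set named in RFC 2616 section 13.10.
INVALIDATING_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


def decode_unreserved(s: str) -> str:
    """Decode %XX escapes only where they encode an unreserved character."""
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)
    return PCT_ESCAPE.sub(repl, s)


def comparison_key(url: str) -> tuple:
    """Normalize a URL along the lines of RFC 2616 section 3.2.3."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()                   # case-insensitive
    host = (parts.hostname or "").lower()           # case-insensitive
    port = parts.port or DEFAULT_PORTS.get(scheme)  # missing port = default
    path = decode_unreserved(parts.path) or "/"     # empty path = "/"
    # "///" is left untouched: it is NOT equivalent to "/".
    return (scheme, host, port, path, decode_unreserved(parts.query))


def urls_equivalent(a: str, b: str) -> bool:
    return comparison_key(a) == comparison_key(b)


def invalidate(cache: dict, method: str, url: str) -> None:
    """Drop every (method, url) entry whose URL is equivalent to the target
    of an invalidating request. This only helps if this intermediary
    actually sees that request."""
    if method not in INVALIDATING_METHODS:
        return
    target = comparison_key(url)
    for key in [k for k in cache if comparison_key(k[1]) == target]:
        del cache[key]
```

For example, urls_equivalent("HTTP://Example.COM:80", "http://example.com/")
is true, while "http://example.com///" and "http://example.com/" stay
distinct.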


> Thus the "ideal" (non-conservative) choice of whether the initial
> cache key depends on the request method _depends_ on the headers of
> previously cached responses.  In other words, it's roughly analogous
> to 'Vary: method', and it might not be a bad idea to define a
> Cache-Control which means that - in requests, responses or both.

Initial or not, "Vary: method" is always "on" in HTTP.
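
One way to picture this, as a minimal Python sketch: start the lookup
with the URL alone, as Jamie proposes, but store the request method with
every response as if it were an always-present Vary dimension. The
pseudo-header name ":method" and the class UrlKeyedCache are hypothetical.
The lookup returns the set of matching ETags, which could then be sent in
If-None-Match; it answers exactly as a (method, URL) key would.

```python
from __future__ import annotations

from dataclasses import dataclass

METHOD = ":method"  # hypothetical pseudo-header carrying the request method


@dataclass
class Variant:
    etag: str
    selecting: dict[str, str | None]  # lower-case header name -> request value


class UrlKeyedCache:
    """Sketch: the initial key is the URL alone; the method is treated as a
    Vary dimension that every stored response carries implicitly."""

    def __init__(self) -> None:
        self._by_url: dict[str, list[Variant]] = {}

    def store(self, method: str, url: str, request_headers: dict[str, str],
              vary: list[str], etag: str) -> None:
        # Record the request values of the headers named in this response's Vary.
        selecting = {h.lower(): request_headers.get(h.lower()) for h in vary}
        selecting[METHOD] = method  # "Vary: method" is always on
        self._by_url.setdefault(url, []).append(Variant(etag, selecting))

    def candidates(self, method: str, url: str,
                   request_headers: dict[str, str]) -> list[str]:
        """ETags of all stored variants this request could be served by.
        request_headers is assumed to use lower-case names."""
        request = {**request_headers, METHOD: method}
        return [v.etag for v in self._by_url.get(url, [])
                if all(request.get(h) == val for h, val in v.selecting.items())]


def if_none_match(etags: list[str]) -> str:
    return ", ".join(etags)
```

Responses stored for the same URL may carry different Vary lists; each is
matched against its own, and the method check never lets a FROB entry
answer a GET.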



Robert

Received on Monday, 17 November 2008 05:26:41 UTC