Re: NEW ISSUE: Methods and Caching from Jamie Lokier on 2008-11-17 (ietf-http-wg@w3.org from October to December 2008)

From: Jamie Lokier <jamie@shareable.org>
Date: Mon, 17 Nov 2008 02:32:34 +0000
To: Robert Siemer <Robert.Siemer-httpwg@backsla.sh>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20081117023234.GA13114@shareable.org>

Robert Siemer wrote:
> 
> I think it is pretty clear that the method is part of the cache key;
> otherwise a completely new, unrelated method, which always returns
> the same response entity (and is thus marked as cacheable), will interfere
> with the next unrelated GET. That blocks new (and old) methods from
> defining sensible caching strategies.

I agreed with Mark and thought the opposite.  I've always imagined the
cache key to be a function of the URL and the headers enumerated by
'Vary', nothing else.  But...

It's a good question!  The answer may be that the question is
ill-formed for HTTP.

There *isn't* a flat cache key for cached entities in the logic of
HTTP.  Think about the 'Vary' header, especially when cached responses
for the same URL carry different 'Vary' values.  Efficient but
non-conservative HTTP cache lookup is tree-structured: it
progressively narrows the candidate set according to the stored 'Vary'
of each cached response.  The resulting *set* is then transmitted in
If-None-Match, or considered for a time-based response.
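A minimal Python sketch of that progressive refinement, assuming each stored response records its own 'Vary' header names and the request-header values that produced it. The CachedResponse type and the refine and if_none_match helpers are hypothetical, not taken from any real cache; header values are compared as exact strings and 'Vary: *' is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CachedResponse:
    """One stored response for a URL (hypothetical structure)."""
    etag: str
    vary: tuple[str, ...]            # header names from this response's own Vary
    request_headers: dict[str, str]  # lower-cased request headers that produced it


def refine(stored: list[CachedResponse], request: dict[str, str]) -> list[CachedResponse]:
    """Narrow the stored responses for one URL, one Vary header name at a time."""
    req = {k.lower(): v for k, v in request.items()}
    candidates = list(stored)
    # Every header name that any stored response varies on.
    names = sorted({h.lower() for r in stored for h in r.vary})
    for name in names:
        # A response that does not vary on `name` survives this step whatever
        # the request says; one that does must have seen the same value.
        candidates = [
            r for r in candidates
            if name not in {h.lower() for h in r.vary}
            or r.request_headers.get(name) == req.get(name)
        ]
    return candidates


def if_none_match(candidates: list[CachedResponse]) -> str:
    """Offer the whole surviving set to the origin as validators."""
    return ", ".join(f'"{r.etag}"' for r in candidates)
```

For example, with a gzip variant, a br variant and a response without 'Vary' all stored, a request with 'Accept-Encoding: gzip' keeps the gzip variant and the Vary-less response, so the conditional request would carry both ETags.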

Of course you are free to implement a flat cache if you want to be
conservative and not cache some cacheable things.  Nearly all HTTP
caches are conservative in this way, because it's easier to implement.

We all recognise that 'Vary' complicates things, but what key does the
cache lookup _start_ with: the URL only, or URL + method?

If someone defines a new method such as PATCH, it clearly must
invalidate the previously cached GET result for the same resource
somehow.
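To make the two positions concrete, here is a minimal Python sketch of a cache keyed on (method, URL), as Robert suggests, combined with the requirement that an unsafe method evicts the cached GET. The MethodKeyedCache class, the method set and the FROB method are hypothetical; treating PATCH like PUT and DELETE is the assumption under discussion, not established behaviour.

```python
from __future__ import annotations

# Assumption: PATCH is treated like the methods that already invalidate.
INVALIDATING_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


class MethodKeyedCache:
    """Toy cache whose key is (method, URL) rather than URL alone."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], bytes] = {}

    def get(self, method: str, url: str) -> bytes | None:
        return self._store.get((method, url))

    def put(self, method: str, url: str, body: bytes) -> None:
        self._store[(method, url)] = body

    def on_forwarded_request(self, method: str, url: str) -> None:
        """Called when a request passes through the cache toward the origin."""
        if method in INVALIDATING_METHODS:
            # Evict everything cached for this URL, whichever method produced it.
            for key in [k for k in self._store if k[1] == url]:
                del self._store[key]
```

Because the key includes the method, a cached FROB response never collides with the GET entry, yet a PATCH to the same URL still removes both.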

If the GET result uses _validated_ caching, i.e. it is revalidated
with its ETag on every use, then other methods need not invalidate it
automatically at intermediaries, because subsequent GETs will consult
the server anyway.
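Whether automatic invalidation matters can be sketched as a check on the cached GET's Cache-Control: if the response may be reused without contacting the origin, another method has to evict it; if every reuse is revalidated, the next GET reaches the server anyway. A simplified Python sketch; parse_cache_control and invalidation_matters are hypothetical helpers, and Expires, s-maxage and heuristic freshness are ignored.

```python
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def invalidation_matters(cache_control: str) -> bool:
    """True if a cached GET with this Cache-Control may be reused without
    asking the origin (time-based caching), so other methods must evict it.
    False if every reuse is revalidated (validated caching)."""
    d = parse_cache_control(cache_control)
    if "no-cache" in d or "no-store" in d:
        return False
    max_age = d.get("max-age")
    if max_age is None:
        return False  # simplification: no explicit freshness lifetime
    try:
        return int(max_age) > 0
    except ValueError:
        return False
```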

On the other hand, if the GET result uses _time-based_ caching,
i.e. it is reused without validation, then it _is_ necessary that
other methods like PATCH can invalidate it automatically at
intermediaries.  (But must they then do the same for non-canonical
URLs, such as ones containing "///", or %-encoded equivalents?)
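One way to approach the non-canonical URL question is to invalidate under a normalized key. This minimal Python sketch applies the RFC 3986 equivalences (case-insensitive scheme and host, decoding of percent-encoded unreserved characters, upper-case hex elsewhere, empty path as "/"). Collapsing repeated slashes is not an RFC 3986 equivalence, so it is only an opt-in flag. invalidation_key is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 unreserved characters: decoding these never changes meaning.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _normalize_percent(s: str) -> str:
    """Decode %XX for unreserved characters; upper-case the hex of the rest."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, s)


def invalidation_key(url: str, collapse_slashes: bool = False) -> str:
    """Hypothetical key under which a PATCH/PUT/DELETE would evict entries."""
    parts = urlsplit(url)
    path = _normalize_percent(parts.path) or "/"
    if collapse_slashes:
        # Not an RFC 3986 equivalence: "/a//b" and "/a/b" may differ on the server.
        path = re.sub(r"/{2,}", "/", path)
    # Simplification: lower-casing netloc also lower-cases any userinfo.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path,
         _normalize_percent(parts.query), "")
    )
```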

Thus the "ideal" (non-conservative) choice of whether the initial
cache key includes the request method _depends_ on the headers of
previously cached responses.  In other words, it's roughly analogous
to 'Vary: method', and it might not be a bad idea to define a
Cache-Control directive with that meaning, usable in requests,
responses or both.
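A minimal Python sketch of how such a directive could steer the initial lookup, assuming it appears only in stored responses. The directive name vary-method and the candidates helper are hypothetical; the sketch only decides which stored responses for the URL a request starts from, not whether they are then served, revalidated or evicted.

```python
from __future__ import annotations

# Hypothetical Cache-Control extension meaning roughly "Vary: method".
VARY_METHOD = "vary-method"


def candidates(stored: list[tuple[str, str, bytes]], method: str) -> list[bytes]:
    """stored: (request method, Cache-Control, body) for every response cached
    under one URL. Return the bodies a request with `method` starts from."""
    result = []
    for stored_method, cache_control, body in stored:
        directives = {d.strip().lower() for d in cache_control.split(",")}
        if VARY_METHOD in directives and stored_method != method:
            continue  # keyed on URL + method: invisible to other methods
        result.append(body)  # keyed on URL only: reachable by any method
    return result
```

Here a PATCH reaches the time-based GET entry, so it can evict it, while a hypothetical FROB response carrying the directive stays out of every other method's way.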

-- Jamie

Received on Monday, 17 November 2008 02:33:12 UTC