Re: Cache key history

On Nov 28, 2008, at 2:41 PM, Mark Nottingham wrote:

> When the cache key discussion came up, it became clear that we  
> needed to do some digging into the history of HTTP caching, which  
> means looking at the mailing list of the original HTTPWG's caching  
> sub-group. Unfortunately, I couldn't locate any online archives  
> remaining, but Martin Hamilton kindly provided an mbox, which has  
> been reconstructed at:
>
> http://lists.w3.org/Archives/Public/http-caching-historical/
>
> In looking through that, it's clear that there was discussion of  
> POST caching, etc. early on;
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Jan/0025.html
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Jan/0026.html
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Jan/0028.html
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Jan/0030.html
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Jan/0075.html
>
> (I believe this is before the difference between Location and  
> Content-Location was specified, which is why Location is mentioned).
>
> But, no consensus was reached, as reflected by the state of the  
> "updated issues list" (under "not agreed");
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Feb/0114.html
>
> It did come up at a F2F, but was not "fully" discussed, and several  
> aspects were deferred;
>   http://lists.w3.org/Archives/Public/http-caching-historical/1996Feb/0039.html

I addressed the relevant parts of that meeting (which I was not able
to attend in person) in this post:

<http://lists.w3.org/Archives/Public/http-caching-historical/1996Feb/0095.html>

The question boils down to the three cache models under Extensibility:

 > Larry described possible three ways to view an HTTP cache:
 >
 > 	a) a cache stores values and performs operations on these
 > 	values based on the requests and responses it sees.  For
 > 	the purposes of the cache, one can describe each HTTP
 > 	method as a transformation on the values of one or more
 > 	resources.
 > 	
 > 	b) a cache stores responses, period.
 > 	
 > 	c) a cache stores the responses to specific requests.
 > 	The cache must be cognizant of the potential interactions
 > 	between various requests; for example, a PUT on a resource
 > 	should somehow invalidate the cached result of a previous
 > 	GET on the same resources, but a POST on that resource
 > 	might not invalidate the result of the GET.

The HTTP/1.1 proposal that Henrik and I developed was based on (c).
HTTP is supposed to be more extensible than a storage interface.
Our design decision was to make the messages self-descriptive
rather than assume a prescriptive data model, thereby allowing
efficient cache operation via message description on arbitrary
methods.  It was a known trade-off versus the more traditional
caching models of distributed file systems that could benefit
from write-back caching by limiting the set and scope of
resource-modifying operations to a shared data model.

Rough consensus in both the WG and implementations was on (c), but
that was not entirely reflected in the caching section that was
added to the pre-2068 spec during the final revisions: that section
left it out, whereas the rest of the HTTP spec is based on (c).
The visible difference between (a) and (c) is how cacheable
responses to non-GET requests are enabled, which is defined in
model (c) by the method semantics, response status code, and
the response field-values for Cache-Control and Content-Location.
It was not successfully defined by model (a).

In other words, an HTTP cache must consider the method as part
of the cache key if it allows caching of anything other than
GET/HEAD responses.  An HTTP cache cannot do write-back operations.
A response to a non-GET/HEAD request is cacheable if it says so
in cache-control *and* the cache understands how to construct
the cache key for that method (this is presumed to be defined by
the method semantics). Any response that contains a Content-Location
is cacheable as if it were a 200 response to GET if it can be
trusted to be from the same authority as that location value.
It follows, therefore, that a response to POST that includes
both a cacheable Cache-Control and a Content-Location matching
the POST request target is equivalent to saying that the enclosed
entity contains what would be in the response to a GET on that
same URI immediately after the POST completed.
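
As a rough illustration of the rules in the paragraph above, here is
a minimal Python sketch of how a cache might choose the keys under
which to store a response. It is not taken from any spec or
implementation. The names storage_keys, explicitly_cacheable,
same_authority, and key_builders are hypothetical, "trusted to be
from the same authority" is approximated by comparing scheme and
host, and the Cache-Control check is deliberately simplified (it
requires an explicit directive even for GET).

```python
from urllib.parse import urljoin, urlsplit


def explicitly_cacheable(cache_control):
    """Simplified assumption: the response 'says so' if Cache-Control
    carries public, max-age, or s-maxage and nothing forbids storing."""
    names = {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if names & {"no-store", "no-cache", "private"}:
        return False
    return bool(names & {"public", "max-age", "s-maxage"})


def same_authority(url_a, url_b):
    """Assumption: trust is approximated by matching scheme and host."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme.lower(), a.netloc.lower()) == (
        b.scheme.lower(),
        b.netloc.lower(),
    )


def storage_keys(method, url, cache_control,
                 content_location=None, key_builders=None):
    """Return the cache keys under which a response may be stored.

    key_builders is a hypothetical registry mapping a method name to a
    function that builds the cache key for that method; it stands in
    for 'the cache understands how to construct the cache key'.
    """
    if not explicitly_cacheable(cache_control):
        return []
    method = method.upper()
    keys = []
    if method == "GET":
        # The method is part of the key.
        keys.append(("GET", url))
    elif key_builders and method in key_builders:
        keys.append(key_builders[method](url))
    if content_location is not None:
        # A trusted Content-Location makes the entity usable as if it
        # were a 200 response to GET on that location.
        target = urljoin(url, content_location)
        if same_authority(url, target) and ("GET", target) not in keys:
            keys.append(("GET", target))
    return keys
```

With this sketch, a POST response carrying "max-age=60" and a
Content-Location equal to the request target yields the single key
("GET", target), while the same response without Content-Location
(and without a key builder registered for POST) yields no keys.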

The HTTP/1.1 proposal was not designed to behave like a storage
interface, so it's no surprise that it doesn't look like a CPU
cache or even a disk cache.  Jeff tried to address that issue in
his summary of the cache models.  I think that the subgroup
discussion showed that model (a) did not fit the needs of HTTP.
The subgroup's operating procedures at the time were that the
existing HTTP/1.1 design would not be changed unless there was
rough consensus for the change.

....Roy

Received on Tuesday, 2 December 2008 01:38:30 UTC