Re: Some data related to the frequency of cache-busting

On Wed, 27 Nov 1996, Jeffrey Mogul wrote:

> Anyway, the results are
> 	Responses with no last-modified time: 10401
> 	Responses pre-expired: 28
> for a total of 10429 cache-busted refs, with these byte-counts:
> 	3932702 req-bytes, 81597623 resp-bytes, 85530325 bytes
> 
> As a fraction of all 61108 references, this is
> 	17% of the references
> 	21% of the req-bytes, 25% of the resp-bytes, 25% of the total bytes
> 
> As a fraction of the 33589 non-query possibly-cachable references:
> 	31% of the references
> 	38% of the req-bytes, 30% of the resp-bytes, 30% of the total bytes
> 
> Summary: while it is certainly debatable whether my categorization
> of no-Last-Modified responses as "cache-busted" is appropriate or not,
> if one accepts this categorization, then the frequency of cache-busting
> seems to be pretty high.  One could also debate how much this would
> be reduced by our hit-metering proposal, but there does seem to be
> some potential here.

You pegged my primary objection to your methodology. It is entirely
unsupportable to label having no last-modified as being deliberate
cache-busting (the only kind of cache busting this proposal could affect).
Pretty much all CGI output omits Last-Modified by default:

from www.netimages.com:

GET /ni-cgi-bin/fetch HTTP/1.0

HTTP/1.0 200 OK
Date: Thu, 28 Nov 1996 15:03:19 GMT
Server: Apache/1.1.1
Content-type: text/html
Set-Cookie: Apache=19830833849193398884; path=/

I made absolutely no effort to intentionally cache bust the response (the
data served is static - but from a huge database of Usenet articles). In
fact - I wrote it well before I knew *how* to deliberately cache bust. 

From a customer of mine who is using an off-the-shelf database frontend
(and who doesn't have the slightest idea that cache busting is even
*possible* - never mind doing it deliberately):

from www.trcnet.com:

GET / HTTP/1.0

HTTP/1.0 200 OK
Server: Domino/1.0
Date: Thursday, 28-Nov-96 15:20:01 GMT
Content-Type: text/html
Content-Length: 2946
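
A minimal sketch of why a header-only classification cannot show intent.
This is illustrative Python, not the analysis actually run on the trace.
The function name classify() and the rule "Expires no later than Date
means pre-expired" are assumptions made for the example.

```python
from email.utils import parsedate_to_datetime


def classify(headers):
    """Bucket a response by its headers alone (hypothetical helper).

    Assumption: "pre-expired" means an Expires value no later than Date.
    """
    h = {name.lower(): value for name, value in headers.items()}
    if "expires" in h and "date" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
            date = parsedate_to_datetime(h["date"])
            if expires <= date:
                return "pre-expired"
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: fall through to the checks below.
            pass
    if "last-modified" not in h:
        return "no-last-modified"
    return "has-last-modified"


# Headers from the www.netimages.com response quoted above.
netimages = {
    "Date": "Thu, 28 Nov 1996 15:03:19 GMT",
    "Server": "Apache/1.1.1",
    "Content-type": "text/html",
    "Set-Cookie": "Apache=19830833849193398884; path=/",
}

print(classify(netimages))  # no-last-modified
```

Both responses quoted above land in the "no-last-modified" bucket, even
though neither was written with any intent to defeat caches. The headers
record what was sent, not why.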

I would say the only *confirmable* deliberate cache busting is the
28 pre-expired responses. And they are an insignificant (almost
unmeasurable) percentage of the responses.
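
For scale, the arithmetic can be sketched from the counts in the quoted
message. The variable names and the pct() helper are made up for
illustration; only the four counts come from the quoted data.

```python
# Counts taken from the quoted message.
total_refs = 61108        # all references
cachable_refs = 33589     # non-query possibly-cachable references
no_last_modified = 10401  # responses with no last-modified time
pre_expired = 28          # responses pre-expired


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


busted = no_last_modified + pre_expired  # 10429, as in the quoted total

# Everything labelled "cache-busted" in the quoted message.
print(f"all labelled:     {pct(busted, total_refs):.1f}% of all refs")
print(f"all labelled:     {pct(busted, cachable_refs):.1f}% of cachable refs")

# Only the pre-expired responses.
print(f"pre-expired only: {pct(pre_expired, total_refs):.3f}% of all refs")
print(f"pre-expired only: {pct(pre_expired, cachable_refs):.3f}% of cachable refs")
```

The first two figures match the 17% and 31% in the quoted message. The
pre-expired responses alone come to roughly 0.05% of all references and
under 0.1% of the possibly-cachable ones.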

As you noted - much more study is needed. This one is utterly
inconclusive. You conclude from your numbers that significant savings can
be found. I conclude from the same numbers that the extra overhead of
hit metering is in fact *higher* than the losses to deliberate cache
busting. Querying for hit-meter results would generate more network
traffic than would be saved on such a tiny number of cache-busted
responses.

-- 
Benjamin Franz

Received on Thursday, 28 November 1996 07:27:46 UTC