Re: Some data related to the frequency of cache-busting

On Tue, 3 Dec 1996, Drazen Kacar wrote:

> Jeffrey Mogul wrote:
> > 
> > If someone would like to propose a *feasible* filter on URLs
> > and/or response headers (i.e., something that I could implement
> > in a few dozen lines of C) that would exclude other CGI
> > output (i.e., besides URLs containing "?" or "cgi-bin", which
> > I already exclude), then I am happy to re-run my analysis.
> 
> You can check for everything that ends with ".cgi" and ".nph" as well
> as everything that starts with "nph-". Don't forget that CGIs can
> have trailing path info.
> 
> > Drazen Kacar pointed out that I should probably have
> > excluded .shtml URLs from this category, as well, because
> > they are essentially the same thing as CGI output.  I checked
> > and found that 354 of the references in the trace were to .shtml
> > URLs, and hence 10075, instead of 10429, of the references
> > should have been categorized as possibly cache-busted.  (This
> > is a net change of less than 4%.)
> 
> There is a short (3 char) extension as well. I don't know which one.
> I think it's ".shm", but I'm not sure. You'll get additional percent
> or two if you inlude all of these.

Its worse than that. The world has started using many different TLA
extensions for CGI type stuff. .dll is used on the Microsoft site with a
path segment of 'isapi'. I have also seen .ast, .asm, .asp, .nsf, .exe,
.phtml, and of course .pl, .cgi and even .tcl. TO add to the problems,
some people configure Apache to do server-side parsing on .html files. One
general rule of thumb is 'anything except a widely known file extension in
the standard mime.conf file plus some common others (.mpg, .mov, .fli,
.avi, .wav, .mp2, .mp3, .png, .htm, .pdf, .java, .class) is probably being
generated dynamically'. 

> I can send "Vary: set-cookie", but this is not enough. There'll be other
> cookies. On a really comercial site there'll be one cookie for each user.

Yep.

-- 
Benjamin Franz

Received on Tuesday, 3 December 1996 05:32:55 UTC