estimated-visits-per-day (Was: Re: Caching dynamically generated documents)

Daniel DuBois:
>[Luigi Rizzo:]
>>       "http://www.sine.com/sine?angle=0.01",
>>       "http://www.sine.com/sine?angle=0.02",
>>       "http://www.sine.com/sine?angle=0.03",
>>       ...
>>although the replies are small I certainly wouldn't want to store many
>>of them on my cache. And storing just a few of them may be completely
>>pointless.
>
>Well, in a case this bad, maybe a nice server would always send
>Cache-Control: no-cache.  Certainly if a proxy did save these, they wouldn't
>be re-accessed often, and would fall out of the cache's working set.

I have been thinking for some time about the idea of nice servers saying

 `this content is cacheable, but the chances of it being re-requested
 are very small, so I would not cache it if I were you'

to user agent and proxy caches, to help them make better decisions on
how to use their scarce disk space.

One way to implement this idea would be to introduce a
Cache-Control directive

  Cache-Control: estimated-visits-per-day=N

with N being an estimate by the server of how often the resource is
visited each day.  Even very rough estimates (say 0.01, 0.1, 1, 10,
100, or 1000) could help caches a lot.
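
To make the idea concrete, here is a minimal sketch of how a cache might
read such a directive and use it as an admission hint.  The function
names, the default threshold of 1 visit per day, and the choice to fall
back to normal caching when the hint is absent are all assumptions of
mine for illustration, not part of any specification.

```python
def parse_estimated_visits(cache_control):
    """Return the estimated-visits-per-day value as a float, or None.

    None is returned when the directive is absent or its value is not
    a usable non-negative number.
    """
    for directive in cache_control.split(","):
        name, sep, value = directive.strip().partition("=")
        if name.lower() == "estimated-visits-per-day" and sep:
            try:
                n = float(value.strip().strip('"'))
            except ValueError:
                return None
            return n if n >= 0 else None
    return None


def should_store(cache_control, threshold=1.0):
    """Decide whether to admit a response into the cache.

    `threshold` is a hypothetical tuning knob: responses whose estimate
    falls below it are served but not stored.
    """
    n = parse_estimated_visits(cache_control)
    if n is None:
        return True  # no hint given: use the cache's normal policy
    return n >= threshold
```

With these definitions, should_store("estimated-visits-per-day=0.01")
is false, while a response carrying only max-age=60 is stored as usual.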

Estimates could be made by content authors, or semi-automatically by
access log statistics tools (counting GETs, and, especially,
conditional GETs).
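
A log statistics tool along these lines could be quite small.  The
sketch below assumes the server writes Common Log Format, treats a 304
status as the sign of a conditional GET, and averages over the number
of distinct days seen in the log; all three are my assumptions, and the
function name is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: host ident user [dd/Mon/yyyy:time zone] "METHOD path ..." status bytes
CLF = re.compile(
    r'\S+ \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def estimate_visits_per_day(lines):
    """Return {path: {"gets": per-day GETs, "conditional": per-day 304s}}."""
    gets = Counter()
    conditional = Counter()
    days = set()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip lines that do not look like CLF entries
        day, method, path, status = m.groups()
        days.add(day)
        if method == "GET":
            gets[path] += 1
            if status == "304":
                # A 304 reply means some cache revalidated its copy.
                conditional[path] += 1
    n_days = max(len(days), 1)
    return {
        path: {"gets": count / n_days, "conditional": conditional[path] / n_days}
        for path, count in gets.items()
    }
```

The "gets" figure could be rounded to the nearest power of ten before
being sent as N, since only a rough estimate is needed.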

Such a scheme might allow 500 MB caches to reach hit rates that are now
only possible with multi-gigabyte caches.  On the other hand,
estimated-visits-per-day=N may not make much of a difference at all.

I wonder, has anybody ever tried to predict the performance of such
schemes based on proxy logs (logs of outgoing requests)?
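
One way to try this would be a trace-driven simulation: replay the
URLs from a proxy log through a simulated cache, once with and once
without the hint, and compare hit rates.  The sketch below assumes an
LRU cache, unit-sized objects (capacity counted in objects rather than
bytes), and hints supplied as a plain dictionary; these are simplifying
assumptions of mine, and a real study would need object sizes and
realistic estimates.

```python
from collections import OrderedDict


def simulate_hit_rate(requests, capacity, hints=None, threshold=1.0):
    """Replay `requests` (a list of URLs) through an LRU cache.

    `hints` maps URL -> estimated visits per day.  When given, a missed
    URL whose hint is below `threshold` is not stored.  URLs without a
    hint are stored as usual.
    """
    cache = OrderedDict()
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
            continue
        if hints is not None and hints.get(url, threshold) < threshold:
            continue  # server hinted that a re-request is unlikely
        cache[url] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used
    return hits / len(requests) if requests else 0.0


# A popular URL interleaved with one-off requests, cache of 2 objects.
trace = ["a", "x1", "x2", "a", "x3", "x4", "a"]
hints = {"a": 100, "x1": 0.01, "x2": 0.01, "x3": 0.01, "x4": 0.01}
baseline = simulate_hit_rate(trace, capacity=2)
hinted = simulate_hit_rate(trace, capacity=2, hints=hints)
```

In this toy trace the one-off requests push the popular URL out of the
plain LRU cache before it is requested again, whereas the hinted cache
keeps it.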

>Dan DuBois, Software Animal             http://www.spyglass.com/~ddubois/

Koen.

Received on Friday, 5 January 1996 23:32:30 UTC