Re: When to make objects uncacheable?

Tai Jin wrote:
>> The CGI URLS at my proxy make up 11% of all accesses.  In terms of
>> unique URLs, CGI URLs make up 15%.  Of these 56% are accessed more
>>
>>  ...
>>
>> Average CGI transfer                                        6404

>I'd be more interested in increasing the hit rate on cacheable URLs.
>I can't discern the hit rate from your data, but if you're getting a
>40% hit rate then, sure, you can try to squeeze the remaining 5% (48%
>of 11%) out of it.

Well, actually the current hit rate is zero.  Due to sizing
problems we had to turn off caching a few weeks ago.  That's
why I found out about this list, and also why I wrote some code to
analyze 600MB of log files.  I'm looking to implement a multi-level
cache, distributed over three sites, but today I am just collecting
data to model the sizing issues.
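
For concreteness, the tally looks something along these lines (a
simplified sketch, not my actual script; it assumes each log line ends
with URL, status, and bytes, and treats any URL containing "cgi-bin"
or "?" as non-cacheable -- both are illustrative assumptions):

    # Simplified sketch of a proxy-log sizing analysis.
    # Assumes each line ends with "... URL STATUS BYTES" (illustrative only).
    import sys
    from collections import defaultdict

    def analyze(log_paths):
        total_urls = total_bytes = 0
        url_hits = defaultdict(int)     # access count per unique URL
        url_bytes = {}                  # last-seen size of each unique URL
        noncacheable = set()

        for path in log_paths:
            with open(path) as f:
                for line in f:
                    fields = line.split()
                    if len(fields) < 4:
                        continue
                    url, size = fields[-3], fields[-1]
                    size = int(size) if size.isdigit() else 0
                    total_urls += 1
                    total_bytes += size
                    url_hits[url] += 1
                    url_bytes[url] = size
                    if "cgi-bin" in url or "?" in url:
                        noncacheable.add(url)

        once = [u for u, n in url_hits.items() if n == 1]
        print("Total URLs:            %d  (%d bytes)" % (total_urls, total_bytes))
        print("Unique URLs:           %d" % len(url_hits))
        print("URLs accessed once:    %d" % len(once))
        print("Non-cacheable (CGI/?): %d" % len(noncacheable))

    if __name__ == "__main__":
        analyze(sys.argv[1:])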

To answer another recent post, I am looking at implementing a central proxy 
cache of ~30 GB, using Netscape 2.0 (or 2.1, which Netscape claims will be 
performance-enhanced).  We will start testing in a few weeks.

>Here are my cache stats (for a small workgroup, data in megabytes) -
>
>Total cacheable URLs:       64194/90.51
>Total cacheable data:       427.2/91.98
>Unique cacheable URLs:      29799/46.42/42.01
>Unique cacheable data:      301.2/70.50/64.85
>URLs accessed only once:    23138/77.65/36.04/32.62
>Data accessed only once:    242.2/80.41/56.69/52.15
>Unique non-cacheable URLs:   1204/17.88/ 1.70
>Unique non-cacheable data:    6.5/17.37/ 1.39
>
>I have similar numbers in terms of cacheable (91%) and non-cacheable
>(9%) URLs.  The percentage of URLs accessed only once is relatively
>high: 78% of unique cacheable URLs, 36% of total cacheable URLs, or
>33% of total URLs.  And the percentage of data accessed only once is
>even higher: 80% of unique cacheable, 57% of total cacheable, or 52%
>of total data.
>
>HIT/freq:       10721/16.76     96.4/20.76
>MISS/freq:      33777/52.81
>EXPIRED/freq:     697/ 1.09
>REFRESH/freq:    3196/ 5.00
>IMS/freq:       15564/24.34
>ERR/freq:         239/ 0.37
>
>The percentage of hits is relatively small (17% of requests and 21% of
>data) and I'd like to increase this.  But it looks like the best I can
>hope for is about 40% of total data volume (+ 52% accessed once + 8%
>non-cacheable = 100%).  Has anyone been able to do better than 40%?
>I'm wondering if that's the practical limit.

That's what I'm seeing as well.  The best I could hope for would be
37% of the URLs, which would account for 48% of the data.  And this
assumes that none of the pages expired within the week (which I
can't easily tell from the logs).
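
For reference, the ceiling arithmetic is just the complement of the
once-accessed and non-cacheable fractions.  A trivial sketch, plugging
in the data-volume percentages quoted above (no new measurements):

    # Best-case hit rate with an infinite, never-expiring cache:
    # everything except the data fetched only once (a compulsory miss)
    # and the non-cacheable data.
    def hit_rate_ceiling(pct_accessed_once, pct_noncacheable):
        return 100.0 - pct_accessed_once - pct_noncacheable

    print(hit_rate_ceiling(52, 8))   # ~40% of data volume, as above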

What I did notice is that as the number of logs analyzed increases,
the potential hit rate gets better.  When I looked at one day, I
calculated that I should only be able to get a 28% hit rate (by
URLs).  With two days' data in the cache, I should be able to get a
33% hit rate.  One week: 37%.
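
Roughly, that calculation is a replay of the logs in order, counting
every repeat access to a cacheable URL as a potential hit (same
illustrative log-format assumptions as the earlier sketch):

    # Cumulative potential hit rate as more days of logs are included.
    def potential_hit_rate_over_time(daily_logs):
        seen = set()
        accesses = potential_hits = 0
        for day, path in enumerate(daily_logs, 1):
            with open(path) as f:
                for line in f:
                    fields = line.split()
                    if len(fields) < 4:
                        continue
                    url = fields[-3]
                    if "cgi-bin" in url or "?" in url:
                        continue                # skip non-cacheable URLs
                    accesses += 1
                    if url in seen:
                        potential_hits += 1     # would have been a cache hit
                    else:
                        seen.add(url)           # compulsory first-time miss
            print("After day %d: %.1f%% potential hit rate"
                  % (day, 100.0 * potential_hits / max(accesses, 1)))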

The relationship is not linear with respect to time, and I expect
to see diminishing returns, but I imagine that if the cache were
large enough to store all accesses for a month, the hit rate would
climb even higher.  I could try to analyze a month's worth of
access logs to calculate the potential hit rates, but my current
program isn't up to the task.  However, the overhead of managing
the cache increases with its size.  Or is it the square of the
size?

Chris

Received on Tuesday, 20 August 1996 09:05:00 UTC