Re: When to make objects uncacheable ?

I just joined the ircache mailing list, and thought I might have some
input.  I have been analyzing proxy logs to size a replacement proxy
cache.  Background: the Fidelity Investments firewall proxy cache,
handling 500,000 accesses (5 GB) per day for 4000 active users.

The CGI URLs at my proxy make up 11% of all accesses.  In terms of
unique URLs, CGI URLs make up 15%.  Of the CGI accesses, 56% go to
URLs that are requested more than once within a week, and the average
number of times a CGI URL is accessed is 1.9.  Since only the first
access to each URL has to miss, this means that if I were to cache CGI
URLs, I might be able to get a 48% hit rate.

What is more interesting is that only 16% of the CGI URLs are accessed
multiple times and might benefit from caching.  But these URLs are
each accessed an average of 12 times within a week.  Scanning the raw
data, I see that the largest number of these is for stock quote
lookups, where data more than 10 minutes old might be useless.  And
here I am responding to a question related to search engines - :(

Here's more data if anyone's interested - the raw numbers from a
week's access logs:

Total accesses (including client cache hits and failures)  3,586,096
Total successful transfers                                 2,502,142
Number of unique URLs                                        949,613
Number of unique URLs repeated                               234,906
Total successful CGI transfers                               274,809
Number of unique CGI URLs                                    143,543
Number of unique CGI URLs repeated                            23,033

Total bytes of data transferred                       25,380,996,328
Total bytes of data transferred once only             10,225,694,349
Total bytes of unique data transferred                13,097,062,787
Total bytes from repeated URLs                        12,283,933,541
Total bytes from repeated CGI URLs                       988,160,134

Average transfer size (bytes)                                 10,143
Average CGI transfer (bytes)                                   6,404
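
For anyone who wants to check the arithmetic, here is a rough Python
sketch that re-derives the CGI percentages from the raw counts above.
The variable names are my own, and two things are assumptions: that
"all accesses" means successful transfers, and that the hit rate is a
best case (a cache that never expires or evicts anything, so only the
first access to each URL misses).

```python
# Raw weekly counts copied from the table above.
successful_transfers = 2_502_142
unique_urls = 949_613
cgi_transfers = 274_809
unique_cgi_urls = 143_543
repeated_cgi_urls = 23_033


def pct(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of CGI among accesses and among unique URLs.
cgi_share_of_accesses = pct(cgi_transfers, successful_transfers)
cgi_share_of_urls = pct(unique_cgi_urls, unique_urls)

# Share of CGI URLs that were requested more than once.
repeated_share_of_cgi_urls = pct(repeated_cgi_urls, unique_cgi_urls)

# Mean number of accesses per unique CGI URL.
mean_accesses_per_cgi_url = cgi_transfers / unique_cgi_urls

# Share of CGI accesses that went to a URL requested more than once.
once_only_cgi_urls = unique_cgi_urls - repeated_cgi_urls
accesses_to_repeated = pct(cgi_transfers - once_only_cgi_urls, cgi_transfers)

# Best-case hit rate (assumption: nothing expires, nothing is evicted),
# so every access after the first one to a given URL counts as a hit.
best_case_hit_rate = pct(cgi_transfers - unique_cgi_urls, cgi_transfers)

print(f"CGI share of accesses:        {cgi_share_of_accesses:.1f}%")
print(f"CGI share of unique URLs:     {cgi_share_of_urls:.1f}%")
print(f"CGI URLs requested > once:    {repeated_share_of_cgi_urls:.1f}%")
print(f"Mean accesses per CGI URL:    {mean_accesses_per_cgi_url:.2f}")
print(f"CGI accesses to repeat URLs:  {accesses_to_repeated:.1f}%")
print(f"Best-case CGI hit rate:       {best_case_hit_rate:.1f}%")
```

Rounded, these come out to the 11%, 15%, 16%, 1.9, 56% and 48% quoted
earlier.  A real cache would do worse than the best case, since
short-lived responses such as stock quotes would have to expire
between requests.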


Martin Hamilton wrote
>Andrew Daviel writes:
>| Supposing the above premises are reasonable, my question is whether
>| the output of more general search engines should be made cacheable?
>| It seems to me that certain queries are fairly popular, and that
>| some benefit might be had from cacheing the responses. On the other
>| hand, the number of possible URLs expands exponentially with the
>| length of the query string, so that cacheing every response would be
>| unreasonable, filling up caches with never-to-be-repeated requests.
>
>From a quick scan through the logs, I don't see our proxies repeatedly
>servicing the same queries of the same search engines.  Does anyone
>else ?  Our users seem to be coming up with quite sophisticated
>queries, presumably because of the difficulty of actually finding
>anything on the Web...

Chris Hull
(The opinions expressed are my own)

Received on Monday, 19 August 1996 09:20:48 UTC