- From: Hull, Chris <Chris.Hull@fmr.com>
- Date: Tue, 20 Aug 1996 10:56 -0400 (EDT)
- To: Tai Jin <tai@hplb.hpl.hp.com>
- Cc: "http-wg@cuckoo.hpl.hp.com" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>, "ircache@nlanr.net" <ircache@nlanr.net>
Tai Jin wrote:
>> The CGI URLs at my proxy make up 11% of all accesses.  In terms of
>> unique URLs, CGI URLs make up 15%.  Of these 56% are accessed more
>>
>> ...
>>
>> Average CGI transfer  6404

>I'd be more interested in increasing the hit rate on cacheable URLs.
>I can't discern the hit rate from your data, but if you're getting a
>40% hit rate then, sure, you can try to squeeze the remaining 5% (48%
>of 11%) out of it.

Well, actually the current hit rate is zero.  Due to sizing problems we
had to turn off caching a few weeks ago.  That's why I found out about
this list, and also why I wrote some code to analyze 600MB of log files.

I'm looking to implement a multi-level cache, distributed over three
sites, but today I am just collecting data to model the sizing issues.
To answer another recent post, I am looking at implementing a central
proxy cache of ~30 GB, using Netscape 2.0 (or 2.1, which Netscape claims
will be performance-enhanced).  We will start testing in a few weeks.

>Here are my cache stats (for a small workgroup, data in megabytes) -
>
>Total cacheable URLs:       64194/90.51
>Total cacheable data:       427.2/91.98
>Unique cacheable URLs:      29799/46.42/42.01
>Unique cacheable data:      301.2/70.50/64.85
>URLs accessed only once:    23138/77.65/36.04/32.62
>Data accessed only once:    242.2/80.41/56.69/52.15
>Unique non-cacheable URLs:   1204/17.88/ 1.70
>Unique non-cacheable data:    6.5/17.37/ 1.39
>
>I have similar numbers in terms of cacheable (91%) and non-cacheable
>(9%) URLs.  The percentage of URLs accessed only once is relatively
>high: 78% of unique cacheable URLs, 36% of total cacheable URLs, or
>33% of total URLs.  And the percentage of data accessed only once is
>even higher: 80% of unique cacheable, 57% of total cacheable, or 52%
>of total data.
>HIT/freq:     10721/16.76    96.4/20.76
>MISS/freq:    33777/52.81
>EXPIRED/freq:   697/ 1.09
>REFRESH/freq:  3196/ 5.00
>IMS/freq:     15564/24.34
>ERR/freq:       239/ 0.37
>
>The percentage of hits is relatively small (17% of requests and 21% of
>data) and I'd like to increase this.  But it looks like the best I can
>hope for is about 40% of total data volume (+ 52% accessed once + 8%
>non-cacheable = 100%).  Has anyone been able to do better than 40%?
>I'm wondering if that's the practical limit.

That's what I'm seeing as well.  The best I could hope for would be 37%
of the URLs, which would account for 48% of the data.  And this assumes
that none of the pages expired within the week (which I can't easily
tell from the logs).

What I did notice is that as the number of logs analyzed increased, the
potential hit rate got better.  When I looked at one day, I calculated
that I should only be able to get a 28% hit rate (URLs).  With two days
of data in the cache, I should be able to get a 33% hit rate.  One
week: 37%.  The relationship is not linear with respect to time, and I
expect to see diminishing returns, but I imagine that if the cache were
large enough to store all accesses for a month, the hit rate would
climb even higher.  I could try to analyze a month's worth of access
logs to calculate the potential hit rates, but my current program isn't
up to the task.

However, the overhead of managing the cache increases with its size.
Or is it the square of the size?

Chris
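[For readers wanting to reproduce this kind of analysis: the unique /
once-accessed breakdown quoted above can be tallied from a proxy access
log with a short script.  A minimal sketch, assuming a Squid-style log
line with the response size in field 5 and the URL in field 7 -- field
positions vary by proxy, so adjust to taste:]

```python
from collections import Counter

def cache_stats(lines):
    """Tally per-URL access counts and sizes from access-log lines.

    Assumes whitespace-separated fields with the response size in
    field 5 and the URL in field 7 (Squid-style); adjust as needed.
    """
    counts = Counter()
    size = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        nbytes, url = int(fields[4]), fields[6]
        counts[url] += 1
        size[url] = nbytes  # last observed size per URL

    total = sum(counts.values())
    unique = len(counts)
    once = sum(1 for c in counts.values() if c == 1)
    unique_data = sum(size.values())
    once_data = sum(size[u] for u, c in counts.items() if c == 1)
    return {
        "total_urls": total,
        "unique_urls": unique,
        "unique_pct_of_total": 100.0 * unique / total,
        "once_pct_of_unique": 100.0 * once / unique,
        "once_pct_of_total": 100.0 * once / total,
        "once_pct_of_unique_data": 100.0 * once_data / unique_data,
    }
```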
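[The diminishing-returns effect Chris describes can be estimated by
replaying the accesses through an idealized infinite cache that never
expires anything -- the same assumption made in the message: every
access to a URL after its first counts as a potential hit.  A
hypothetical sketch:]

```python
def potential_hit_rate(urls):
    """Fraction of requests that would hit an infinite, never-expiring
    cache: every access to a URL after its first is a hit."""
    seen = set()
    hits = 0
    for url in urls:
        if url in seen:
            hits += 1
        else:
            seen.add(url)
    return hits / len(urls) if urls else 0.0

def hit_rate_by_day(days):
    """days: list of per-day URL lists.  Returns the cumulative
    potential hit rate after 1, 2, ... N days, showing how each added
    day contributes fewer first-time URLs (diminishing returns)."""
    rates = []
    cumulative = []
    for day in days:
        cumulative.extend(day)
        rates.append(potential_hit_rate(cumulative))
    return rates
```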
Received on Tuesday, 20 August 1996 09:05:00 UTC