- From: Koen Holtman <koen@win.tue.nl>
- Date: Sun, 18 Aug 1996 00:01:32 +0200 (MET DST)
- To: Jeffrey Mogul <mogul@pa.dec.com>
- Cc: koen@win.tue.nl, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Jeffrey Mogul:
>
>If I understand your argument, it is that in order to bound the
>size of the error in the hit count to lie within a reasonable
>range, the max-uses setting would have to be so small that it
>would effectively disable caching.

Yes. For uncooperative caches.

[Koen Holtman:]
> I'd like to see *actual statistics* disprove my argument

>So I got a day's worth of log entries from our proxy. Here are
>some statistics:
>
>  589705 total log entries
>  529756 after removing non-HTTP URLs with "?", "cgi", or "htbin"
>  245481 unique "cachable" URLs
>  189723 "cachable" URLs referenced only once during the trace
>   55758 "cachable" URLs referenced more than once

It's very tricky to extrapolate from a day's worth of log entries: to
do these statistics right, you would have to count over the lifetime
of a cache entry, which is presumably a lot longer than 1 day for your
cache. I find it difficult to guess in what direction your end
results would change if you calculate over log entry lifetimes.

>That's an effective cache hit rate of about 23%, not counting
>things that can't be cached, and ignoring any misses that were
>caused by modifications to the resources.

Eek! I would calculate a

  ( 529756 - 245481 ) / 529756 * 100% = 54%

hit rate for your figures, also ignoring misses due to modification
(including the semi-modification known as cache busting!). What is
your definition of hit rate?

>Supposing that, for each of the "cachable" URLs referenced more than
>once, the origin server sent max-uses=3.
[...]
>220936 uses would not have to be forwarded to the origin server,
>which comes out to about 37% of all the references logged.

So if cache busting is replaced by max-uses=3, you expect a 37% cache
hit rate (i.e.
RTT savings in 37% of all cases) in an uncooperative cache, where it
earlier had a 0% hit rate for the offending server.

There are several factors that pollute this figure: 1 day sample, not
factoring out dynamic and authenticated content which is uncachable,
not counting the 8th, 12th, ... hits, but let's forget about those.

>Now, it's quite true that not every server insists on demographics
>information, and so the actual number of references saved would
>presumably be lower. But this should give some idea of the
>magnitude of the possible savings, and I don't think it's insignificant.

Your statistics don't answer the main question I have: does max-uses=3
(or max-uses=2 for that matter) give a good enough upper bound to make
sites switch from cache busting to max-uses=3?

Using figures from your post:

  245481 unique "cachable" URLs
  228240   of these were referenced 1, 2, or 3 times
   17215   were referenced more than 3 times

  529756 references on "cachable" URLs
  276401   references to URLs referenced 1, 2, or 3 times.
  253355   of these references were to URLs referenced more than 3 times

We can calculate how good the upper bound is. If we assume
optimistically that all references to `more than 3 times' URLs are
reported under max-uses=3, we have

  228240 + 253355 = 481595 known uses.

For the 1,2,3 URLs, the server handed out

  3 * 228240 = 684720 uses

which never led to any reports.

  481595 + 684720 = 1166315.

This means that the origin server knows

  481595 <= actual uses <= 1166315 .

But this upper bound is a factor 2.4 higher, which makes it hardly
useful. So max-uses=3 *still* gives you a useless upper bound, and
you can't expect that people will switch from using cache busting to
using max-uses=3.

(Note that the real actual uses, 529756 uses, are only a factor 1.1
higher than the 481595 reported, but this good figure is caused for a
large part by the optimistic assumption that all uses of `more than 3
times' URLs are reported.)
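The arithmetic above can be re-derived with a short script. This is only a sketch: the function and parameter names (`use_bounds`, `hit_rate`, `urls_le_max`, `refs_gt_max`) are made up for illustration, and the input numbers are the ones quoted in this message, under the same optimistic assumption that every use of a `more than 3 times' URL is reported.

```python
def use_bounds(urls_le_max, refs_gt_max, max_uses):
    """Return (known_uses, upper_bound) for the origin server.

    urls_le_max: URLs referenced at most max_uses times (one known use each)
    refs_gt_max: references to URLs referenced more than max_uses times,
                 all assumed to be reported (the optimistic assumption)
    """
    known = urls_le_max + refs_gt_max
    # Uses handed out for the rarely referenced URLs that were never reported.
    unreported = max_uses * urls_le_max
    return known, known + unreported


def hit_rate(total_refs, unique_urls):
    """Hit rate as computed in this message: every repeat reference is a hit."""
    return (total_refs - unique_urls) / total_refs


known, upper = use_bounds(urls_le_max=228240, refs_gt_max=253355, max_uses=3)
print(known, upper)                   # 481595 1166315
print(round(upper / known, 1))        # 2.4  (upper bound vs. known uses)
print(round(529756 / known, 1))       # 1.1  (actual uses vs. known uses)
print(round(100 * hit_rate(529756, 245481)))  # 54
```

Changing `max_uses` to 2 needs different per-URL counts as input, so the script only covers the max-uses=3 case worked through above.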
Now, to do all of the above statistics _right_, you would have to have
figures on how many times the contents of a cache slot are served
during the lifetime of the cache slot. Unfortunately, I don't know of
any data set with these figures.

But I feel safe in saying that we can forget about the uncooperative
cache option. It won't work, and should be removed from the draft to
make it shorter.

>-Jeff

Koen.
Received on Saturday, 17 August 1996 15:04:48 UTC