- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Wed, 14 Aug 96 16:50:25 MDT
- To: Koen Holtman <koen@win.tue.nl>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
However, you have handed out 80*10=800 uses, which gives you 800 hits
as the upper bound. So all you know is:
80 <= actual hits <= 800
This is not what I call useful information. Something like an
interesting upper bound would be
80 <= actual hits <= 100
but I see no way in which max-uses can provide such a bound.
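The bound under discussion can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, assuming a cache that reports nothing back; the function name hit_count_bounds and its parameters are hypothetical, not part of any proposal.

```python
def hit_count_bounds(grants, max_uses):
    """Bounds on actual hits when the cache reports nothing back.

    grants   -- number of responses handed out with a max-uses limit
    max_uses -- the max-uses value sent with each response
    (Both names are illustrative assumptions.)
    """
    lower = grants             # every response handed out is used at least once
    upper = grants * max_uses  # every response may be used up to max_uses times
    return lower, upper


lower, upper = hit_count_bounds(grants=80, max_uses=10)
print(f"{lower} <= actual hits <= {upper}")
```

The width of the interval grows linearly with max-uses, which is the core of the objection.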
I suspect that max-uses counts higher than 3 will be disastrously
ineffective at yielding a useful upper bound if uncooperative caches
are common.
A proxy not being cooperative and only supporting max-uses seems about
as bad as a proxy not supporting hit counts at all.
If I understand your argument, it is that in order to bound the
size of the error in the hit count to lie within a reasonable
range, the max-uses setting would have to be so small that it
would effectively disable caching.
I'd like to see *actual statistics* disprove my argument.
So I got a day's worth of log entries from our proxy. Here are
some statistics:
589705 total log entries
529756 after removing non-HTTP URLs and URLs with "?", "cgi", or "htbin"
245481 unique "cachable" URLs
189723 "cachable" URLs referenced only once during the trace
55758 "cachable" URLs referenced more than once
That's an effective cache hit rate of about 23%, not counting
things that can't be cached, and ignoring any misses that were
caused by modifications to the resources.
Suppose that, for each of the "cachable" URLs referenced more than
once, the origin server sent max-uses=3.
Of the
55758 "cachable" URLs referenced more than once
28951 (52%) were referenced exactly twice
9592 (17%) were referenced exactly 3 times
Or in other words, of the
340033 references to "cachable" URLs referenced more than once
28951*2 + 9592*3 = 86678 of these references were to URLs
referenced 2 or 3 times
so
340033 - 86678 = 253355 of these references were to URLs
referenced more than 3 times
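For anyone who wants to check the breakdown, here is a small Python sketch that recomputes it from the counts quoted above. The variable names are mine; only the four input numbers come from the trace.

```python
# Counts taken from the trace summary above.
multi_urls = 55758      # "cachable" URLs referenced more than once
multi_refs = 340033     # references to those URLs
exactly_two = 28951     # URLs referenced exactly twice
exactly_three = 9592    # URLs referenced exactly 3 times

# References that went to URLs seen only 2 or 3 times.
refs_two_or_three = exactly_two * 2 + exactly_three * 3

# The remaining references went to URLs seen more than 3 times.
refs_over_three = multi_refs - refs_two_or_three

share_two = exactly_two / multi_urls
share_three = exactly_three / multi_urls

print(refs_two_or_three, refs_over_three)
print(f"{share_two:.0%} twice, {share_three:.0%} three times")
```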
Now, assume that the servers had all sent max-uses=3 for these
URLs. Then the first use of each of these URLs (55758 uses)
plus every 4th use of each of the URLs referenced more than
3 times (roughly 253355/4 = 63339 uses) would have to be forwarded
to the origin server. This means that 340033 - (63339 + 55758) =
220936 uses would not have to be forwarded to the origin server,
which comes out to about 37% of all the references logged.
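The estimate can be reproduced with the following Python sketch. It uses the same rough model as the text (one forwarded request per URL for the first use, plus roughly every 4th use of the heavily referenced URLs); the variable names are illustrative assumptions, and the rounding mirrors the "roughly" above.

```python
# Inputs taken from the figures above.
total_log_entries = 589705
multi_urls = 55758        # "cachable" URLs referenced more than once
multi_refs = 340033       # references to those URLs
refs_over_three = 253355  # references to URLs referenced more than 3 times
max_uses = 3

# First use of each URL always goes to the origin server.
first_uses = multi_urls

# Rough model: every (max_uses + 1)th use of a heavily referenced URL
# has to be forwarded again.
repeat_forwards = round(refs_over_three / (max_uses + 1))

forwarded = first_uses + repeat_forwards
saved = multi_refs - forwarded
saved_share = saved / total_log_entries

print(forwarded, saved, f"{saved_share:.0%}")
```

Changing max_uses in this sketch gives a quick feel for how the savings move with the limit, under the same rough model.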
Now, it's quite true that not every server insists on demographics
information, and so the actual number of references saved would
presumably be lower. But this should give some idea of the
magnitude of the possible savings, and I don't think it's insignificant.
-Jeff
Received on Wednesday, 14 August 1996 17:01:02 UTC