From: Koen Holtman <koen@win.tue.nl>
Date: Fri, 22 Nov 1996 01:02:06 +0100 (MET)
To: Paul Leach <paulle@microsoft.com>
Cc: mogul@pa.dec.com, fielding@kleber.ICS.UCI.EDU, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Paul Leach:
>
>>From: Roy T. Fielding[SMTP:fielding@kleber.ICS.UCI.EDU]
[...]
>>The other harm I mentioned is the implicit suggestion that
>>"hit-metering" should be sanctioned by the IETF.  It should not.
>>Hit metering is a way for people who don't understand statistical
>>sampling to bog down all requests instead of just those few requests
>>needed to get a representative sample.

To add a note: these are my thoughts on hit metering too (though I
could not have expressed them as eloquently as Roy did).  I feel that
the IETF should not sanction hit metering _unless_ it can be shown
that not doing so will lead to an internet meltdown.

Sites which don't understand statistics, and which want to double
their income by having a mechanism that will double their directly
measurable hits, will find a hit doubling mechanism whether it is
sanctioned by the IETF or not.  Sites which want to report better
statistics can be helped in much cheaper ways.

[...]

>Or are you implicitly proposing
>   Cache-control: proxy-revalidate;stale-probability=.01
>(where the new directive "stale-probability=.01" (spelled however you
>like) means that the cache should make an entry be stale with
>probability .01 at each access;

It is interesting that you bring this up.  I have been playing with an
idea like this for the past few days.  My idea is that, whenever a user
agent which supports `bogohits' makes a request to its internal cache,
it must, with a 1 in 1000 probability, add the headers

   Cache-control: no-cache
   BogoHit: PQR

to the request.  The Cache-Control header ensures that the request
always propagates to the origin server.  PQR is some characterization
of the cause of the request (user clicked on a link / loading of an
inlined object / reload / request by a web robot, etc.).

By counting the BogoHit headers and multiplying by 1000, the origin
server gets an estimate of the actual hit count.  Well-known
statistical formulae give the accuracy of the obtained number.

This method has very low overhead, adds no complexity in caches,
causes only a minimal loss of privacy, and measures things at the
actual source.  It thus provides a benchmark/certification for other
statistical methods.

Of course, you have to make a correction for user agents which don't
support the mechanism; you can use User-Agent header statistics for
this.  If a few major user agent vendors adopt the system, this
correction won't add much uncertainty.

Sites which want to get paid by the hit could then, in the future, say
something like: "We charge $0.01 for every hit on our server.  We
calculated 142 clicked links per 1000 hits on our server in the second
quarter of 1998."

Koen.
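
A minimal sketch of the user-agent side of this scheme, in Python.
The 1-in-1000 sampling and the two headers come from the proposal
above; the function name and the dict-of-headers representation are
illustrative assumptions, not part of the proposal:

    import random

    BOGOHIT_PROBABILITY = 1.0 / 1000   # the 1 in 1000 chance proposed above

    def maybe_add_bogohit(headers, cause):
        # 'cause' plays the role of PQR: a characterization of why the
        # request was made, e.g. "link", "inline", "reload", "robot".
        if random.random() < BOGOHIT_PROBABILITY:
            # Force the request past the internal cache (and any
            # proxies) all the way to the origin server.
            headers["Cache-Control"] = "no-cache"
            # Let the origin server count this as one sampled hit.
            headers["BogoHit"] = cause
        return headers

The non-sampled 999 out of 1000 requests are untouched, which is why
the mechanism adds essentially no load to caches.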
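On the server side, the "well-known statistical formulae" would be the
usual binomial/Poisson approximation: the sampled count has a standard
error of about sqrt(count), scaled up by the same factor as the
estimate.  A sketch, with a hypothetical function name and a
supporting-fraction parameter standing in for the User-Agent-based
correction described above:

    import math

    def estimate_hits(bogohit_count, sampling_rate=1.0 / 1000,
                      supporting_fraction=1.0):
        # Scale the sampled count up by the sampling rate, then correct
        # for the fraction of user agents that implement the mechanism
        # (estimated separately from User-Agent header statistics).
        estimate = bogohit_count / sampling_rate / supporting_fraction
        # The sampled count is approximately binomial/Poisson, so its
        # standard error is about sqrt(count); scale it the same way.
        stderr = math.sqrt(bogohit_count) / sampling_rate / supporting_fraction
        return estimate, stderr

    # 142 BogoHit headers seen -> about 142,000 hits, +/- ~11,900 (one sigma)
    hits, err = estimate_hits(142)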
Received on Thursday, 21 November 1996 16:53:46 UTC