From: Koen Holtman <koen@win.tue.nl>
Date: Fri, 22 Nov 1996 01:02:06 +0100 (MET)
To: Paul Leach <paulle@microsoft.com>
Cc: mogul@pa.dec.com, fielding@kleber.ICS.UCI.EDU, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Paul Leach:
>
>>From: Roy T. Fielding[SMTP:fielding@kleber.ICS.UCI.EDU]
[...]
>>The other harm I mentioned is the implicit suggestion that
>>"hit-metering" should be sanctioned by the IETF.  It should not.
>>Hit metering is a way for people who don't understand statistical
>>sampling to bog down all requests instead of just those few requests
>>needed to get a representative sample.

To add a note: these are my thoughts on hit metering too (though I
could not have expressed them as eloquently as Roy did).  I feel that
the IETF should not sanction hit metering _unless_ it can be shown
that not doing so will lead to an internet meltdown.

Sites which don't understand statistics, and which want to double
their income by having a mechanism that will double their directly
measurable hits, will find a hit doubling mechanism whether it is
sanctioned by the IETF or not.  Sites which want to report better
statistics can be helped in much cheaper ways.

[...]

>Or are you implicitly proposing
>   Cache-control: proxy-revalidate;stale-probability=.01
>(where the new directive "stale-probability=.01" (spelled however you
>like) means that the cache should make an entry be stale with
>probability .01 at each access;

It is interesting that you bring this up.  I have been playing with an
idea like this for the past few days.  My idea is that, whenever a user
agent which supports `bogohits' makes a request to its internal cache,
it must, with a 1 in 1000 probability, add the headers

   Cache-control: no-cache
   BogoHit: PQR

to the request.  The Cache-Control header ensures that the request
always propagates to the origin server.  PQR is some characterization
of the cause of the request (user clicked on a link / loading of an
inlined object / reload / request by a web robot, etc.).

By counting the BogoHit headers and multiplying by 1000, the origin
server gets an estimate of the actual hit count.  Well-known
statistical formulae give the accuracy of the obtained number.

This method has very low overhead, adds no complexity in caches,
causes only a minimal loss of privacy, and measures things at the
actual source.  It thus provides a benchmark/certification for other
statistical methods.

Of course, you have to make a correction for user agents which don't
support the mechanism; you can use User-Agent header statistics for
this.  If a few major user agent vendors adopt the system, this
correction won't add much uncertainty.

Sites which want to get paid by the hit could then, in the future, say
something like: "We charge $0.01 for every hit on our server.  We
calculated 142 clicked links per 1000 hits on our server in the second
quarter of 1998."

Koen.
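
A minimal sketch of the user-agent side of this scheme, in Python.
The 1-in-1000 sampling and the two headers come from the proposal
above; the function name and the dict-of-headers representation are
illustrative assumptions, not part of the proposal:

    import random

    BOGOHIT_PROBABILITY = 1.0 / 1000   # the 1 in 1000 chance proposed above

    def maybe_add_bogohit(headers, cause):
        # 'cause' plays the role of PQR: a characterization of why the
        # request was made, e.g. "link", "inline", "reload", "robot".
        if random.random() < BOGOHIT_PROBABILITY:
            # Force the request past the internal cache (and any
            # proxies) all the way to the origin server.
            headers["Cache-Control"] = "no-cache"
            # Let the origin server count this as one sampled hit.
            headers["BogoHit"] = cause
        return headers

The non-sampled 999 out of 1000 requests are untouched, which is why
the mechanism adds essentially no load to caches.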
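On the server side, the "well-known statistical formulae" would be the
usual binomial/Poisson approximation: the sampled count has a standard
error of about sqrt(count), scaled up by the same factor as the
estimate.  A sketch, with a hypothetical function name and a
supporting-fraction parameter standing in for the User-Agent-based
correction described above:

    import math

    def estimate_hits(bogohit_count, sampling_rate=1.0 / 1000,
                      supporting_fraction=1.0):
        # Scale the sampled count up by the sampling rate, then correct
        # for the fraction of user agents that implement the mechanism
        # (estimated separately from User-Agent header statistics).
        estimate = bogohit_count / sampling_rate / supporting_fraction
        # The sampled count is approximately binomial/Poisson, so its
        # standard error is about sqrt(count); scale it the same way.
        stderr = math.sqrt(bogohit_count) / sampling_rate / supporting_fraction
        return estimate, stderr

    # 142 BogoHit headers seen -> about 142,000 hits, +/- ~11,900 (one sigma)
    hits, err = estimate_hits(142)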
Received on Thursday, 21 November 1996 16:53:46 UTC