
Re: Hit-metering: to Proposed Standard?

From: Koen Holtman <koen@win.tue.nl>
Date: Fri, 22 Nov 1996 01:02:06 +0100 (MET)
Message-Id: <199611220002.BAA03051@wsooti04.win.tue.nl>
To: Paul Leach <paulle@microsoft.com>
Cc: mogul@pa.dec.com, fielding@kleber.ICS.UCI.EDU, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
X-Mailing-List: <http-wg@cuckoo.hpl.hp.com> archive/latest/1943
Paul Leach:
>>From:  Roy T. Fielding[SMTP:fielding@kleber.ICS.UCI.EDU]
>>The other harm I mentioned is the implicit suggestion that
>>"hit-metering" should be sanctioned by the IETF.  It should not.
>>Hit metering is a way for people who don't understand statistical
>>sampling to bog down all requests instead of just those few requests
>>needed to get a representative sample.

To add a note: These are my thoughts on hit metering too (though I
could not have expressed them so eloquently as Roy did).  I feel that
the IETF should not sanction hit metering _unless_ it can be shown
that not doing so will lead to an internet meltdown.

Those sites who don't understand statistics, and who want to double
their income by having a mechanism that will double their directly
measurable hits, will find a hit-doubling mechanism whether it is
sanctioned by the IETF or not.

Those sites who want to report better statistics can be helped in much
cheaper ways.

>Or are you implicitly proposing
>        Cache-control: proxy-revalidate;stale-probability=.01
>(where the new directive "stale-probability=.01" (spelled however you
>like) means that the cache should make an entry be stale with
>probability .01 at each access;

It is interesting that you bring this up.  I have been playing with an
idea like this for the past few days.  My idea is that, whenever a
user agent which supports `bogohits' makes a request to its internal
cache, it must, with a 1 in 1000 probability, add the headers

  Cache-control: no-cache
  BogoHit: PQR

to the request.  The Cache-Control header ensures that the request
always propagates to the origin server.  PQR is some characterization of
the cause of the request (user clicked on link / loading of inlined
object / reload / request by web robot, etc.).  By counting the
BogoHit headers and multiplying by 1000, the origin server gets an
estimate of the actual hit count.  Well-known statistical formulae
give the accuracy of the obtained number.
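
To make this concrete, here is a rough sketch of the user agent side
(Python, purely for illustration; the prepare_request hook, the exact
header spellings, and the cause tokens are placeholders I made up, not
a worked-out spec):

  import random

  SAMPLE_RATE = 1000   # 1 in 1000 requests carries the extra headers

  def prepare_request(headers, cause):
      # With probability 1/SAMPLE_RATE, mark this request as a bogohit
      # sample.  `cause` is the "PQR" token from above: a short label
      # for why the request was made ("link", "inline", "reload",
      # "robot", ...).
      if random.randrange(SAMPLE_RATE) == 0:
          # no-cache forces the request past every cache to the origin
          headers["Cache-Control"] = "no-cache"
          headers["BogoHit"] = cause
      return headers

  # e.g. the user clicked on a link:
  print(prepare_request({"Host": "www.example.com"}, "link"))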

This method has a very low overhead, adds no complexity in caches,
gives only a minimal loss of privacy, and measures things at the
actual source.  It thus provides a benchmark/certification for other
statistical methods.  Of course, you have to make a correction for
user agents which don't support the mechanism; you can use user-agent
header statistics for this.  If a few major user agent vendors
adopt the system, this correction won't add much uncertainty.
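
The arithmetic on the origin server side could look roughly like the
sketch below (again only illustrative; it assumes the fraction of
traffic from supporting user agents has already been estimated from
User-Agent header statistics, and uses the standard binomial formula
for the accuracy):

  import math

  def estimate_hits(bogohit_count, supporting_fraction, sample_rate=1000):
      # bogohit_count is approximately Binomial(n, p) with p = 1/sample_rate,
      # where n is the number of hits from supporting user agents.
      p = 1.0 / sample_rate
      supported_hits = bogohit_count / p
      # Standard deviation of the estimate: sqrt(n*p*(1-p)) / p, with the
      # observed count plugged in for n*p.
      supported_sd = math.sqrt(bogohit_count * (1 - p)) / p
      # Scale up for user agents that never send BogoHit headers.
      return (supported_hits / supporting_fraction,
              supported_sd / supporting_fraction)

  # 142 BogoHit headers seen, 90% of traffic from supporting user agents:
  est, sd = estimate_hits(142, 0.9)
  print("about %.0f hits, +/- %.0f (two standard deviations)" % (est, 2 * sd))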

Sites who want to get paid by the hit could then in the future say
something like: "We charge $0.01 for every hit on our server.  We
calculated 142 clicked links per 1000 hits on our server in the second
quarter of 1998."
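
The per-cause breakdown behind a statement like that falls straight
out of the sampled BogoHit values; roughly (the cause tokens and the
sample numbers below are invented for the example):

  from collections import Counter

  def per_thousand(sampled_causes):
      # Each sampled request stands in for roughly 1000 real ones, so a
      # cause's rate per 1000 hits is just its share of the samples x 1000.
      counts = Counter(sampled_causes)
      total = sum(counts.values())
      return dict((cause, round(1000.0 * n / total))
                  for cause, n in counts.items())

  # e.g. 142 of 1000 sampled requests were user-clicked links:
  samples = ["link"] * 142 + ["inline"] * 803 + ["reload"] * 40 + ["robot"] * 15
  print(per_thousand(samples))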

Received on Thursday, 21 November 1996 16:53:46 UTC
