Re: New document on "Simple hit-metering for HTTP" from Koen Holtman on 1996-08-07 (ietf-http-wg@w3.org from July to September 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Wed, 7 Aug 1996 14:54:07 +0200 (MET DST)
To: Paul Leach <paulle@microsoft.com>
Cc: koen@win.tue.nl, mogul@pa.dec.com, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199608071254.OAA12164@wsooti04.win.tue.nl>

Koen:
>>>>In the current (classic) meaning of the word,
>>>>
>>>>  1 hit-classic = 1 request on an origin server.

Paul:
>Actually, it just occurred to me -- you're definition of "hit-classic"
>is bogus. We (at least I) do not know how sites count hits.

You may not know that, but I am pretty sure that when people say `my
server gets 10K hits per day', they mean request messages per day.

Maybe some statistics packages use other definitions of `hits', but I
am sure that cache-busting advertising sites who get payed by the hit
use this definition.  This definition yields the highest count, and
high counts is what they want.  Even an image transfer aborted at 10%
would be a hit; it yields a line in the access log file, after all.
(Which reminds me: you probably have to add something to your draft
about how to count aborted transfers).

Koen:
>>Look, if all hit count customers were sophisticated enough to pay more
>>for `high quality' hits, advertising sites could not make more money
>>by using cache busting.  So if your assumed level customer of maturity
>>was indeed present, there would not be a cache busting problem to
>>solve in the first place.

Paul:
>Huh? Until we deploy something in caches, "cache-busting" is the only
>way they have to get demographic data. 

Nope. You can collect demographic data fine without cache busting; you
just get less of it.  There will still be plenty of stuff in your
logfile.

Also, you really need to distinguish here between two kinds of
demographic data:

1) Hit counts

2) User's Referer field, IP address, User-Agent field, ...

Your hit counting mechanism can only reduce the cache busting done to
get more of 1), not the cache busting done to get more of 2).  

That is why I focus on by how successful your scheme is at getting
more of 1).  If most advertising sites are actually doing cache
busting to get more of 2), (and they might, I don't know) then your
proposal will not reduce cache busting anyway.

[...]
>>I guess we could argue at length about present and predicted levels of
>>maturity (both in the US and in Europe), but the bottom line is this:
>>
>> I feel very strongly that it would be a huge mistake to make a scheme
>> to eliminate cache busting dependent on something, sufficient
>> customer sophistication in this case, the existence of which is
>> questionable at best.
>
>First, this isn't a scheme to "eliminate" cache busting, it is only
>trying to reduce it.
>
>Second, really unsophisticated customers just won't use it -- they'll
>continue cache-busting.

You don't understand my argument: the customers above are the ones who
pay for each the hit.  They are not the ones who do the cache busting:
it is the web advertising sites they pay who do the cache busting.
These advertising sites are sophisticated enough to pick and deploy
the mechanism which gets them the highest hit counts, which is cache
busting until something better comes along.

[...]
>>I'm primarily worried about the filesystem read/write overhead needed
>>to maintain the counters.  I would like to hear an implementer say
>>that this is not a problem.
>
>Mine say it isn't.

OK, that is good enough for me.


Koen.

Received on Wednesday, 7 August 1996 06:06:36 UTC