Re: New document on "Simple hit-metering for HTTP" from Koen Holtman on 1996-08-06 (ietf-http-wg@w3.org from July to September 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Tue, 6 Aug 1996 17:00:06 +0200 (MET DST)
To: Paul Leach <paulle@microsoft.com>
Cc: mogul@pa.dec.com, koen@win.tue.nl, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199608061500.RAA11073@wsooti04.win.tue.nl>

Paul Leach:
>[Koen Holtman:]
>>
>>I have several comments.
>>
>>1. Cascaded proxy caches.
>>
>>At first glance, there seem to be counting problems in a cascaded
>>proxy cache situation.  If we have the arrangement
>>
>>   origin server ---- proxy 1 ------ proxy 2 ---- user agent
>>
>>and the user agent requests an uncached page,
[....]
>
>Sounds like a bug. The fix sounds yucky, though. We'll try to dream up a
>cleaner one.

I thought up another fix some hours after I sent my message: adopt the
rule that a server which is only relaying, not generating, a response
must not count it.  In the above example, the origin server itself
would count the hit, not any upstream proxy cache.

This way, when serving a request the origin server can immediately add
1 to its total hit count.  With my earlier fix, it would have to wait
for proxy 2 to report the hit in a future request.
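
To make the rule concrete, here is a minimal sketch in Python of the
chain from comment 1.  The Node class, its parent field, and the
reading that a response served from a proxy's own cache counts as
`generated' by that proxy are assumptions made for illustration; none
of this comes from the draft.

```python
class Node:
    """One server in the chain: the origin (parent=None) or a proxy cache."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # next server toward the origin (hypothetical field)
        self.cache = {}
        self.hits = 0

    def handle(self, url):
        if self.parent is None:
            # Origin server: it generates the response, so it counts at once.
            self.hits += 1
            return "body of " + url
        if url in self.cache:
            # Served from this proxy's own cache: assumed to count as generated here.
            self.hits += 1
            return self.cache[url]
        # Cache miss: this proxy only relays the parent's response, so no count.
        body = self.parent.handle(url)
        self.cache[url] = body
        return body


origin = Node("origin server")
proxy1 = Node("proxy 1", parent=origin)
proxy2 = Node("proxy 2", parent=proxy1)

proxy2.handle("/page")  # uncached everywhere: counted by the origin only
proxy2.handle("/page")  # now cached at proxy 2: counted by proxy 2 only
total_hits = origin.hits + proxy1.hits + proxy2.hits
```

Under these assumptions each request is counted exactly once, and the
first one is counted by the origin server itself, with no need to wait
for a later report from proxy 2.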


>>3. A `hit' being an *un*conditional GET
>>
>>In the current (classic) meaning of the word,
>>
>>  1 hit-classic = 1 request on an origin server.
>>
>>Your draft defines a new kind of hit:
>>
>>  1 hit-new = 1 200/203/206 response returned to a user agent.
>>
>>Now, if I am an origin server which uses cache busting, and if most
>>caches play by the rules, then for my server I will measure:
>>
>>  hit-new < hit-classic .
>>
>>Assuming that I get paid by the hit, I have absolutely no incentive
>>to start measuring hit-news instead of hit-classics.  To stop using
>>the cache-busting based hit-classics would be economic suicide.
>
>No, the payment per hit-new would just be higher than per
>hit-classic.

Your answer assumes a level of maturity in the payer which simply is
not present yet.  This market is far too young, and far too dominated
by customers who respond to the `your company will be obsolete if you
do not get on the web *now*' hype.

Look, if all hit count customers were sophisticated enough to pay more
for `high quality' hits, advertising sites could not make more money
by using cache busting.  So if your assumed level of customer maturity
was indeed present, there would not be a cache busting problem to
solve in the first place.

I can't see the level of maturity you assume being present now
anywhere except maybe in a few very big web advertisers.

For the companies who responded to the `your company will be obsolete
if you do not get a home page *now*' hype by getting some noname
startup to create and maintain a homepage (usually containing at least
200Kb of images) for them, this level of maturity will not appear for
some time.

I guess we could argue at length about present and predicted levels of
maturity (both in the US and in Europe), but the bottom line is this:

 I feel very strongly that it would be a huge mistake to make a scheme
 to eliminate cache busting dependent on something, sufficient
 customer sophistication in this case, the existence of which is
 questionable at best.

This dependence on customer sophistication can easily be removed by
counting hits in a way which gives *higher* counts even to sites which
now use cache-busting.  By generating higher counts even for these
sites, the scheme will work no matter how (un)sophisticated the
customers are.

Again, there is no need to count only one thing.  You could count both
the `meaningful' type of hits you have defined, and my `inflated'
hits.  That would even let you measure cache efficiency in a neat way.

>>4. Interaction with Vary
>>
>>I don't like the extra complexity and inefficiency introduced by the
>>Vary counting rules in section 3. (See second-to-last sentence of
>>Section 5.1.)
>>
>>I think the proposal would be better if the Vary special case were
>>removed entirely.
>
>Don't you think that providers of multilingual content want to count the
>hits of the French, German, Dutch, and English (etc.) version
>separately?

Sure!  There will also be providers who want the referer info, the
e-mail address, and a credit rating of the user for every hit on a
page.

The question here is where to strike the balance between
simplicity/efficiency and the need for statistics.

I feel that your Vary scheme adds too much complexity.  Making it
ETag-based would help, but it would still be a bit too complex for my
taste.  I guess the WG as a whole will have to decide on where to
strike the balance.

Also, multilingual pages which use transparent content negotiation
will not need a Vary-based hit counting scheme: you can quite
naturally count hits on the variant URLs.
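
As a rough illustration of counting on variant URLs, the Python sketch
below keeps one counter per variant.  The URLs and the
count_variant_hit helper are hypothetical; the only point is that
per-language counts fall out of ordinary per-URL counting, with no
Vary-specific rule.

```python
from collections import Counter

# One counter per variant URL; no special handling of Vary is needed.
variant_hits = Counter()


def count_variant_hit(variant_url):
    """Record one hit on the variant that was actually served."""
    variant_hits[variant_url] += 1


# Hypothetical variant URLs for a French and an English version of one page.
for url in ["/paper.fr.html", "/paper.en.html", "/paper.fr.html"]:
    count_variant_hit(url)
```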

[...]
>>5. Overhead in proxy efficiency
>>
>>I'm wondering if the counting mechanisms in the draft won't cause an
>>unacceptable overhead for high-performance cache implementations.  I
>>think we definitely need the opinions of proxy cache implementers on
>>this issue.
>
>The design is supposed to just require addition of a counter to a data
>structure that needs to be around anyway, and to add a few bytes to a
>message you needed to send anyway.

I'm primarily worried about the filesystem read/write overhead needed
to maintain the counters.  I would like to hear an implementer say
that this is not a problem.

>  The alternative is that pages for
>which demographic info is required aren't cached at all, which is
>plainly MUCH worse.

No, it is not necessarily worse.  It depends on how much cache-busting
there is now, and on how much your proposal would eliminate.  We are
comparing an overhead for *every* page served with an overhead due to
extra *conditional* requests for *some* pages.  This all boils down to
the last comment in my previous message: how much cache busting is
there anyway?
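
The comparison can be written down as a small back-of-the-envelope
model in Python.  The function names, the cost units, and the numbers
in the example are all made up for illustration; real values would
have to come from proxy cache implementers and from measurements of
how much cache busting actually occurs.

```python
def metering_overhead(pages_served, cost_per_counter_update):
    """Counter maintenance is paid for *every* page a proxy serves."""
    return pages_served * cost_per_counter_update


def cache_busting_overhead(busted_requests_eliminated, cost_per_conditional_request):
    """Extra conditional requests are paid only for *some* pages:
    those whose cache busting the proposal would actually eliminate."""
    return busted_requests_eliminated * cost_per_conditional_request


def metering_is_cheaper(pages_served, busted_requests_eliminated,
                        cost_per_counter_update, cost_per_conditional_request):
    """True if metering every page costs less than the busting it removes."""
    return (metering_overhead(pages_served, cost_per_counter_update)
            < cache_busting_overhead(busted_requests_eliminated,
                                     cost_per_conditional_request))


# Made-up numbers: 1,000,000 pages served, of which 10,000 were cache-busted;
# a counter update costs 1 unit and a conditional request costs 50 units.
little_busting = metering_is_cheaper(1_000_000, 10_000, 1, 50)
# Same costs, but with 100,000 cache-busted requests eliminated.
much_busting = metering_is_cheaper(1_000_000, 100_000, 1, 50)
```

With the first set of made-up numbers metering costs more than the
cache busting it removes; with the second it costs less.  Which case
applies depends entirely on how much cache busting there really is.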

>Paul 

Koen.
Received on Tuesday, 6 August 1996 08:05:46 UTC