Comments on draft-mogul-http-hit-metering-00.txt from Koen Holtman on 1996-11-26 (ietf-http-wg@w3.org from October to December 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Tue, 26 Nov 1996 22:53:27 +0100 (MET)
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Cc: Koen Holtman <koen@win.tue.nl>
Message-Id: <199611262153.WAA07369@wsooti04.win.tue.nl>
[Larry, I know you asked us to stop sending comments, but I promised
the authors my comments earlier.  I'll limit my message to things that
have not been said before.]


Section 3.1:

>   When a proxy forwards a hit-metered or usage-limited response to a
>   client (proxy or end-client) not in the metering subtree, it MUST
>   omit the Meter header, and it MUST add "Cache-control:
>   proxy-revalidate" to the response.

I'd rather have the cache leave the Meter header in the request.  With
your system, upstream caches cannot tell a `mission critical'
"Cache-control: proxy-revalidate" added to make a shopping cart
application safe from a `frivolous' "Cache-control: proxy-revalidate"
added to get hit counts.

I think it there must be a way for proxies to tell the two uses of
proxy-revalidate apart, else proxy cache operators who want to have
nothing to do with hit counting will be tempted to ignore not only the
the `frivolous' but also the `mission critical' "Cache-control:
proxy-revalidate" headers.

Stated differently: your current system will encourage some people to
ignore all "Cache-control: proxy-revalidate" headers, and this is a
bad thing, because it will make the web an unsafe place.


Section 3.5:

>      2. When it forwards a conditional HEAD on the resource
>         instance on behalf of one of its clients.

HTTP/1.1 does not define conditional HEADs, you will have to define
them yourself.


Section 4/4.1:

>   We believe that our
>   design provides adequate support for user-counting, based on the
>   following analysis.

I don't agree with your analysis.  I think that the difference between
your `hit counts' and real `user counts' will be both significant and
hard to measure, because of

 - differing sizes of user agent caches
 - the same user revisiting a page after a long time
 - user agents being shared by multiple users, as in web kiosks
 - web robot activity not being distinguished from user activity
 - user agents with a cache size of 0 (i.e. network computer)

Your `network computers' already exist: On our sun cluster, which has
a central proxy, my user agent disk cache size is set to 0.  A agent
cache would just waste user filesystem space, and our user filesystems
are always 97-100% full.

I think your system is very good at replicating the functionality
offered by hit counting based on cache busting.  But _hit_ counts based
on cache busting are not very good _user_ counts.

I don't think any system can get you good user counts unless it
includes per-user global browsing history logs of many megabytes.

Now, on a higher level:

You want to eliminate cache busting by introducing a cheaper way of
measuring all hits.  We could also try to eliminate cache busting by
eliminating the need to measure hits at all.  Suppose we could
convince web advertisers that they ought to stop paying for hits and
start paying for _clicks_, with clicks a newly defined measure which
has a much better correlation to actual ad exposure.

In an earlier message, I outlined a cheap `BogoHits' system which
would get you good _click_ counts, where a one click was defined as
one mouse click on a page link by an actual end user.

_If_ statistics say that cache busting is done mainly for hit counts,
then the click count option may be an attractive alternative to your
hit count mechanism.


Section 4.2:

>Why max-uses is not a Cache-control directive

I think your reasoning is flawed here: the Meter header can be used to
negotiate on the honoring of max-uses no matter whether max-uses
appears in the Cache-Control response header or in the Meter response
header.


Section 5.3.1:

I believe your system will always count a page _pre_fetch done by a
proxy as a hit, no matter whether the page is actually seen by a user
later.  You need to fix this.


In general:

I think the draft would be improved if you remove usage limiting:
usage limiting is not necessary anymore if all counting proxies will
always report hits eventually.  You could replace usage limiting with
a max-time-to-wait-with-report mechanism, which would be easier to
define and implement.

I do not like the special treatment of varying resources, because of
privacy, efficiency, and complexity reasons.  If negotiation is done
with TCN, you will get good variant counts without  all this
complexity, because each variant has its own URL in TCN.

Even if cache busting turns out to be done mainly to get hit counts,
one could still make a case against your proposal.  If we don't
optimize unwanted origin server behavior, the unwanted behavior will
disappear by itself eventually, because users will vote with their
mouse-button and move on to faster sites.

Koen.
Received on Tuesday, 26 November 1996 14:01:30 UTC