Comments on draft-ietf-http-hit-metering-00.txt from Koen Holtman on 1997-02-16 (ietf-http-wg@w3.org from January to March 1997)

From: Koen Holtman <koen@win.tue.nl>
Date: Sun, 16 Feb 1997 20:52:10 +0100 (MET)
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Cc: Koen Holtman <koen@win.tue.nl>
Message-Id: <199702161952.UAA27905@wsooti08.win.tue.nl>

I just read the new hit metering draft, and spent some time reading
through the archived discussions on previous metering drafts.

I like the new material that was added in this draft.  However, I note that
the basic mechanism has not changed, nor have the claims made about it.  So
the problems I had with the previous draft still exist.

On a micro level: I listed some (fixable) technical problems with the
previous draft in

 http://www.ics.uci.edu/pub/ietf/http/hypermail/1996q4/0294.html

As far as I can see, most of these problems have not been fixed in the
new draft.  Did the message above somehow drop out of the authors'
editorial queue?

On a macro level: 

1) Section 4 says: `We believe that our design provides adequate
support for user-counting, based on the following analysis.'  I do not
think it does (for a longer discussion, see the article linked above),
and as long as this claim stays in, I won't support the draft.  

Many people want web metrics better than what we have now, but this draft
does not provide such metrics.  Vendors say they get pressure for
better metrics from their customers; I don't think implementing this
draft will make the pressure go away.  

2) I feel that there is too much unnecessary cruft in the draft.  The
usage limiting stuff should be removed, and the special rules for
varying resources should probably also be removed.  

The stickiness and header compression (header abbreviation) should
largely be cut -- this stuff just generates a lot of code, and the
efficiency savings in no way compensate for the efficiency loss due to
the extra requests and the cache busting outside of the metering
subtree.  We'll have stickiness and header compression as a general
mechanism in HTTP/2.0, or http-ng, or whatever.  I see no reason to
introduce this stuff for some specialised header beforehand.


3) To quote Roy Fielding:

>The other harm I mentioned is the implicit suggestion that "hit-metering"
>should be sanctioned by the IETF.  It should not.  Hit metering is a way for
>people who don't understand statistical sampling to bog down all requests
>instead of just those few requests needed to get a representative sample.
>Whether or not some ISP customers want it does not change the fact that
>it is damaging to the community as a whole, and it's a lot better to inform
>people on how not to be a "scum sucking pig" than it is to have a proposed
>standard on slightly-less piggish ways to be a pig.

I feel that the IETF should not sanction this form of hit metering
(by making it a proposed standard) _unless_ it can be shown that not
doing so will lead to an internet meltdown.  

I don't think this has been shown, and I think that the evidence so
far is actually to the contrary.  I read through a lot of discussions
about this in the archives.  To summarise:

 - for this discussion, cache busting means making the user agent
   do a conditional get every time, after which the server usually
   sends a 304 (not modified).

 - estimated cache busting levels range from ~30% down to 0.0001%
   (also depends on whether you count unintentional cache busting)

 - other reasons for cache busting include
    - stupidity / laziness / inertia  (CGIs and server-side includes
        both lead to cache busting in the default case)
    - working around broken browser features
    - sites which require statefulness/authentication
    - showing a different ad each time
    - gathering hit count demographics
    - gathering demographics better than just hits (though the draft
       does have a mechanism for gathering more than hits, the cache
       efficiency of this mechanism is not much better than using
       plain cache busting in my assessment)

 - it is unknown how large a fraction of cache busting is done only
   to get hit counts.   

 - it is unknown whether the people doing cache busting to get hit
   count demographics now can be educated to use friendlier
   statistical methods (the authors of the draft seem to assume that
   not many can be)

 - if the draft is adopted, some people who do cache busting now
   will switch to the hit counting methods in the draft.  However,
   others who don't count anything now may start using the draft, and
   this leads to _more_ cache busting outside of the metering subtree.

Due to this last point, we don't even know if the overall effect of
implementing the draft will be good or bad!  
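
To make the definition of cache busting used in the summary above
concrete, here is a minimal sketch in Python.  It is an illustration
only and is not taken from the draft: the Origin and Cache classes, the
entity tag value, and the choice of an If-None-Match style validator
are all assumptions made for the example.  It shows a cache that is
forced to revalidate on every use, so that each user request reaches
the origin server, which usually answers 304 (not modified).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Origin:
    """Hypothetical origin server holding a single resource."""
    etag: str = '"v1"'
    body: str = "hello"
    requests_seen: int = 0  # under cache busting this doubles as a hit count

    def get(self, if_none_match: Optional[str] = None):
        self.requests_seen += 1
        if if_none_match == self.etag:
            return 304, self.etag, None  # not modified: no body is sent
        return 200, self.etag, self.body


@dataclass
class Cache:
    """Hypothetical user-agent cache that must revalidate before every reuse."""
    origin: Origin
    etag: Optional[str] = None
    body: Optional[str] = None

    def fetch(self):
        # Cache busting: never answer from the cache without asking the origin.
        status, etag, body = self.origin.get(if_none_match=self.etag)
        if status == 200:
            self.etag, self.body = etag, body
        return status, self.body


origin = Origin()
cache = Cache(origin)
statuses = [cache.fetch()[0] for _ in range(4)]
print(statuses)              # [200, 304, 304, 304]
print(origin.requests_seen)  # 4
```

The body is transferred only once, but every one of the four requests
still costs a round trip to the origin server; that round trip is the
cost being discussed here.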

Also, adopting the draft may slow the introduction of a better
demographics system later.  A better system does not necessarily have
to be based on HTTP extensions either.  However, I have no high hopes
of a better system happening very soon, or at all.  The social issues
that need to be resolved are even more complex than the technical
issues.

In summary: I don't support this draft going to proposed standard.  I
_might_ support it as an experimental RFC if 1) and 2) above are
resolved.

Koen.
Received on Sunday, 16 February 1997 12:01:53 UTC