Re: Comments on draft-ietf-http-hit-metering-00.txt from Jeffrey Mogul on 1997-02-18 (ietf-http-wg@w3.org from January to March 1997)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Tue, 18 Feb 97 15:20:20 PST
To: Koen Holtman <koen@win.tue.nl>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9702182320.AA25591@acetes.pa.dec.com>

    On a micro level: I listed some (fixable) technical problems with the
    previous draft in
    
     http://www.ics.uci.edu/pub/ietf/http/hypermail/1996q4/0294.html
    
    As far as I can see, most of these problems have not been fixed in the
    new draft.  Did the message above somehow drop out of the author's
    editorial queue?

It is, in fact, still in my in-box, so I'll reply to that in my
next message.  Whether I would define these as "(fixable) technical
problems" or "differences of opinion" is another question :-)
    
    1) Section 4 says: `We believe that our design provides adequate
    support for user-counting, based on the following analysis.'  I do not
    think it does (for a longer discussion, see the article linked above),
    and as long as this claim stays in, I won't support the draft.  
    
    Many people want web metrics better than what we have now, but this draft
    does not provide such metrics.  Vendors say they get pressure for
    better metrics from their customers; I don't think implementing this
    draft will make the pressure go away.

We can quibble about whether the design does indeed provide
adequate support for counting users of a page.  Perhaps the
right statement would be
   We believe that our design provides adequate support for
   user-counting, within the constraints of what is feasible in the
   current Internet, based on the following analysis.
As you stated yourself in the message cited above,
	I don't think any system can get you good user counts unless it
	includes per-user global browsing history logs of many megabytes.
If by "good" you mean "100.00% accurate", then I'm inclined to agree
with you, and since this is clearly infeasible, one has to settle
for what is possible.  We prefer to define "adequate" as "at least as
accurate as is currently possible", given that cache-busting leads
to over-counting in many cases.

    2) I feel that there is too much unnecessary cruft in the draft.  The
    usage limiting stuff should be removed, and the special rules for
    varying resources should probably also be removed.  
    
Some people seem to prefer hit-counting over usage-limiting; some
people prefer the opposite.  There is no clear consensus that one
obviates the other.  Since both seem (to us) to be best served by
slight variations of a single basic mechanism, we believe that it
is appropriate to include both in the proposal.

As for section 8, "Interactions with varying resources": this simply
states the bare minimum necessary to make sensible use of the Vary
mechanism as it is currently defined in the HTTP/1.1 RFC (RFC 2068).

    3) To quote Roy Fielding:
    
       The other harm I mentioned is the implicit suggestion that
       "hit-metering" should be sanctioned by the IETF.  It should
       not.  Hit metering is a way for people who don't understand
       statistical sampling to bog down all requests instead of just
       those few requests needed to get a representative sample.
       Whether or not some ISP customers want it does not change the
       fact that it is damaging to the community as a whole, and it's a
       lot better to inform people on how not to be a "scum sucking
       pig" than it is to have a proposed standard on slightly-less
       piggish ways to be a pig.
    
    I feel that the IETF should not sanction this form of hit metering
    (by making it a proposed standard) _unless_ it can be shown that not
    doing so will lead to an internet meltdown.  
    
Since neither you nor Roy attended the San Jose session in February,
and (although I supplied them in machine-readable form) the slides
I presented there have not been posted as part of the minutes, I will
quote from them here:

     o Cons (real or alleged) [of our proposal]
       - Slight overhead on the wire
	  * This either pays off, or people won't use it
       - Some storage overhead
       - May reduce pressure on service authors to adopt more complex 
	    proposals
       - May not provide enough information to attract wide use
     o Last two "cons" cannot both be true!

To be specific, in this message, you yourself have stated
	"Many people want web metrics better than what we have now,
	but this draft does not provide such metrics."
and
	"if the draft is adopted, some people who will do cache busting
	now will switch to the hit counting methods in the draft."
You simply can't have it both ways.  Either the draft is useful
to a significant number of people, or it is not.

I wouldn't waste the WG's time discussing proposals about "statistical
sampling" until such time as we have seen a specific proposal.
    
    Also, adopting the draft may slow the introduction of a better
    demographics system later.  A better system does not necessarily
    have to be based on HTTP extensions either.  However, I have no
    high hopes of a better system happening very soon, or at all.  The
    social issues that need to be resolved are even more complex than
    the technical issues.

Pending a specific proposal (in the form of an Internet-Draft?) for a
"better system", it's pointless to discuss whether adopting our draft
would slow the introduction of something better.  Especially since
you and I seem to agree that a "better system" isn't likely soon.

-Jeff
Received on Tuesday, 18 February 1997 15:38:30 UTC