Re: Hit-metering: to Proposed Standard? from Ted Hardie on 1996-11-22 (ietf-http-wg@w3.org from October to December 1996)

From: Ted Hardie <hardie@orval.arc.nasa.gov>
Date: Thu, 21 Nov 1996 16:29:47 -0800 (PST)
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: hardie@nic.nasa.gov, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199611220029.QAA16062@orval.arc.nasa.gov>
Jeffrey Mogul writes:
> Well, I guess we didn't make this clear.  The new mechanism does
> NOT create a duty.  What it does is to allow a proxy and server
> (not necessarily an origin server!) to agree on a connection-by-
> connection basis to enter into a "contract" of sorts.  

I'm finding it difficult to approach this problem without falling into
the language of lawyers, and I suspect others are having similar
problems in distinguishing between the engineering aspects of this
proposal and the aspects which deal with the "'contract' of sorts" you
mention above.  I'll fight the impulse, but please forgive any lapses.

I very much like the idea that your proposal describes a "best effort"
approach, rather than a pure contract, and I appreciate the work
you've done to allow people to opt out of portions that they see as
onerous.  Still, what you are doing is creating a technical method by
which a server can say "Obey me or I won't allow you to serve copies
of my resources".  Your assumption is that current servers which
desire (defensibly high, frankly, rather than accurate) hit counts are
cache-busting.  According to your scenario, they will switch to this
method and some proxies will obey them, thus reducing network traffic
overall.  It is possible, however, that the introduction of this
method will induce some marginally interested servers which have not
previously engaged in cache-busting to engage in proxy-manipulation.
We won't know until it's deployed, but we must acknowledge the
possibility.  

I also personally believe that this possibility represents a
fundamental change in how the proxy servers must be viewed in the
interaction chain; we could debate this, but I would prefer a design
that did not make so fundamental a change.  To that end I actually
prefer the "usage limit" aspect of the proposal to the reporting
aspect of the proposal.  From my point of view, it extends a current
mechanism by creating a new way for documents to "expire".
Metaphorically, it makes a cached web document like a new tire--the
warranty expires in 6 years or 60,000 miles.  We already have methods
for dealing with expiration and revalidation; we do not already have
methods for proxies to report data to origin servers.
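To illustrate the "tire warranty" reading, here is a minimal Python sketch, assuming a cache entry that stays fresh only while it is younger than its max-age AND has been used fewer than its usage limit. The class name `CachedEntry` and the field `max_hits` are hypothetical names for illustration; they are not taken from the draft's wire syntax.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CachedEntry:
    """A cached response that expires by age OR by number of uses
    (6 years or 60,000 miles, whichever comes first)."""
    body: bytes
    max_age: float   # seconds, in the spirit of Cache-Control: max-age
    max_hits: int    # hypothetical usage limit granted by the server
    stored_at: float = field(default_factory=time.monotonic)
    hits: int = 0

    def is_fresh(self, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        age_ok = (now - self.stored_at) < self.max_age
        uses_ok = self.hits < self.max_hits
        return age_ok and uses_ok

    def serve(self, now: Optional[float] = None) -> Optional[bytes]:
        """Return the body and count the use if still fresh; otherwise
        return None, meaning the cache must revalidate with the server."""
        if not self.is_fresh(now):
            return None
        self.hits += 1
        return self.body


entry = CachedEntry(b"banner", max_age=3600.0, max_hits=2, stored_at=0.0)
print(entry.serve(now=10.0))  # b'banner'
print(entry.serve(now=20.0))  # b'banner'
print(entry.serve(now=30.0))  # None: the use limit ran out before the clock did
```

The point of the sketch is that running out of uses simply sends the cache down the existing revalidation path; nothing new flows from the proxy to the server.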

I recognize that the method is less intuitive than a reporting
mechanism; every provider would need a way to handle the uncertainty
induced by the range between the first hit assigned to a proxy cache
and the max-hits allowed it.  You make clear, however, that a server
need not give the same number of max-hits every time, and algorithms
for keeping that range small are available.  Making sane
recommendations for how to do it could eliminate much of the
confusion.  
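One way a server might keep that range small, sketched in Python under stated assumptions: every response sent to a proxy with a usage limit of `max_hits` accounts for somewhere between 1 hit (the one that caused the fetch) and `max_hits` hits, so the server can track a lower and an upper bound and shrink each new grant so the gap never exceeds a chosen slack. `HitBounds`, `max_slack` and `default_grant` are hypothetical names, and this is one possible policy, not an algorithm from the draft.

```python
from typing import Tuple


class HitBounds:
    """Hypothetical origin-side bookkeeping for a usage-limit-only scheme."""

    def __init__(self, max_slack: int, default_grant: int) -> None:
        self.max_slack = max_slack          # widest acceptable (upper - lower)
        self.default_grant = default_grant  # max-hits to hand out when slack allows
        self.lower = 0                      # each response counts at least 1 hit
        self.upper = 0                      # ... and at most its max-hits

    def next_grant(self) -> int:
        """Choose max-hits for the next response so the uncertainty
        (upper - lower) stays within max_slack."""
        remaining = self.max_slack - (self.upper - self.lower)
        # A grant of g widens the range by g - 1.
        return max(1, min(self.default_grant, remaining + 1))

    def record_response(self, max_hits: int) -> None:
        self.lower += 1
        self.upper += max_hits

    def bounds(self) -> Tuple[int, int]:
        return self.lower, self.upper


b = HitBounds(max_slack=10, default_grant=5)
grants = []
for _ in range(4):
    g = b.next_grant()
    b.record_response(g)
    grants.append(g)
print(grants)      # [5, 5, 3, 1]
print(b.bounds())  # (4, 14)
```

Once the slack is spent, grants fall to 1, which is effectively cache-busting for that response, so a server using something like this would presumably apply the slack budget per counting period (per day of an ad contract, say) and start fresh bookkeeping each period.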

Using the max-hits method alone also avoids many of the potential
privacy issues which forcing the proxy to report may imply.

> If someone is able to describe a specific scenario where the use
> of the Meter mechanism, as proposed in our draft, does in fact provide
> more per-client information than the existing HTTP/1.1 mechanisms,
> then we would regard this as a bug in our specification that needs
> to be fixed (or at least, that needs to be called out in the Security
> Considerations section).

The use of the Vary header in a do-report situation clearly provides
more information than is currently the case where a proxy cache is
being used.  Currently, if I employ a proxy-cache and it requests a
resource on my behalf, the origin server gets the data on the proxy
cache (the cache may pass through some data on the original requestor,
but it doesn't have to).  If the origin server cache-busts, the
proxy-cache must re-request the data every time, but the origin server
gets the data on the proxy-cache every time.  With your proposal, it
could get aggregates of the data on the actual requestors.  This
compromises privacy.  Imagine for a moment that someone used a Vary:
on the Host header with Meter.   
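To make the concern concrete, here is a simplified Python model, assuming a proxy that keeps a separate count for each variant (one per value of the request header named in Vary) and reports those counts upstream under do-report. `per_variant_report` is a hypothetical helper, not part of the draft. With a coarse header such as Accept-Language the report is a genuine aggregate; with a header whose value differs per client (the From header, which carries the user's e-mail address, is used here) the "aggregate" becomes a per-user hit log.

```python
from collections import Counter
from typing import Dict, List


def per_variant_report(requests: List[Dict[str, str]], vary_header: str) -> Dict[str, int]:
    """Count uses of one cached resource separately for each value of the
    request header named in Vary; return the counts a proxy would report."""
    counts: Counter = Counter()
    for headers in requests:
        counts[headers.get(vary_header, "")] += 1
    return dict(counts)


reqs = [
    {"Accept-Language": "en", "From": "alice@example.com"},
    {"Accept-Language": "en", "From": "bob@example.com"},
    {"Accept-Language": "en", "From": "alice@example.com"},
]
print(per_variant_report(reqs, "Accept-Language"))  # {'en': 3}
print(per_variant_report(reqs, "From"))  # {'alice@example.com': 2, 'bob@example.com': 1}
```

Without metering, the origin would see only the proxy's own request for each variant; with per-variant reporting it learns how often each identified client used the resource.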


> The "stickiness" of the Meter request-directive is only a performance
> optimization, and if there are serious technical arguments against
> it, we could remove that without affecting any other aspect of the
> proposal.
> 
> But I do not think it is accurate to think of this in the same way
> that we have previously discussed "sticky" headers, since those
> were for actual request-headers.  The Meter request header is a sort of
> unusual thing that applies to transport-level connections, not to
> individual requests, and so it might probably be better to use a
> term other than "sticky" here.  (The Meter response directives are
> per-response, but hop-by-hop, and so if there is a general "sticky"
> mechanism agreed upon for the rest of HTTP, then it could take advantage
> of this.)

I'm not sure how good an optimization it is.  You mentioned above that
a server would probably cache-bust now only on those resources for which
it needs accurate counts (like an ad image).  By making this a per-connection
header, you seem to me to force a proxy to report and a server to receive
information it may well throw away (like the counts on every little
fancy bar or button image).

Whatever it is called, I also suspect that we need a generic method
for dealing with this issue, rather than a one-off for this single
header.  If a server is currently designed to handle all aspects of
the negotiation on a per-request basis, redesigning it to allow some
per-connection headers is enough of a job that we should get it right
before we ask for standardization.  If this goes to experimental,
rather than standard, then I have no objection, as experimentation on
"per-connection" headers is needed.  

			regards,
				Ted Hardie
				NASA NIC
Received on Thursday, 21 November 1996 16:53:48 UTC