Re: Hit-metering: to Proposed Standard? from Jeffrey Mogul on 1996-11-20 (ietf-http-wg@w3.org from October to December 1996)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Wed, 20 Nov 96 14:19:20 PST
To: hardie@nasa.gov
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9611202219.AA29982@acetes.pa.dec.com>

Ted, thanks for your thoughtful comments.  I think you raise some
important issues, but I believe that we have covered them in our
design.  We may not, however, have expressed this clearly enough.
(So if you end up agreeing with me after reading this message, then
I guess we need to improve the clarity of the I-D).

    As your draft makes admirably clear, this new mechanism creates a
    duty between proxy and an origin server, which is a fundamentally
    different relationship than obtained before.

Well, I guess we didn't make this clear.  The new mechanism does
NOT create a duty.  What it does is to allow a proxy and server
(not necessarily an origin server!) to agree on a connection-by-
connection basis to enter into a "contract" of sorts.  So the
only duty imposed by our proposal is that if you send a Meter
header with certain directives, then you are bound to honor a
standardized interpretation of those directives.

    As was pointed out at
    previous meetings, proxy servers currently act on behalf of the
    end-user; to change that behavior without some way of letting the
    end user know it has changed has privacy implications

This is another good point, but we have been quite sensitive to
privacy and autonomy issues.  First of all, a proxy is NEVER required
to agree to provide hit-metering, period.

Second, and perhaps most important, our design does not transmit
more data about a user to the origin server (or any inbound server)
than would be transmitted using the existing features of HTTP/1.1.
I believe this is an unconditionally true statement: the use of
the Meter mechanism does not result in the communication of any
information beyond that provided if the Meter mechanism is not used
(assuming that the proxy conforms to the HTTP/1.1 spec).  In fact,
I believe that it communicates significantly less information about
individual clients.

If someone is able to describe a specific scenario where the use
of the Meter mechanism, as proposed in our draft, does in fact provide
more per-client information than the existing HTTP/1.1 mechanisms,
then we would regard this as a bug in our specification that needs
to be fixed (or at least, that needs to be called out in the Security
Considerations section).

Note that a proxy that chooses not to conform to the existing HTTP/1.1
caching mechanisms (e.g., "proxy-revalidate") for privacy reasons is
not required to use Meter, and so is equally able to follow the same
privacy policy.

    (and these
    are not necessarily the same privacy implications as exist when
    "proxy-revalidate" is sent from origin servers, because proxies
    have access to data across multiple servers).

I'm afraid that I don't understand your point here.  Could you
illustrate with an example?

    If we accept that any hit-metering proposal must create such a
    duty, we need to be very careful about the complexity of the duty
    that is assigned.  Your proposal seems to me to have three
    different layers of potential duties: the duty to limit usage; the
    duty to report usage; and the duty to report how Vary headers
    resulted in usage.  The first duty seems to me easy to implement
    and non-invasive in results; it is less accurate than reportage,
    but within a scope which is controlled by the origin server and
    which can be modified as time goes on.  The second duty begins to
    create a more complex duty; it is not that difficult to implement
    but may actually be a serious burden on very large proxy caches,
    especially if they make a best effort to limit network traffic by
    chaining reportage of hits against different resources on the same
    origin server.

Again, a proxy is NEVER required to offer to usage-limit a resource.
NEVER.  Our proposal provides a way for a proxy to make this offer,
but does not require it to make this offer.

    The third level seems to me far beyond what should
    be expected of a proxy server; it asks the proxy to track the kind
    of demographic data that can be both very complex and an invasion
    of privacy. (I say that with some disappointment, frankly, as the
    third level of data is really the only kind that I am
    professionally interested in, as that level would give me
    information I need to plan for future resources).

Again, if privacy is a concern, the proxy need not offer to hit-meter
or usage-limit.

And even if a proxy does offer to hit-meter or usage-limit, it is
always allowed to meet its "contractual duty" for a given resource
by simply doing the equivalent of "proxy-revalidate" for the response.
It can make this determination *after* examining the response, to
see if the "Vary" header in the response asks it for information that
it would not normally provide.
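
A minimal sketch of that after-the-fact decision, in Python.  The
function name, the `willing_fields` policy set, and the two return
labels are invented here for illustration and do not come from the
draft; only the Vary header and the idea of falling back to
"proxy-revalidate" behavior are taken from the discussion above.

```python
def choose_handling(response_headers, willing_fields):
    """Decide how a proxy that offered to hit-meter treats one response.

    response_headers: dict of response header name -> value.
    willing_fields: lowercase request-header names the proxy is prepared
        to break its counts down by (hypothetical local policy).
    """
    vary = response_headers.get("Vary", "")
    asked = {f.strip().lower() for f in vary.split(",") if f.strip()}
    if asked <= willing_fields:
        # Nothing requested beyond what the proxy would normally provide.
        return "hit-meter"
    # Otherwise behave as if the response had carried proxy-revalidate.
    return "revalidate"
```

With `willing_fields = {"user-agent"}`, a response carrying
`Vary: User-agent` would be hit-metered, while one carrying
`Vary: User-agent, Accept-Language` would simply be revalidated.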

Since we expect that an origin server would normally request hit-metering
or usage-limiting for precisely those resources for which it would
normally send "proxy-revalidate", this seems to be neutral as far
as privacy is concerned.

Or perhaps even better than neutral.  Suppose you (at an origin server)
want to know how your user community breaks down by User-agent, but you
have no need for other per-request headers (such as Accept-* headers,
Via headers, etc.)  The ability to say
	Meter: do-report
	Vary: User-agent
means that you will end up with the counts that you need, but WITHOUT
collecting a lot of irrelevant information (and so without collecting
information that may compromise other privacy considerations).
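
As a rough illustration of what a proxy would tally in that case, here
is a short Python sketch.  The function name and the shape of the input
(one dict of request headers per cache hit) are assumptions made for
the example, not anything specified in the draft.

```python
from collections import Counter

def count_hits_by_vary(requests, vary_field="User-agent"):
    """Tally cache hits per value of the one request header named by Vary.

    requests: iterable of dicts mapping request-header names to values.
    Only the selected header is read; all other headers are ignored and
    never stored.
    """
    counts = Counter()
    for headers in requests:
        counts[headers.get(vary_field, "")] += 1
    return counts
```

The only per-request data retained is the value of the single header
the server asked about, which is the privacy property argued for above.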

    Your proposal says very clearly that "any proxy that transmits the
    Meter header in a request MUST implement every requirement of this
    specification, without exception or amendment."  I don't think that
    this is reasonable; I realize that your proposal includes methods
    which allow a proxy to "implement" a requirement by failing to
    offer a service, but I think a design in which there were some true
    MUSTs, and other SHOULDs and MAYs would be more appropriate.  If we
    can establish which duty is a "MUST" for this scheme to work, we
    can make this radically easier to implement and use; especially as
    that determination will also make clear to origin servers which of
    the mechanisms (reportage or usage-exhaustion) is going to be the
    basic method for usage counting.  If there are multiple methods
    which can produce different counts, there are going to be problems,
    even if the range of difference is small on a per-proxy basis.

Again, the proposal NEVER requires that a proxy agree to do ANYTHING.
The "MUST" that you quote simply requires that if a proxy does use
the Meter header to offer to do something, then it must faithfully
carry out what it offers to do.  If you can find a MUST in our
proposal that somehow binds a proxy to do something against its explicit
choice, then this is a bug that we will fix.

    Even assuming that this basic design is accepted, there are some
    problems in your current proposal.  The description of the
    "metering subtree", for example, implies that proxies work
    together within a "tree" to obey usage limits and maintain counts;
    there is an underlying assumption, however, that the proxies in
    that tree will maintain a particular "path" of proxies back to the
    origin server, which may not always be the case.

Remember that the "contracts" our proposal creates are for a single hop;
they are NOT necessarily between a proxy and the origin server.  So we
envision that even if a proxy's path to the origin server does change, its
"contract" with the previous inbound proxy still holds.  And remember
that our proposal explicitly uses a "best-efforts" model, which means
that if the previous inbound proxy is not reachable, your proxy is
under no obligation to try forever to report the hit-counts.
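
One way to picture "best-efforts" reporting is a bounded retry loop.
This Python sketch is only an illustration: the `send` callable, the
use of OSError to signal an unreachable inbound proxy, and the default
of three attempts are all assumptions, not values from the draft.

```python
def report_best_effort(send, report, max_attempts=3):
    """Try to deliver one hit-count report, then give up.

    send: callable taking the report; assumed to raise OSError when the
        previous inbound proxy cannot be reached (hypothetical hook).
    Returns True if the report was delivered, False if it was abandoned.
    """
    for _ in range(max_attempts):
        try:
            send(report)
            return True
        except OSError:
            continue  # unreachable this time; try again, up to the limit
    return False
```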

Your comment does raise a subtle point that we had ignored, which is
that a faithful implementation of hit-counting would, in theory,
have to record the identity of the inbound proxy from which each
response is received.  In practice, there are numerous ways around
this (for example, flushing out the pending hit-count reports before
changing the path configuration), but in any case it does not seem
too onerous to do a naive implementation which simply records the
IP address of the source of each response.  I believe that systems
like Harvest/Squid probably already need to do something like this.
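
A sketch of such a naive implementation, in Python.  The class and
method names are hypothetical, and keying on the (source IP, URL) pair
is just one plausible reading of "records the IP address of the source
of each response".

```python
from collections import defaultdict

class PendingCounts:
    """Unreported hit counts, keyed by the source each response came from."""

    def __init__(self):
        # (source_ip, url) -> hits not yet reported to that source
        self._pending = defaultdict(int)

    def record_hit(self, source_ip, url):
        self._pending[(source_ip, url)] += 1

    def flush(self, source_ip):
        """Remove and return {url: count} owed to one inbound source."""
        owed = {u: n for (ip, u), n in self._pending.items() if ip == source_ip}
        for u in owed:
            del self._pending[(source_ip, u)]
        return owed
```

Calling `flush()` for the old inbound proxy before changing the path
configuration corresponds to the "flush out the pending reports first"
workaround mentioned above.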

    The use of Meter as a sticky header also presents some problems, as
    a meter request directive is sticky and other meter headers are
    not.  You also provide no method of unsticking the meter-request
    directive other than closing the connection.  I frankly think the
    whole mechanism of creating sticky headers needs to be worked out
    in another draft, rather than implied by the behavior of one header
    made sticky here.

The "stickiness" of the Meter request-directive is only a performance
optimization, and if there are serious technical arguments against
it, we could remove that without affecting any other aspect of the
proposal.

But I do not think it is accurate to think of this in the same way
that we have previously discussed "sticky" headers, since those
were for actual request-headers.  The Meter request header is an
unusual thing that applies to transport-level connections, not to
individual requests, and so it would probably be better to use a
term other than "sticky" here.  (The Meter response directives are
per-response, but hop-by-hop, and so if there is a general "sticky"
mechanism agreed upon for the rest of HTTP, then Meter could take
advantage of it.)

As to the issue of "unsticking" ("unstickying?") the meter-request
directive: remember that a proxy that has offered, say, to hit-meter
responses it receives on a connection is able to meet this obligation by
(in effect) removing the Meter header and adding "proxy-revalidate".
While this may result in its generating conditional GETs on responses
that the server doesn't want hit-metered, this is again just a performance
issue, not a correctness one.  But if you are concerned about performance
and you think that it's worth including an "unsticking" feature, please
suggest something.
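
For concreteness, the "remove Meter, add proxy-revalidate" step could
look something like the following Python sketch.  The function name and
the dict representation of headers are assumptions for illustration;
the only protocol facts relied on are that Meter is a header and that
proxy-revalidate is a Cache-Control directive in HTTP/1.1.

```python
def degrade_to_revalidate(headers):
    """Return a copy of the response headers with Meter dropped and
    proxy-revalidate added to Cache-Control."""
    out = {k: v for k, v in headers.items() if k.lower() != "meter"}
    # Reuse the existing Cache-Control key if present, whatever its case.
    cc_key = next((k for k in out if k.lower() == "cache-control"),
                  "Cache-Control")
    directives = [d.strip() for d in out.get(cc_key, "").split(",")
                  if d.strip()]
    if "proxy-revalidate" not in [d.lower() for d in directives]:
        directives.append("proxy-revalidate")
    out[cc_key] = ", ".join(directives)
    return out
```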

-Jeff
Received on Wednesday, 20 November 1996 14:31:02 UTC