- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Wed, 20 Nov 96 14:19:20 PST
- To: hardie@nasa.gov
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Ted, thanks for your thoughtful comments. I think you raise some important issues, but I believe that we have covered them in our design. We may not, however, have expressed this clearly enough. (So if you end up agreeing with me after reading this message, then I guess we need to improve the clarity of the I-D.)

> As your draft makes admirably clear, this new mechanism creates a duty between proxy and an origin server, which is a fundamentally different relationship than obtained before.

Well, I guess we didn't make this clear. The new mechanism does NOT create a duty. What it does is to allow a proxy and server (not necessarily an origin server!) to agree on a connection-by-connection basis to enter into a "contract" of sorts. So the only duty imposed by our proposal is that if you send a Meter header with certain directives, then you are bound to honor a standardized interpretation of those directives.

> As was pointed out at previous meetings, proxy servers currently act on behalf of the end-user; to change that behavior without some way of letting the end user know it has changed has privacy implications

This is another good point, but we have been quite sensitive to privacy and autonomy issues. First of all, a proxy is NEVER required to agree to provide hit-metering, period.

Second, and perhaps most important, our design does not transmit more data about a user to the origin server (or any inbound server) than would be transmitted using the existing features of HTTP/1.1. I believe this is an unconditionally true statement: the use of the Meter mechanism does not result in the communication of any information beyond that provided if the Meter mechanism is not used (assuming that the proxy conforms to the HTTP/1.1 spec). In fact, I believe that it communicates significantly less information about individual clients.

If someone is able to describe a specific scenario where the use of the Meter mechanism, as proposed in our draft, does in fact provide more per-client information than the existing HTTP/1.1 mechanisms, then we would regard this as a bug in our specification that needs to be fixed (or at least, that needs to be called out in the Security Considerations section).

Note that a proxy that chooses not to conform to the existing HTTP/1.1 caching mechanisms (e.g., "proxy-revalidate") for privacy reasons is not required to use Meter, and so is equally able to follow the same privacy policy.

> (and these are not necessarily the same privacy implications as exist when "proxy-revalidate" is sent from origin servers, because proxies have access to data across multiple servers).

I'm afraid that I don't understand your point here. Could you illustrate with an example?

> If we accept that any hit-metering proposal must create such a duty, we need to be very careful about the complexity of the duty that is assigned. Your proposal seems to me to have three different layers of potential duties: the duty to limit usage; the duty to report usage; and the duty to report how Vary headers resulted in usage.

> The first duty seems to me easy to implement and non-invasive in results; it is less accurate than reportage, but within a scope which is controlled by the origin server and which can be modified as time goes on. The second duty begins to create a more complex duty; it is not that difficult to implement but may actually be a serious burden on very large proxy caches, especially if they make a best effort to limit network traffic by chaining reportage of hits against different resources on the same origin server.

Again, a proxy is NEVER required to offer to usage-limit a resource. NEVER. Our proposal provides a way for a proxy to make this offer, but does not require it to make this offer.
> The third level seems to me far beyond what should be expected of a proxy server; it asks the proxy to track the kind of demographic data that can be both very complex and an invasion of privacy. (I say that with some disappointment, frankly, as the third level of data is really the only kind that I am professionally interested in, as that level would give me information I need to plan for future resources).

Again, if privacy is a concern, the proxy need not offer to hit-meter or usage-limit. And even if a proxy does offer to hit-meter or usage-limit, it is always allowed to meet its "contractual duty" for a given resource by simply doing the equivalent of "proxy-revalidate" for the response. It can make this determination *after* examining the response, to see if the "Vary" header in the response asks it for information that it would not normally provide.

Since we expect that an origin server would normally request hit-metering or usage-limiting for precisely those resources for which it would normally send "proxy-revalidate", this seems to be neutral as far as privacy is concerned.

Or perhaps even better than neutral. Suppose you (at an origin server) want to know how your user community breaks down by User-agent, but you have no need for other per-request headers (such as Accept-* headers, Via headers, etc.). The ability to say

    Meter: do-report
    Vary: User-agent

means that you will end up with the counts that you need, but WITHOUT collecting a lot of irrelevant information (and so without collecting information that may compromise other privacy considerations).

> Your proposal says very clearly that "any proxy that transmits the Meter header in a request MUST implement every requirement of this specification, without exception or amendment." I don't think that this is reasonable; I realize that your proposal includes methods which allow a proxy to "implement" a requirement by failing to offer a service, but I think a design in which there were some true MUSTs, and other SHOULDs and MAYs would be more appropriate. If we can establish which duty is a "MUST" for this scheme to work, we can make this radically easier to implement and use; especially as that determination will also make clear to origin servers which of the mechanisms (reportage or usage-exhaustion) is going to be the basic method for usage counting. If there are multiple methods which can produce different counts, there are going to be problems, even if the range of difference is small on a per-proxy basis.

Again, the proposal NEVER requires that a proxy agree to do ANYTHING. The "MUST" that you quote simply requires that if a proxy does use the Meter header to offer to do something, then it must faithfully carry out what it offers to do. If you can find a MUST in our proposal that somehow binds a proxy to do something against its explicit choice, then this is a bug that we will fix.

> Even assuming that this basic design is accepted, there are some problems in your current proposal. The description of the "metering subtree", for example, implies that proxies work together within a "tree" to obey usage limits and maintain counts; there is an underlying assumption, however, that the proxies in that tree will maintain a particular "path" of proxies back to the origin server, which may not always be the case.

Remember that the "contracts" our proposal creates are for a single hop; they are NOT necessarily between a proxy and the origin server. So we envision that even if a proxy's path to the origin server does change, its "contract" with the previous inbound proxy still holds.
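To make the earlier "Meter: do-report" / "Vary: User-agent" example concrete, here is a minimal Python sketch of how a proxy might decide, *after* examining a response, whether to count hits by the headers named in Vary or to fall back to the equivalent of "proxy-revalidate". The names WILLING_TO_REPORT, vary_fields, plan_for_response, and HitCounter are hypothetical and do not come from the draft, and the sketch says nothing about how counts are actually reported on the wire.

```python
from collections import Counter

# Hypothetical local policy: the request headers this proxy is willing to
# break hit counts down by. This set is an assumption, not part of the draft.
WILLING_TO_REPORT = {"user-agent"}


def vary_fields(response_headers):
    """Return the lower-cased field names listed in the response's Vary header."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    raw = lowered.get("vary", "")
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


def plan_for_response(response_headers):
    """Decide how to treat a response for which hit-metering was requested.

    Returns "meter" if the proxy is willing to count by every header named
    in Vary, otherwise "revalidate" (the equivalent of proxy-revalidate).
    """
    if all(name in WILLING_TO_REPORT for name in vary_fields(response_headers)):
        return "meter"
    return "revalidate"


class HitCounter:
    """Counts cache hits for one resource, keyed only by the Vary'd headers."""

    def __init__(self, response_headers):
        self.fields = vary_fields(response_headers)
        self.counts = Counter()

    def record_hit(self, request_headers):
        lowered = {k.lower(): v for k, v in request_headers.items()}
        # Only the headers named in Vary enter the key; everything else
        # in the request is ignored and never stored.
        key = tuple(lowered.get(name, "") for name in self.fields)
        self.counts[key] += 1


response = {"Meter": "do-report", "Vary": "User-agent"}
counter = HitCounter(response)
counter.record_hit({"User-Agent": "Mozilla/3.0", "Accept-Language": "en"})
counter.record_hit({"User-Agent": "Mozilla/3.0", "Accept-Language": "fr"})
```

In this sketch the two requests collapse into a single count for their shared User-Agent value, and the Accept-Language values are never recorded, which is the privacy property argued for above.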
Remember also that our proposal explicitly uses a "best-efforts" model, which means that if the previous inbound proxy is not reachable, your proxy is under no obligation to try forever to report the hit-counts.

Your comment does raise a subtle point that we had ignored, which is that a faithful implementation of hit-counting would, in theory, have to record the identity of the inbound proxy from which each response is received. In practice, there are numerous ways around this (for example, flushing out the pending hit-count reports before changing the path configuration), but in any case it does not seem too onerous to do a naive implementation which simply records the IP address of the source of each response. I believe that systems like Harvest/Squid probably already need to do something like this.

> The use of Meter as a sticky header also presents some problems, as a meter request directive is sticky and other meter headers are not. You also provide no method of unsticking the meter-request directive other than closing the connection. I frankly think the whole mechanism of creating sticky headers needs to be worked out in another draft, rather than implied by the behavior of one header made sticky here.

The "stickiness" of the Meter request-directive is only a performance optimization, and if there are serious technical arguments against it, we could remove that without affecting any other aspect of the proposal. But I do not think it is accurate to think of this in the same way that we have previously discussed "sticky" headers, since those were for actual request-headers. The Meter request header is a sort of unusual thing that applies to transport-level connections, not to individual requests, and so it would probably be better to use a term other than "sticky" here.

(The Meter response directives are per-response, but hop-by-hop, and so if there is a general "sticky" mechanism agreed upon for the rest of HTTP, then it could take advantage of this.)
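Returning briefly to the naive implementation mentioned above, which records the IP address of the source of each response: a rough Python sketch of that bookkeeping might look like the following. The class PendingReports and the send_report callback are hypothetical names chosen for illustration, and the best-effort policy shown (try each inbound server once per flush and let the caller decide what to do about failures) is an assumption rather than something the draft specifies.

```python
from collections import defaultdict


class PendingReports:
    """Pending hit counts, grouped by the inbound server that sent each response."""

    def __init__(self):
        # source_ip -> url -> hits not yet reported to that source
        self._pending = defaultdict(lambda: defaultdict(int))
        # url -> IP address of the server the cached response came from
        self._source_of = {}

    def remember_source(self, url, source_ip):
        """Record where the currently cached response for url came from."""
        self._source_of[url] = source_ip

    def record_hit(self, url):
        """Count one cache hit against the source of the cached response."""
        self._pending[self._source_of[url]][url] += 1

    def flush(self, send_report):
        """Try once to report to each source; return the sources that failed.

        send_report(source_ip, counts) is a caller-supplied function that
        returns True on success. Counts for failed sources are kept, so the
        caller may retry later or give up (best efforts only).
        """
        unreachable = []
        for source_ip in list(self._pending):
            counts = dict(self._pending[source_ip])
            if send_report(source_ip, counts):
                del self._pending[source_ip]
            else:
                unreachable.append(source_ip)
        return unreachable


reports = PendingReports()
reports.remember_source("/index.html", "192.0.2.1")
reports.record_hit("/index.html")
reports.record_hit("/index.html")

# The path configuration changes: the next response arrives from elsewhere.
reports.remember_source("/index.html", "192.0.2.2")
reports.record_hit("/index.html")

sent = {}
failed = reports.flush(lambda ip, counts: sent.update({ip: counts}) or True)
```

Because hits are filed under the source recorded at the time, the two hits served before the path change are still reported to the old inbound server. Calling flush just before changing the path configuration would be the other workaround mentioned above.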
As to the issue of "unsticking" ("unstickying?") the meter-request directive: remember that a proxy that has offered, say, to hit-meter responses it receives on a connection is able to meet this obligation by (in effect) removing the Meter header and adding "proxy-revalidate". While this may result in its generating conditional GETs on responses that the server doesn't want hit-metered, this is again just a performance issue, not a correctness one.

But if you are concerned about performance and you think that it's worth including an "unsticking" feature, please suggest something.

-Jeff
Received on Wednesday, 20 November 1996 14:31:02 UTC