- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Wed, 20 Nov 96 14:51:24 PST
- To: Benjamin Franz <snowhare@netimages.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> Attempts to finesse the system for the sake of improving hit counting are doomed from the start by the simple fact that most *browsers* have their own user-selectable options for caching that are completely independent of the standards: 'Check every time', 'Check once per session', 'Never check'. This *alone* is enough to make efforts to make proxies report hits nearly irrelevant. Are they reporting 20 repeat hits from someone who 'checks every time' or 1 new hit each from people who 'Never check'? You don't know. We don't know. NO ONE knows. A server can *guess* based on referrer and IP address in their logs, or come very close to exact counts by anti-caching. But the necessary abstraction of data by the proxies on summary reports for hit-metering will defeat these efforts in log analysis and passing raw log information would defeat the *purpose* of proxies.

My analysis of header logs from a very popular browser suggests that its "checks" (whether per-session or every-time) are done using If-Modified-Since headers. Our proposal specifically separates the counting of 304 responses from the counting of other responses (see section 5.3). Because of this, we can accurately count the number of non-checking GETs.

What we cannot do is to accurately distinguish between a large number of users looking at a document with large browser caches, and a smaller number of users with small browser caches. But neither can any other hit-counting scheme that doesn't involve some sort of per-user data (such as cookies).

And we are in no way proposing the transmission of raw logs! (Our proposal takes no stand on this topic; the word "log" does not appear in it.)

> This also does not begin to address the questions of privacy and security and their impact on the usage of hit-metering. Many corporate proxies would be more than reluctant to send out information about their internal usage to anyone who asked - they would be actively opposed to it.
We run a large corporate proxy in our building (>1500000 refs/day) and we are extremely sensitive to privacy issues, so I'm not ignoring them. No corporate (or any other) proxy is required to do anything by our proposal.

Aside from that, I addressed the privacy considerations in my response to Ted Hardie.

-Jeff
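
As a rough illustration of the point about counting 304 responses separately from other responses, the tally could be sketched as below. This is only a sketch under stated assumptions: the function name `tally_responses`, the counter keys, and the sample status codes are all hypothetical, and nothing here reflects the actual report format or header syntax of the proposal.

```python
from collections import Counter


def tally_responses(status_codes):
    """Split responses for one resource into 304s and everything else.

    Hypothetical helper: a 304 (Not Modified) answers a conditional
    "check" such as If-Modified-Since, so it is counted apart from
    responses that actually delivered the document.
    """
    counts = Counter()
    for status in status_codes:
        if status == 304:
            counts["not_modified"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up status codes a proxy might have returned for one cached document.
observed = [200, 304, 304, 200, 304, 200]
counts = tally_responses(observed)
print("non-checking responses:", counts["other"])
print("304 responses:", counts["not_modified"])
```

Keeping the two totals apart means repeat checks by a browser set to 'Check every time' do not inflate the count of non-checking GETs. As noted above, neither total says how many distinct users were involved.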
Received on Wednesday, 20 November 1996 15:07:40 UTC