Re: Hit-metering: to Proposed Standard?

    Attempts to finesse the system for the sake of improving hit
    counting are doomed from the start by the simple fact that most
    *browsers* have their own user-selectable options for caching that
    are completely independent of the standards: 'Check every
    time', 'Check once per session', 'Never check'.  This *alone* is
    enough to render efforts to make proxies report hits nearly
    irrelevant. Are they reporting 20 repeat hits from someone who
    'checks every time' or 1 new hit each from people who 'Never
    check'? You don't know. We don't know. NO ONE knows. A server can
    *guess* based on referrer and IP address in its logs, or come
    very close to exact counts by anti-caching. But the necessary
    abstraction of data by the proxies into summary reports for
    hit-metering will defeat these efforts at log analysis, and passing
    raw log information would defeat the *purpose* of proxies.

My analysis of header logs from a very popular browser suggests that
its "checks" (whether per-session or every-time) are done using
If-Modified-Since headers.  Our proposal specifically separates
the counting of 304 responses from the counting of other responses
(see section 5.3).  Because of this, we can accurately count the
number of non-checking GETs.
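To make the separation concrete, here is a minimal Python sketch, assuming a proxy that knows the status of each response it returns for a URL. The `HitMeter` class and its `full` and `not_modified` tallies are hypothetical illustrations, not the proposal's report format. The only point is that 304 responses (the outcome of If-Modified-Since checks that find the entity unchanged) land in a separate tally, so browser checks never inflate the count of full responses.

```python
from collections import Counter


class HitMeter:
    """Hypothetical per-URL tallies kept by a proxy (illustrative only)."""

    def __init__(self) -> None:
        self.full = Counter()          # responses that delivered the whole entity
        self.not_modified = Counter()  # 304 answers to conditional checks

    def record(self, url: str, status: int) -> None:
        # A 304 can only answer a conditional request (e.g. If-Modified-Since),
        # so counting it separately keeps browser "checks" out of the full count.
        if status == 304:
            self.not_modified[url] += 1
        else:
            self.full[url] += 1


meter = HitMeter()
meter.record("/index.html", 200)  # first fetch, full body
meter.record("/index.html", 304)  # browser "check", entity unchanged
meter.record("/index.html", 304)
meter.record("/index.html", 200)  # another user's non-checking GET
print(meter.full["/index.html"], meter.not_modified["/index.html"])  # 2 2
```

A conditional GET whose entity has changed comes back 200 and is counted as a full response, which matches what the user actually received.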

What we cannot do is accurately distinguish between a large
number of users looking at a document with large browser caches,
and a smaller number of users with small browser caches.  But
neither can any other hit-counting scheme that doesn't involve
some sort of per-user data (such as cookies).
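A toy calculation shows why. Assume, purely for illustration, that each user views a page a fixed number of times and that the browser cache (with 'Never check' set) answers some of those views without contacting the proxy. The function `requests_seen_by_proxy` and the numbers below are hypothetical:

```python
def requests_seen_by_proxy(users: int, views_per_user: int,
                           views_served_by_browser_cache: int) -> int:
    """Hypothetical model: views answered by the browser cache never reach the proxy."""
    return users * (views_per_user - views_served_by_browser_cache)


# Many users whose large browser caches absorb most repeat views...
many_users = requests_seen_by_proxy(users=10, views_per_user=5,
                                    views_served_by_browser_cache=4)
# ...versus a few users whose small caches absorb nothing.
few_users = requests_seen_by_proxy(users=2, views_per_user=5,
                                   views_served_by_browser_cache=0)
print(many_users, few_users)  # 10 10
```

Any counter that sees only the requests reaching it receives the same number in both cases; telling them apart requires per-user data.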

And we are in no way proposing the transmission of raw logs!
Our proposal takes no stand on this topic; the word "log"
does not even appear in it.

    This also does not begin to address the questions of privacy and
    security and their impact on the usage of hit-metering. Many
    corporate proxies would be more than reluctant to send out
    information about their internal usage to anyone who asked; they
    would be actively opposed to it.

We run a large corporate proxy in our building (>1,500,000 refs/day)
and we are extremely sensitive to privacy issues, so I'm not ignoring
them.  Our proposal does not require any corporate (or other) proxy
to do anything.

Aside from that, I addressed the privacy considerations in my response
to Ted Hardie.

-Jeff

Received on Wednesday, 20 November 1996 15:07:40 UTC