- From: Brian Behlendorf <brian@organic.com>
- Date: Mon, 17 Jul 1995 17:59:15 -0700 (PDT)
- To: Terry Myerson <tmyerson@iserver.interse.com>
- Cc: www-talk@w3.org
On Mon, 17 Jul 1995, Terry Myerson wrote:
> The first thing Interse' market focus does is group the requests on
> "Differentiating Characteristics." (DC's). These DC's are entries within
> the log files that will be constant throughout a user session, but different
> among absolutely different sessions.

Could you elaborate on these DC's? What can you key off of except hostnames
from CLFF data?

> Next, we walk through the request stream within each DC group. New sessions
> are demarcated when objects are requested and not cached, when they should be,

So a request for a previously fetched item that resulted in a 200 instead of
a 304 is considered a new user? That doesn't compute.

> and there is a large time gap in the request stream within a DC group.

Which might be sufficient for lightly loaded sites, but on sites with many
simultaneous visitors coming from behind large proxies, those visitors are
indistinguishable from one another. It sounds like you can count "sessions"
to within 10% accuracy, but that's very different from counting "users". One
person visiting 20 times is largely indistinguishable from 20 people visiting
once.

> >There's going to be a whole lotta hits coming from Vienna, Virginia,
> >White Plains, NY, and Columbus, Ohio!
>
> Indeed, the online services due lead all other organizations in bringing users
> to the web. Of course, this software will confirm if this is true of your
> web site's user community.

I think you're missing the point - there is very often (most often?) *no*
connection between the physical location of a web visitor and the location
listed on their InterNIC registration. Heuristics can only go so far to
compensate for this. And when msn.net comes online, half the traffic could
be coming from Redmond, Washington. Perhaps the team at GVU who did the most
recent internet survey could provide some analysis of the
where-people-are-really-located vs. where-the-nic-says-they-are question.

> We've busted our buts to put together a software package which can answer these
> questions, conveniently and cost-effectively.

I don't doubt you spent a lot of effort on this, and that there is a need for
it. You know the line: "there are lies, there are damn lies, and then there
are statistics". It's very important to get the answers to these questions
*right*, and not to base them on assumptions and heuristics which just aren't
true and then make promises about what the numbers mean.

I'd welcome comments on some thoughts on this topic I collected together a
little while ago. They're at

	http://www.organic.com/Home/Services/traffic-analysis.html

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/
Received on Monday, 17 July 1995 21:01:26 UTC
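
The objection in the message above, that "sessions" split on a time gap are not the same thing as "users", can be illustrated with a small sketch. This is not Interse's algorithm, which the thread does not describe in detail. It assumes the hostname is the only differentiating characteristic, and the 30-minute threshold, the function name count_sessions, and the example hostnames are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical threshold: the thread only says "a large time gap".
SESSION_GAP = timedelta(minutes=30)


def count_sessions(requests, gap=SESSION_GAP):
    """Count time-gap sessions per host.

    requests: iterable of (host, timestamp) pairs, as could be read
    from a common-log-format file. Returns {host: session_count}.
    """
    by_host = defaultdict(list)
    for host, stamp in requests:
        by_host[host].append(stamp)

    sessions = {}
    for host, stamps in by_host.items():
        stamps.sort()
        count = 1  # the first request always opens a session
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap:
                count += 1  # a long silence starts a new session
        sessions[host] = count
    return sessions


t0 = datetime(1995, 7, 17, 12, 0)

# Two different people behind one proxy, alternating requests a minute apart.
proxy_log = [("proxy.example.net", t0 + timedelta(minutes=k)) for k in range(4)]

# One person on a fixed host, returning every two hours.
repeat_log = [("dialup.example.org", t0 + timedelta(hours=2 * k)) for k in range(3)]

result = count_sessions(proxy_log + repeat_log)
print(result)
```

Under these assumptions the two proxy users collapse into a single session, while the one returning visitor is counted as three sessions, so neither number says how many people visited.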