Re: Accurate user-based log file analysis

From: Terry Myerson <tmyerson@iserver.interse.com>
Date: Mon, 17 Jul 1995 18:34:39 -0700
Message-Id: <199507180134.SAA04318@iserver.interse.com>
To: Brian Behlendorf <brian@organic.com>
Cc: www-talk@w3.org

Brian-

You are speaking to extremes. We have log files from over 100 organizations
in our test suite. The data has been scrutinized, and is indeed both accurate
enough and extremely valuable.

We are indeed talking about user sessions, and not users. My usage of the term
"users" was a marketing decision; I apologize. But user sessions are still
a much better statistic to base business decisions upon than hits or unique
hostnames.

>Could you elaborate on these DC's?  What can you key off of except 
>hostnames from CLFF data?  

There are other DC's in there. 

>So a request for a previously fetched item that resulted in a 200 instead 
>of a 304 is considered a new user?  That doesn't compute.

The 304 vs. 200 comparison is very valuable, and has been ignored by most
other analysis programs. We are talking about statistical processes (caching
and our user-session algorithm), and this response code guides both.
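
To make the 200 vs. 304 comparison concrete: a 304 means the client sent a
conditional request and already held a cached copy, while a 200 means the full
document was transferred. The sketch below only tallies response codes per
hostname from Common Log Format lines. It is an illustration under assumptions,
not the algorithm discussed in this message, which is not described here; the
names STATUS_RE and status_by_host are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed Common Log Format layout:
# host ident authuser [date] "request" status bytes
STATUS_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3})\b')

def status_by_host(lines):
    """Count response codes per hostname; malformed lines are skipped."""
    counts = defaultdict(Counter)
    for line in lines:
        m = STATUS_RE.match(line)
        if m:
            counts[m.group(1)][int(m.group(2))] += 1
    return counts

sample = [
    'a.example.com - - [17/Jul/1995:18:00:00 -0700] "GET / HTTP/1.0" 200 1024',
    'a.example.com - - [17/Jul/1995:18:40:00 -0700] "GET / HTTP/1.0" 304 0',
    'b.example.com - - [17/Jul/1995:18:41:00 -0700] "GET / HTTP/1.0" 200 1024',
]
tally = status_by_host(sample)
```

A host that keeps returning 304s is revalidating documents it fetched earlier,
which is one signal an analysis program could weigh; how much weight to give it
is a design decision this sketch does not make.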

Re: using request delays:
>Which might be sufficient for lightly loaded sites, but sites with many
>simultaneous visitors coming from behind large proxies are indistinguishable.

This is true. For 95% of the sites out there, this is valuable. For HotWired,
it might not be.
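
As a rough illustration of delay-based session counting (a minimal sketch, not
the product's algorithm): treat each hostname as one visitor and start a new
session whenever the gap since that host's previous request exceeds a timeout.
The 30-minute timeout, the function names, and the assumption that log lines
arrive in chronological order are all choices made for this example only.

```python
import re
from datetime import datetime, timedelta

# Assumed Common Log Format layout:
# host ident authuser [date] "request" status bytes
CLF_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3})\b')

def parse_clf(line):
    """Return (host, timestamp, status), or None if the line is malformed."""
    m = CLF_RE.match(line)
    if m is None:
        return None
    host, stamp, status = m.groups()
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return host, when, int(status)

def count_sessions(lines, timeout=timedelta(minutes=30)):
    """Count sessions, keyed on hostname and split on inactivity gaps."""
    last_seen = {}   # host -> time of that host's most recent request
    sessions = 0
    for line in lines:
        rec = parse_clf(line)
        if rec is None:
            continue
        host, when, _status = rec
        prev = last_seen.get(host)
        # First request from this host, or a long silence: new session.
        if prev is None or when - prev > timeout:
            sessions += 1
        last_seen[host] = when
    return sessions

sample = [
    'a.example.com - - [17/Jul/1995:18:00:00 -0700] "GET / HTTP/1.0" 200 1024',
    'b.example.com - - [17/Jul/1995:18:01:00 -0700] "GET / HTTP/1.0" 200 1024',
    'a.example.com - - [17/Jul/1995:18:05:00 -0700] "GET /a.html HTTP/1.0" 200 512',
    'a.example.com - - [17/Jul/1995:19:00:00 -0700] "GET / HTTP/1.0" 304 0',
]
total = count_sessions(sample)
```

Keyed on hostname alone, this merges every visitor behind one proxy into a
single stream, which is exactly the limitation conceded above for heavily
loaded sites.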

>It sounds like you can count "sessions" within a 10% accuracy, but that's 
>much different than "users".  One person visiting 20 times is 
>largely indistinguishable from 20 people visiting once.

Print out your log file. Use your knowledge of your site to demarcate different
user sessions. If you can do it, why can't the computer? We believe our accuracy
is closer to 85%, and it will increase in a future release.

>I think you're missing the point - there is very often (most often?) *no* 
>connection between the physical location of a web visitor and the 
>location listed on their Internic registration.  Heuristics can only go 
>so far to assuage this.  And when msn.net comes on line, half the traffic 
>could be coming from Redmond, Washington.  Perhaps the team at GVU who 
>did the most recent internet survey could provide some analysis of the 
>where-people-are-really-located vs. where-the-nic-says-they-are question.

Most organizations are not IBM. Interse' has one office. Organic has one office.
Most of our customers have one office. The more organizations that connect
to your site, the more accurate this statistic will be. In a future release,
we will deal with the MSN case you speak of.

If the geographic statistics were the same across very different customers,
or if they simply didn't make sense, then we wouldn't include the analysis.
But neither is the case. The results of these statistics do vary per customer,
and often make logical sense. 

It's only $495. Why not just try it out, and then criticize the accuracy? You
spend that much on 1 month of support for your Indy.

-Terry

-----------------------------------
Terry Myerson
Interse' Corporation
408 732-0932 x-230
408 732-7038 fax
tmyerson@interse.com
http://www.interse.com
-----------------------------------
Received on Monday, 17 July 1995 21:36:55 GMT