Date: Fri, 30 Apr 1999 00:12:31 PDT
To: "Bradford L. Barrett" <brad@mrunix.net>, www-wca@w3.org
From: Jim Pitkow <pitkow@parc.xerox.com>
Message-Id: <99Apr30.001639pdt."365517"@louise.parc.xerox.com>
Subject: Re: Log tools


Nice tool.  Much of the interest of this group lies in characterizing
various metrics of Web usage (distributions of file sizes, reading times,
session lengths, etc.) to facilitate comparisons between traces, to
reproduce the empirical findings reported by others, and so on.  The WCA is
also interested in extracting various metrics to automatically characterize
log files being submitted to the WCA repository.  We're working on precisely
defining the metrics as well as recommended methods for measuring them (see:
http://www.oclc.org/oclc/research/projects/webstats/currmetrics.htm 
for a very early draft -- not yet a working draft).
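
To make one of these concrete, here's a rough sketch of how session length
might be measured by grouping a host's requests with an inactivity timeout.
The 30-minute cutoff, the use of the host field as the visitor key, and the
pre-parsed "host epoch-seconds" input format are illustrative assumptions
only, not the draft definitions:

    #!/usr/bin/perl -w
    # Sketch only: counts requests per session, starting a new session for a
    # host after 30 minutes of inactivity.  Assumes the input is sorted by
    # time and pre-parsed to "host epoch_seconds" per line.
    use strict;

    my $TIMEOUT = 30 * 60;                    # inactivity (seconds) that ends a session
    my (%last_seen, %length, @sessions);

    while (<>) {
        my ($host, $time) = split;
        next unless defined $time;
        if (exists $last_seen{$host} && $time - $last_seen{$host} > $TIMEOUT) {
            push @sessions, $length{$host};   # close the previous session
            $length{$host} = 0;
        }
        $length{$host}++;
        $last_seen{$host} = $time;
    }
    push @sessions, values %length;           # flush sessions still open at EOF

    my $total = 0;
    $total += $_ for @sessions;
    printf "%d sessions, mean length %.2f requests\n",
           scalar(@sessions), @sessions ? $total / @sessions : 0;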

Additionally, you can take a look at some rough Perl code that extracts a
few of these distributions; it lives in the W3C CVS directory at:
	http://dev.w3.org/cgi-bin/cvsweb/WCA/Tools/
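
To give a flavor of the kind of extraction involved (this is a sketch under
my own assumptions, not the code in that directory), tallying transfer
sizes from a Common Log Format access log into 1 KB buckets might look like:

    #!/usr/bin/perl -w
    # Sketch only: histogram of transfer sizes from a Common Log Format log.
    # Each CLF line ends with "... status bytes"; the 200-only filter and
    # the 1 KB bucketing are illustrative choices, not WCA definitions.
    use strict;

    my %bucket;
    while (<>) {
        my ($status, $bytes) = (split)[-2, -1];         # last two fields: status, bytes
        next unless defined $bytes && $bytes =~ /^\d+$/; # "-" means no body was sent
        next unless $status eq '200';                    # successful transfers only
        $bucket{ int($bytes / 1024) }++;
    }

    foreach my $kb (sort { $a <=> $b } keys %bucket) {
        printf "%6d KB: %d requests\n", $kb, $bucket{$kb};
    }

Run against an access_log this just prints "bucket: count" pairs, which is
already enough to compare the shape of two traces.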

Do you have any interest in adding some of these capabilities to the
Webalizer?  How are you handling changes made to the code?  I did not see a
CVS repository or other structure for others to make contributions.

At 11:09 AM 4/28/99, Bradford L. Barrett wrote:
>
>> Do we have any guidelines on how to interpret logs, what to look for?  I
>> see a need for user education to improve accuracy and understanding.
>
>As the author of the Webalizer, a web server log file analysis tool
>(http://www.mrunix.net/webalizer/), I have found that there are a
>lot of people out there running servers who have no clue as to what
>is and is not possible regarding the analysis of their logs.  Some
>of the available tools do not help matters either, claiming all
>sorts of statistics that simply cannot be produced with any real
>accuracy from existing logs.
>
>One of my design goals was to produce the most accurate statistics
>possible, which is why some of the 'features' that other analysis
>tools claim to have are not included in my code.  Since the Webalizer
>is an open-source project, I don't need to generate the marketing
>hype that some of the commercial packages must in order to sell their
>product.  But it is this marketing hype that misleads users into
>believing that certain statistics are possible, and worse, accurate.
>I think that industry-sponsored guidelines (call them what you want)
>would go a long way towards end-user education and understanding.  I
>don't see how accuracy would be increased, though, except perhaps
>by letting users realize that some of the statistics generated
>are less accurate than they were led to believe.
>
>By the way, for those interested, the latest version (1.22-03) of
>the Webalizer was released last month, and is by far the most
>stable version to date.  Several members of the W3C and NCSA, along
>with some other large-site admins, helped debug the code, which
>now routinely handles analysis of sites with more than 50 million
>hits a month.  It is GPL code, so anyone who wants to take a look
>"under the hood" can do so.
>
>--
>Bradford L. Barrett                      brad@mrunix.net
>A free electron in a sea of neutrons     DoD#1750 KD4NAW
>
>The only thing Micro$oft has done for society, is make people
>believe that computers are inherently unreliable.
>