RE: Just what is a certified log? and who certifies it?

From: Jim Pitkow (pitkow@parc.xerox.com)
Date: Fri, Apr 23 1999


Date: Thu, 22 Apr 1999 23:12:26 PDT
To: "Brodsky, Lloyd" <Lloyd.Brodsky@thomson.com>, "'www-wca@w3.org'" <www-wca@w3.org>
From: Jim Pitkow <pitkow@parc.xerox.com>
Message-Id: <99Apr22.231241pdt."364935"@louise.parc.xerox.com>
Subject: RE: Just what is a certified log? and who certifies it?


Hi Lloyd,

At 06:52 AM 4/22/99 , Brodsky, Lloyd wrote:
>Different goals certainly call for different methods -- but I'm still having
>some problems visualizing what the repository's certification process would
>consist of and what potential problems that process would avoid. The notion
>of the RECIPIENT being able to certify anything other than its own identity
>and mere receipt of a log file is an interesting one and a concept that I'd
>like to hear more about.

Ah, we may be using the terms certify and validate loosely.  The goal is to
provide a repository of logs, each of which has been (1) described with a
set of meta-data that captures information about the log and the nature of
its contents, (2) validated against what its data is supposed to be (data
type checking, etc.), and (3) described statistically by extracting the
distribution of various metrics.  The final entries and meta-data are
inspected by editors for correctness.  By providing these processes and
checks, we hope to create a forum in which results from various logs can be
validated by other researchers, and which facilitates new research (as
diverse log files are a precious commodity).
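
As a rough illustration of the type-checking and statistical-description
steps (not the repository's actual tooling, which is not specified here), a
minimal Python sketch might look like the following.  The field layout, the
function names, and the sample lines are all assumptions made up for the
example; real extended format logs declare their own fields.

```python
from collections import Counter
from datetime import datetime

# Hypothetical field layout: (name, parser). Each parser raises ValueError
# when the raw text does not match the expected data type.
FIELDS = [
    ("time", lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")),
    ("status", int),
    ("bytes", int),
    ("uri", str),
]


def validate_line(line):
    """Return a dict of typed fields, or None if the line fails type checking."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record = {}
    for (name, parse), raw in zip(FIELDS, parts):
        try:
            record[name] = parse(raw)
        except ValueError:
            return None
    return record


def describe_log(lines):
    """Validate every line and extract one simple distribution (status codes)."""
    status_counts = Counter()
    valid = invalid = 0
    for line in lines:
        record = validate_line(line)
        if record is None:
            invalid += 1
            continue
        valid += 1
        status_counts[record["status"]] += 1
    return {
        "valid": valid,
        "invalid": invalid,
        "status_distribution": dict(status_counts),
    }


# Invented sample lines; the third has a non-numeric byte count on purpose.
sample = [
    "1999-04-22T06:52:00 200 5120 /index.html",
    "1999-04-22T06:52:01 404 0 /missing.html",
    "1999-04-22T06:52:02 200 abc /index.html",
]
summary = describe_log(sample)
print(summary)
```

The summary produced here (counts of valid and invalid lines plus a
distribution) is the kind of record an editor could then inspect alongside
the log's meta-data.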

>I've been doing traffic analysis work with a number of Thomson companies and
>I'd like to help. I'm just trying to visualize what you'd do to certify,
>say, the Toronto Globe & Mail's 3.5 gig of extended format logs a week (the
>largest of the 34 Thomson newspapers) were I to send them over.

This could provide a nice test case for the repository (as well as provide
a very interesting set of data to the research community).  The basic
process is described above, but the developers of the repository are
currently working on a white paper that describes the validation/certification
process in more detail, and they will post it to the list soon.

What tools do you currently use to analyze the traces?  What do you find
are the limitations of the current tools?

Jim.