How to certify a log
From: by way of Jim Pitkow (bala@research.att.com)
Date: Fri, Jan 29 1999
Message-Id: <4.1.19990129002316.0092aa30@mailback.parc.xerox.com>
Date: Fri, 29 Jan 1999 00:23:46 PST
To: www-wca@w3.org
From: Balachander Krishnamurthy <bala@research.att.com> (by way of Jim Pitkow <pitkow@parc.xerox.com>)
Subject: How to certify a log
Here is a set of issues to deal with while trying to certify a log.
A log that has not been certified should be labeled as such before it is
inserted into the repository. Studies that use logs from the repository
should state whether or not those logs were certified.
. Does the log have the right set of fields (e.g., the expected number
   of fields)?
. Does the log contain only the fields it is supposed to have?
   (e.g., a referer field is present when it is meant to be absent;
   check this before insertion into the repository)
. Do the field types match the expected types?
   (e.g., a string instead of a number, or -1 instead of a date)
. Are the fields within the ranges they should be in?
   (e.g., a bogus date, If-Modified-Since (IMS) time, or last-modified
   time that lies in the future; invalid response codes)
. Are the individual values clean enough to make parsing easy?
   (check for embedded '/', control characters, unreasonable lengths, etc.)
. Sanity across the log:
   Are the dates in range and monotonically increasing?
   Is the distribution of content sizes reasonable?
   Is the percentage of each response code in the expected range?
   (it is rare for non-200/304 responses to exceed 5%)
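The per-record checks in the list above (field set, types, ranges, clean values) can be sketched in Python as follows. This is a minimal illustration, not a repository tool: it assumes each log line has already been parsed into a dict, and the schema `EXPECTED_FIELDS`, the field names, the `collection_end` cutoff and `MAX_VALUE_LENGTH` are all hypothetical choices. The check for embedded '/' is left out because which fields may legitimately contain '/' depends on the log format.

```python
import re

# Hypothetical schema: field name -> acceptable Python types.
# A real repository would define one of these per log format.
EXPECTED_FIELDS = {
    "timestamp": (int, float),  # Unix seconds
    "url": (str,),
    "status": (int,),
    "size": (int,),
}

MAX_VALUE_LENGTH = 2048                       # assumed bound on string length
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def check_record(record, collection_end):
    """Return a list of problems found in one parsed log record.

    collection_end is the (assumed) Unix time at which logging stopped;
    any later timestamp is treated as lying in the future.
    """
    problems = []

    # Right set of fields: nothing missing, nothing extra
    # (e.g. a referer field in a log meant to omit it).
    missing = EXPECTED_FIELDS.keys() - record.keys()
    extra = record.keys() - EXPECTED_FIELDS.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")

    # Types match (bool is rejected explicitly because it subclasses int).
    for name, types in EXPECTED_FIELDS.items():
        value = record.get(name)
        if name in record and (isinstance(value, bool) or not isinstance(value, types)):
            problems.append(f"{name}: unexpected type {type(value).__name__}")

    # Ranges: no negative or future timestamps (catches -1 as a date),
    # status codes inside the HTTP 1xx-5xx range, non-negative sizes.
    ts = record.get("timestamp")
    if isinstance(ts, (int, float)) and not 0 <= ts <= collection_end:
        problems.append(f"timestamp out of range: {ts}")
    status = record.get("status")
    if isinstance(status, int) and not 100 <= status <= 599:
        problems.append(f"invalid response code: {status}")
    size = record.get("size")
    if isinstance(size, int) and size < 0:
        problems.append(f"negative size: {size}")

    # Clean values: no control characters, reasonable length.
    for name, value in record.items():
        if isinstance(value, str):
            if CONTROL_CHARS.search(value):
                problems.append(f"{name} contains control characters")
            if len(value) > MAX_VALUE_LENGTH:
                problems.append(f"{name} is longer than {MAX_VALUE_LENGTH} chars")

    return problems
```

Returning a list of problems rather than stopping at the first one lets a certifier report every defect in a record at once, which is useful when deciding whether a log is merely noisy or should be labeled uncertified.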
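The "sanity across the log" checks listed above can likewise be sketched in Python. This is an illustrative sketch under stated assumptions: records are parsed dicts with hypothetical `timestamp`, `status` and `size` keys, and the bound `max_median_size` used to judge whether the content-size distribution is "reasonable" is a hypothetical placeholder. Only the 5% threshold for non-200/304 responses comes from the checklist itself.

```python
import statistics
from collections import Counter


def check_log(records, max_other_fraction=0.05, max_median_size=1_000_000):
    """Cross-log sanity checks on a list of parsed records.

    Field names and max_median_size are illustrative assumptions;
    max_other_fraction mirrors the 5% rule of thumb for non-200/304 codes.
    """
    if not records:
        return ["log is empty"]
    problems = []

    # Dates should be monotonically non-decreasing.
    backwards = sum(
        1 for prev, cur in zip(records, records[1:])
        if cur["timestamp"] < prev["timestamp"]
    )
    if backwards:
        problems.append(f"{backwards} timestamp(s) go backwards")

    # Content sizes: consider 200 responses only, since 304s carry no body.
    sizes = [r["size"] for r in records if r["status"] == 200]
    if sizes:
        median = statistics.median(sizes)
        if not 0 < median <= max_median_size:
            problems.append(f"suspicious median content size: {median}")

    # Response codes: non-200/304 responses rarely exceed 5%.
    counts = Counter(r["status"] for r in records)
    other = len(records) - counts[200] - counts[304]
    fraction = other / len(records)
    if fraction > max_other_fraction:
        problems.append(f"{fraction:.1%} of responses are not 200/304")

    return problems
```

A median is used for content size because a handful of very large transfers would distort a mean; a more careful certifier might compare the whole size histogram against a reference log instead.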
This is, of course, only a subset of what one might want to check, but it
may be the *minimal* subset. Logs that aren't even this clean are suspicious,
and results obtained from them can be inaccurate to varying degrees. I have
run into almost all of the problems listed above and wasted plenty of time
as a result.
cheers,
bala