Re: What's next after the common log format? from Dave Beckett on 1995-06-26 (www-talk@w3.org from May to June 1995)

From: Dave Beckett <D.J.Beckett@ukc.ac.uk>
Date: Mon, 26 Jun 1995 11:44:25 +0100
To: Andrew Payne <payne@openmarket.com>
Cc: www-talk@www10.w3.org
Message-Id: <29691.804163465@mint.ukc.ac.uk>

>>>>> On Fri, 23 Jun 1995 20:43:02 -0400, Andrew Payne <payne@openmarket.com> said:
Andy> [Snip]
Andy> Has anyone else wrestled with designing better log formats?
Andy> Are there any other formats in use?

Yes, I have considered and designed around the problems in this area
and published a paper on it at WWW95 in Darmstadt.

I work with a large WWW site with multiple access methods and need to
summarise the logging information from the 5+ running methods.  I
thus want to include all the data that they provide, as efficiently
as possible.

For my problem, I concluded that a fixed field log file was a
suitable answer.  I considered the attribute/value format at one
point, but that would be slow to process and awkward to work with
using standard tools.

My final result was a fixed field record format like this.

<Type>\t<Operation>\t<Date-Time>\t<Name/Path>\t<Size>\t<User>\t<Site>\t<Email>

for example:
ftp     txfile  1994-01-19-11:27:14     /ftp/pub/parallel/documents/inmos/archive-server/checkocc/test80xa.occ  58019   -       123.45.67.89    abcdef@ghijklmn.fr
gopher  txfile  1994-01-19-11:27:39     /ftp/pub/parallel/parlib/butterfly/queens/bflyparqueens.c       4789    -       abc.def.Uni-ghijk.DE    -
http    txfile  1994-01-19-11:27:54     /usr/l/lib/httpd/htdocs/parallel/home.html      961     -       unix.hensa.ac.uk        -

where the fields are separated by TABS.
Typical types are ftp, gopher, http, wais, archie, ...
Typical operations are transmit a file (txfile), transmit a directory
(txdir), receive a file (rxfile) and some proxy ones such as
txfile/proxy/hit, ...
The name field is usually a file name but can be a URL (for proxies),
a search request (archie) or whatever appropriate.
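
A minimal Python sketch of reading one such record, for illustration
only.  It is not the software from the paper.  The names LogRecord and
parse_record are made up here, and it assumes (from the examples above)
that a lone "-" marks an empty field.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """One fixed-field record; field names here are illustrative."""
    kind: str                # <Type>, e.g. ftp, gopher, http
    operation: str           # <Operation>, e.g. txfile
    date_time: str           # <Date-Time>, kept as the raw string
    name: str                # <Name/Path>
    size: Optional[int]      # <Size> in bytes, None if "-"
    user: Optional[str]      # <User>, None if "-"
    site: Optional[str]      # <Site>, None if "-"
    email: Optional[str]     # <Email>, None if "-"


def _opt(value: str) -> Optional[str]:
    # Assumption: "-" means the field was not recorded.
    return None if value == "-" else value


def parse_record(line: str) -> LogRecord:
    """Split a TAB-separated line into the eight fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    kind, operation, date_time, name, size, user, site, email = fields
    return LogRecord(
        kind, operation, date_time, name,
        None if size == "-" else int(size),
        _opt(user), _opt(site), _opt(email),
    )
```

For example, parse_record() on the http line above would give a record
whose size is the integer 961 and whose user and email are None.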

The problems with this are twofold:
(1) It is a fixed format record; hence adding new fields is a
problem, such as status/error/result or sub-command, which I came
across too late to fix.  I just appended the status field to the
operation, e.g. for a proxy error it could be txfile/proxy/fail=404
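
A small Python sketch of how a consumer might pull the appended status
back out of the operation field.  The function name split_operation is
hypothetical, and it assumes the status always follows the first "=",
as in the example above.

```python
from typing import List, Optional, Tuple


def split_operation(operation: str) -> Tuple[List[str], Optional[str]]:
    """Split e.g. 'txfile/proxy/fail=404' into its parts and status."""
    # Everything after the first "=" is treated as the status (assumption).
    base, sep, status = operation.partition("=")
    parts = base.split("/")
    return parts, (status if sep else None)
```

Under those assumptions, split_operation("txfile/proxy/fail=404")
returns (["txfile", "proxy", "fail"], "404"), and a plain "txfile"
returns (["txfile"], None).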

(2) Since I use TABs to separate the fields, care needs to be taken
over white space inside a field, and strictly some form of quoting
would be needed.  I work round this problem by deleting tabs from the
fields.
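
The tab-deleting workaround can be sketched in Python roughly as
follows.  The names clean_field and format_record are invented for this
example; also stripping CR/LF and writing "-" for an empty field are
assumptions, not something stated above.

```python
from typing import Iterable


def clean_field(value: str) -> str:
    """Delete characters that would break the TAB-separated layout."""
    # Tabs are deleted as described; removing CR/LF too is an assumption.
    return value.replace("\t", "").replace("\r", "").replace("\n", "")


def format_record(fields: Iterable[object]) -> str:
    """Join already-ordered fields into one log line (no newline)."""
    cleaned = [clean_field(str(f)) for f in fields]
    # Assumption: an empty field is written as "-".
    return "\t".join(f if f else "-" for f in cleaned)
```

Note that deleting tabs is lossy: a name that really contained a tab
cannot be recovered from the log afterwards.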

BUT

It is pretty efficient, can be used by simple tools like grep, sort,
awk etc. and compresses (gzips) by >85% for the average log file.
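
As an illustration of how little code a fixed field format needs, here
is a Python sketch of the kind of summary one might otherwise do with
awk: total bytes per access type.  The function name bytes_by_type is
made up, and lines without a numeric size field are simply skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable


def bytes_by_type(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the <Size> field (5th column) per <Type> (1st column)."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short lines and records whose size is "-" or malformed.
        if len(fields) < 5 or not fields[4].isdigit():
            continue
        totals[fields[0]] += int(fields[4])
    return dict(totals)
```

It accepts any iterable of lines, so an open log file can be passed
directly.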

Now, if I was designing it again, what would I add or change?  I
still like the efficiency and simplicity of the fixed field format
and might just add lots more fields; or make the system pass through
fields it didn't understand.  It would take a good deal of convincing
that the variant record form, although elegant and expressive, is
workable in the real world.
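
One way the "pass through fields it didn't understand" idea could
look, as a hedged Python sketch.  The constant KNOWN_FIELDS and the
function anonymise_site are hypothetical; the tool here only touches
the <Site> column and copies any extra trailing fields unchanged.

```python
KNOWN_FIELDS = 8   # the eight fields of the current format
SITE_INDEX = 6     # position of <Site> (0-based)


def anonymise_site(line: str) -> str:
    """Replace <Site> with "-" and pass any extra fields through."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < KNOWN_FIELDS:
        raise ValueError("expected at least %d fields" % KNOWN_FIELDS)
    known, extra = fields[:KNOWN_FIELDS], fields[KNOWN_FIELDS:]
    known[SITE_INDEX] = "-"
    # Fields beyond the known eight are not interpreted, only copied.
    return "\t".join(known + extra)
```

A tool written this way would keep working if later versions of the
format appended more columns, which is the attraction over changing
the record layout.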

You can get the paper, the slides from the talk and the software
implementing it from here:
  <URL:http://www.hensa.ac.uk/tools/www/logtools/index.html>
or see my home page
  <URL:http://www.hensa.ac.uk/parallel/www/djb1.html>

Dave

Received on Monday, 26 June 1995 06:44:58 UTC