Re: What's next after the common log format? from Mike Meyer on 1995-06-24 (www-talk@w3.org from May to June 1995)

From: Mike Meyer <mwm@contessa.phone.net>
Date: Fri, 23 Jun 95 18:08:11 PST
To: www-talk@www10.w3.org
Message-Id: <19950623.74E33B0.101F8@contessa.phone.net>

> Has anyone else wrestled with designing better log formats?  Are there any
> other formats in use?

I believe WN uses a different format, along with tools to turn it back
into CLF. AWS uses a different format, driven by a desire to make both
error logs and access logs have the same format and be more useful.
It was a tough decision, though.

> Another approach is to create new log files for new items (access_log,
> error_log, security_log, etc.).  This makes it easy to accommodate new items
> without breaking existing tools, but creates a problem when you have to
> correlate across files to find all of the info associated with a particular
> client request.

Right. At one point, WN did this. Even worse, if you logged to syslog,
it created multiple entries in the same log file for each request,
connected only by the process ID handling the request if you were
running out of inetd. I don't know what it did if you were running
standalone.

> There's also a bit of expansion, because you usually
> replicate the timestamp or some other request info.

You pretty much have to if you want to be able to, for instance,
correlate the referer_log entry with the 404 entry to see if you've
got a bad reference on one of your pages.
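
To make that concrete, here is a rough sketch of the correlation in
Python. The tuple layouts are hypothetical (no server writes exactly
this); the only point is that the replicated timestamp, host and URL
form the key that ties the two files together.

```python
from collections import defaultdict

def bad_references(access_entries, referer_entries):
    """Return {referring page: [missing urls]} for requests that got a 404.

    Both inputs are iterables of tuples in a hypothetical layout:
      access_entries:  (timestamp, host, url, status)
      referer_entries: (timestamp, host, url, referer)
    """
    # Index the referer log by the fields replicated in both files.
    referers = {(ts, host, url): ref
                for ts, host, url, ref in referer_entries}

    broken = defaultdict(list)
    for ts, host, url, status in access_entries:
        if status == 404:
            ref = referers.get((ts, host, url))
            if ref is not None:
                broken[ref].append(url)
    return dict(broken)
```

Without the replicated fields there would be nothing to build the key
from, which is why the expansion is hard to avoid with separate files.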

> To solve these problems, we designed a new format based on name-value
> pairs.  Fields are named so that you can accommodate new stuff without
> breaking tools (which ignore fields they don't know about).  It's also an
> integrated format:  all of the info associated with the request (access,
> error, security info, etc.) is written in a single log entry.

The last step is a good one. That's the thing that convinced me that
AWS should change the format. I *wanted* user-agent & referer
information attached to a request. Given that I've broken CLF already,
there's no reason to change it.

> We ended up using a Tcl list-based format for the entries, that look like
> this (wrapped for the reader's benefit):
> log {start 803173054.917815} {method GET} {url /~payne/link.html} \
>     {bytes 0} {error {file not found}} {status 404} {end 803173054.930446} \
>     {host localhost}

30% expansion I could live with. However:

Your example doesn't balance curlies; the line ends with the {error
.. } element open.

The real benefit of a "common log file" format is that you can share
tools between servers. For that to work in this case, you need to have
a standard for what all the names are, and what data is inside them.
Either that, or tools that can be told what names to look for and how
they should be treated. Basically, a map from "log names" to "tool
names"; i.e. bytes is "size", error/status is status, ("%s %s %s"
method, URL, protocol) is "request", and so forth.

Hmm - maybe a tool that ate NLF (New Log Format) files and a config
and spat out CLF files???
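
A minimal sketch of what such a tool might look like, in Python. It is
only an illustration under stated assumptions: the parser handles just
the brace-delimited "{name value}" shape of the quoted example rather
than full Tcl list syntax, the CONFIG map and the function names are
made up for this sketch, and the "start" field is assumed to be seconds
since the Unix epoch.

```python
import time

# Hypothetical config: CLF field -> log names that supply it, in order.
CONFIG = {
    "host": ["host"],
    "time": ["start"],
    "request": ["method", "url", "protocol"],
    "status": ["status"],
    "size": ["bytes"],
}

def parse_entry(line):
    """Parse 'log {name value} {name value} ...' into a dict.

    Only top-level {name value} elements are recognized; one level of
    braces around a value (e.g. {error {file not found}}) is stripped.
    """
    fields = {}
    depth = 0
    start = None
    for i, ch in enumerate(line):
        if ch == "{":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces")
            if depth == 0:
                name, _, value = line[start:i].partition(" ")
                if value.startswith("{") and value.endswith("}"):
                    value = value[1:-1]
                fields[name] = value
    if depth != 0:
        raise ValueError("unbalanced braces")
    return fields

def to_clf(fields, config=CONFIG):
    """Format parsed fields as one CLF line; missing fields become '-'."""
    def get(clf_name):
        parts = [fields[n] for n in config[clf_name] if n in fields]
        return " ".join(parts) if parts else "-"

    stamp = get("time")
    if stamp != "-":
        # Assumption: the value is seconds since the epoch; report in UTC.
        stamp = time.strftime("%d/%b/%Y:%H:%M:%S +0000",
                              time.gmtime(float(stamp)))
    return '%s - - [%s] "%s" %s %s' % (
        get("host"), stamp, get("request"), get("status"), get("size"))

ENTRY = ("log {start 803173054.917815} {method GET} "
         "{url /~payne/link.html} {bytes 0} {error {file not found}} "
         "{status 404} {end 803173054.930446} {host localhost}")

print(to_clf(parse_entry(ENTRY)))
```

Names the config doesn't mention (error, end) are simply dropped, and
since the example entry carries no protocol, the request comes out as
just "GET /~payne/link.html". Supporting another server's names would
then mean editing CONFIG rather than the tool.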

So, do you have a formal proposal written up? Want to cooperate on one
(might not work well - I want to use "("'s for the separator :-)?

	<mike
Received on Friday, 23 June 1995 21:17:28 UTC