log formats

From: Martin F. Arlitt (arlitt@hpl.hp.com)
Date: Mon, Apr 26 1999

Message-ID: <3724B82A.934337CA@hpl.hp.com>
Date: Mon, 26 Apr 1999 19:02:02 +0000
From: "Martin F. Arlitt" <arlitt@hpl.hp.com>
To: www-wca@w3.org
Subject: log formats

One of my co-workers (John Dilley) attended the 4th Web Caching Workshop in San
Diego at the start of April.  John told me that the issue of log formats came up
a number of times during the workshop.  Since the Squid developers are heavily
involved in this workshop, they may take steps to address the suggested changes.
I think that the W3C should get involved before any changes are made, so that we
have some say in any new common format that is developed and deployed in a
popular product such as Squid.  I would like to hear any comments from the group
on this topic.  I have attached some of John's comments on what was discussed at
the workshop.


> >John Dilley wrote:
> >
> >>         It was raised during discussions.  Some specific suggestions for
> >> things to add to the log format:
> >>
> >>     - Access time and request duration with microsecond resolution
> >>     - Last modified time and Expires header time, if present
> >>     - Whether cookies and cache-control headers were present (1 or 0)
> >>     - Whether the request was a result of a client IMS
> >>       (If-Modified-Since), or resulted in an IMS request
> >>
> >>         An MD5 checksum on the content would be a nice option for some
> >> work, like Craig Wills's, but MD5 is too heavyweight to implement in
> >> a general proxy.  Still, augmenting a proxy and adding the capability
> >> might be useful for certain research...  Having a flexible log format
> >> would be great.
> >>
> >>         Since logging can be so expensive, I have another suggestion: why
> >> not create the log in a compact binary format, similar to what you and I
> >> have created to do log analysis?  Timestamps fit nicely into 4 bytes
> >> instead of the 24 bytes of "Wed Apr  7 08:10:07 1999", which you have to
> >> parse in one of many different ways...  Combined with a publicly available set of
> >> library components and tools to read and process logs (including of
> >> course a tool that spits out CLF output from the binary log) I think
> >> this would be a pretty good thing.  I'm interested to hear the group's
> >> feelings on this.  Regards,
> >>
> >>                              --       jad       --
> >>                           John Dilley <jad@hpl.hp.com>
> >
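
As a rough illustration of the binary-log idea in John's comments, here is a
minimal Python sketch.  The record layout, the field order, the flag names and
the helper functions are all hypothetical; nothing here is taken from Squid or
from the tools John mentions.  It only shows that a 4-byte timestamp plus a few
of the suggested fields pack into a small fixed-size record, and that a
CLF-style timestamp can be regenerated from it.

```python
import struct
import time

# Hypothetical fixed-size record (an assumption, not an existing format):
#   uint32  access time, seconds since the Unix epoch (4 bytes)
#   uint32  access time, microsecond part
#   uint32  request duration in microseconds
#   uint16  HTTP status code
#   uint32  response size in bytes
#   uint8   flag bits (see below)
# "!" selects network byte order with no padding between fields.
RECORD = struct.Struct("!IIIHIB")

# Hypothetical one-bit flags for the "1 or 0" fields in the list above.
FLAG_COOKIE = 0x01         # a cookie header was present
FLAG_CACHE_CONTROL = 0x02  # a cache-control header was present
FLAG_CLIENT_IMS = 0x04     # the client sent If-Modified-Since


def pack_record(secs, usecs, duration_us, status, size, flags):
    """Encode one log entry as a fixed-size binary record."""
    return RECORD.pack(secs, usecs, duration_us, status, size, flags)


def unpack_record(data):
    """Decode a record into (secs, usecs, duration_us, status, size, flags)."""
    return RECORD.unpack(data)


def format_timestamp(secs):
    """Render the 4-byte timestamp as a CLF-style date field, in UTC."""
    return time.strftime("[%d/%b/%Y:%H:%M:%S +0000]", time.gmtime(secs))
```

With this layout a whole record occupies RECORD.size bytes (19 here), which is
less than the textual timestamp alone.  A reader tool would call
unpack_record() on each fixed-size chunk and use format_timestamp() when
emitting CLF output.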