Re: log formats

From: Jim Pitkow (pitkow@parc.xerox.com)
Date: Mon, Apr 26 1999


Date: Mon, 26 Apr 1999 20:00:35 PDT
To: www-wca@w3.org
From: Jim Pitkow <pitkow@parc.xerox.com>
Message-Id: <99Apr26.200045pdt."363014"@louise.parc.xerox.com>
Subject: Re: log formats


Last year a group (XLF) formed outside the W3C to redo log files in XML.
Our position as the HTTP-NG WCA was to watch whether they made reasonable
progress and, if so, to make sure their work linked up with our thinking.
Given that not much has happened with XLF, it falls back on our plate to
decide whether or not to do something with this.

Several options:
   * Adopt a wait-and-see attitude with the Squid work
   * Actively work with the Squid people
   * Develop our own working draft under the W3C 

Thoughts?

At 12:02 PM 4/26/99 , Martin F. Arlitt wrote:
>One of my co-workers (John Dilley) attended the 4th Web Caching Workshop
>in San Diego at the start of April.  John told me that during the
>workshop the issue of log formats came up a number of times.  Since the
>squid people are heavily involved in this workshop, they may take steps
>to address the suggested changes.  I think that the W3C should get
>involved before any changes are made, in order that we can have some say
>in any new common format that is developed and deployed in a popular
>product such as squid.  I would like to hear any comments from the group
>on this topic.  I have attached some of John's comments on what was
>discussed at the workshop.
>
>Martin
>
>
>> >John Dilley wrote:
>> >
>> >>         It was raised during discussions.  Some specific suggestions for
>> >> things to add to the log format:
>> >>
>> >>     - Access time and request duration with microsecond resolution
>> >>     - Last modified time and Expires header time, if present
>> >>     - Whether cookies and cache-control headers were present (1 or 0)
>> >>     - Whether the request was a result of client IMS, or resulted in IMS
>> >>
>> >>         An MD5 checksum on the content would be a nice option for some
>> >> work, like Craig Wills's, but MD5 is too heavyweight to implement in
>> >> a general proxy.  Still, augmenting a proxy and adding the capability
>> >> might be useful for certain research...  Having a flexible log format
>> >> would be great.
>> >>
>> >>         Since logging can be so expensive I have another suggestion: why
>> >> not create the log in a compact binary format, similar to what you and I
>> >> have created to do log analysis?  Timestamps fit nicely into 4 bytes
>> >> instead of 24 bytes of "Wed Apr 7 08:10:07 1999", which you have to
>> >> parse in one of many different ways...  Combined with a publicly
>> >> available set of library components and tools to read and process logs
>> >> (including of course a tool that spits out CLF output from the binary
>> >> log) I think this would be a pretty good thing.  I'm interested to hear
>> >> the group's feelings on this.  Regards,
>> >>
>> >>                              --       jad       --
>> >>                           John Dilley <jad@hpl.hp.com>
>> >
>
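
To make John's binary-log suggestion concrete, here is a rough sketch of
what such a record and a CLF converter might look like. The field layout,
flag bits, and function names below are all hypothetical; they are not
taken from Squid, XLF, or any existing tool, and they only cover some of
the fields suggested at the workshop (microsecond access time and
duration, plus the cookie / cache-control / IMS indicators).

```python
import calendar
import socket
import struct
import time

# Hypothetical record header, network byte order, no padding:
#   uint32  access time, seconds since the Unix epoch (the "4 bytes")
#   uint32  access time, microsecond part
#   uint32  request duration in microseconds
#   4 bytes client IPv4 address
#   uint16  HTTP status code
#   uint32  response size in bytes
#   uint8   flag bits (see below)
#   uint16  length of the request line that follows the header
HEADER = struct.Struct("!III4sHIBH")

# Hypothetical flag bits for the 1-or-0 fields suggested at the workshop.
FLAG_COOKIE = 0x01         # a cookie header was present
FLAG_CACHE_CONTROL = 0x02  # a cache-control header was present
FLAG_CLIENT_IMS = 0x04     # the client request carried If-Modified-Since
FLAG_PROXY_IMS = 0x08      # the proxy issued If-Modified-Since upstream


def pack_record(secs, usecs, duration_us, client_ip, status, size,
                flags, request_line):
    """Encode one log entry as a fixed header plus the request line."""
    req = request_line.encode("ascii")
    header = HEADER.pack(secs, usecs, duration_us,
                         socket.inet_aton(client_ip),
                         status, size, flags, len(req))
    return header + req


def unpack_record(buf, offset=0):
    """Decode one entry; return (record dict, offset of the next entry)."""
    (secs, usecs, duration_us, ip, status, size, flags,
     req_len) = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    request_line = buf[start:start + req_len].decode("ascii")
    record = {
        "secs": secs, "usecs": usecs, "duration_us": duration_us,
        "ip": socket.inet_ntoa(ip), "status": status, "size": size,
        "flags": flags, "request": request_line,
    }
    return record, start + req_len


def to_clf(record):
    """Render a decoded record as a Common Log Format line (UTC)."""
    stamp = time.strftime("%d/%b/%Y:%H:%M:%S +0000",
                          time.gmtime(record["secs"]))
    return '%s - - [%s] "%s" %d %d' % (
        record["ip"], stamp, record["request"],
        record["status"], record["size"])


if __name__ == "__main__":
    # The timestamp from John's message, interpreted as UTC.
    when = calendar.timegm((1999, 4, 7, 8, 10, 7))
    blob = pack_record(when, 250000, 1500, "192.0.2.1", 200, 1234,
                       FLAG_COOKIE | FLAG_CLIENT_IMS,
                       "GET /index.html HTTP/1.0")
    rec, _ = unpack_record(blob)
    print(to_clf(rec))
```

With this layout the fixed part of each entry is 25 bytes, and only the
request line is variable length. Whether that saving is worth giving up a
human-readable log is exactly the question John raises; the sketch takes
no position on it.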