Re: What's next after the common log format? from Mike Meyer on 1995-06-29 (www-talk@w3.org from May to June 1995)

From: Mike Meyer <mwm@contessa.phone.net>
Date: Wed, 28 Jun 95 20:52:54 PST
To: www-talk@www10.w3.org
Message-Id: <19950628.76ADB68.130B4@contessa.phone.net>
> >So, do you have a formal proposal written up? Want to cooperate on one
> >(might not work well - I want to use "("'s for the separator :-)?
> By the way, the '{' separators came from Tcl; the Tcl C library has
> built-in routines to put together and tear apart lists in this string
> format.

Right - this is pretty much my reason for wanting "("'s in place of
"{"'s; I have routines to tear apart and put together lists in this
format. I'm not sure what Perl5 does, but I wouldn't be surprised if
there was some format that it favored as well.

My feeling is that the format should have some flexibility built into
it. For instance, the first non-whitespace in the file must be an open
of some delimiter pair, and those paired delimiters are then used to
mark the beginning and end of list items. Likewise, either single or
double quotes can be used to mark strings, with most tokens
automatically turning into strings if they aren't quoted.

In other words, a nearly fully quoted entry with parens:

	((date "Monday, 26-Jun-95 13:22:59 GMT") (host "sesame.henca.ac.uk")
	 (method GET) (URL "/~mwm/weblink/overview") (protocol "HTTP/1.0")
	 (user-agent
	      "NCSA_Mosaic/2.6b3 (X11;SunOS 4.1.3 sun4m) libwww/2.12 modified")
	 (bytes 2534) (status 200))

and a minimally quoted entry with curlies:
	     
	{{date 'Monday, 26-Jun-95 13:22:59 GMT'} {host sesame.henca.ac.uk}
	 {method GET} {URL '/~mwm/weblink/overview'} {protocol HTTP/1.0}
	 {user-agent
	      'NCSA_Mosaic/2.6b3 (X11;SunOS 4.1.3 sun4m) libwww/2.12 modified'}
	 {bytes 2534} {status 200}}

are both printed representations of the same log entry.
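
To make the idea concrete, here is a rough sketch of a reader for such a
format, in Python. It is only an illustration under assumptions of mine,
not a proposal: the name parse_entry and the PAIRS table of allowed
delimiter pairs are made up for this example, every token is kept as a
string (numbers are not converted), and there is no escape mechanism for
a quote character inside a quoted string.

```python
# Hypothetical set of delimiter pairs a log file may choose from.
PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}


def parse_entry(text):
    """Parse one log entry into nested lists of strings.

    The first non-whitespace character selects the delimiter pair;
    that pair then marks the start and end of every list.
    """
    n = len(text)

    def skip_ws(i):
        while i < n and text[i].isspace():
            i += 1
        return i

    pos = skip_ws(0)
    if pos >= n or text[pos] not in PAIRS:
        raise ValueError("entry must start with an opening delimiter")
    open_ch = text[pos]
    close_ch = PAIRS[open_ch]

    def parse_list(i):
        # text[i] is the opening delimiter of this list.
        items = []
        i += 1
        while True:
            i = skip_ws(i)
            if i >= n:
                raise ValueError("unterminated list")
            ch = text[i]
            if ch == close_ch:
                return items, i + 1
            if ch == open_ch:
                sub, i = parse_list(i)
                items.append(sub)
            elif ch in ("'", '"'):
                # Quoted string: runs to the next matching quote.
                end = text.find(ch, i + 1)
                if end == -1:
                    raise ValueError("unterminated string")
                items.append(text[i + 1:end])
                i = end + 1
            else:
                # Bare token: becomes a string, ends at whitespace
                # or at either of the chosen delimiters.
                start = i
                while (i < n and not text[i].isspace()
                       and text[i] not in (open_ch, close_ch)):
                    i += 1
                items.append(text[start:i])

    result, pos = parse_list(pos)
    if skip_ws(pos) != n:
        raise ValueError("unexpected data after the entry")
    return result
```

Fed either of the two entries above, this should hand back the same
nested list, which is the point: a general tool only has to look at the
first character to find out which flavor of file it has been given.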

People writing tools for their server - whether that means one they
run and got from someone else, or one they are writing to distribute -
can deal with the specific syntax of their log file. People interested
in writing more general tools - for whatever reason - can deal with a
set of log files that have minor syntactic differences that are
relatively easy to handle, but all have the same structure.

> We wanted something that was robust, easy to parse, but human
> readable for those parsing emergencies.

Makes sense. I think both formats above qualify.

> Of course, the fastest way to get everyone to agree would be to publish the
> C and Perl source code to read and write the log format.

That only works if you limit "everyone" to those who want to use C or
Perl to deal with log entries.

Hmm - how about an SGML version of this? I'm not sure it'd work, but
it's certainly a format that everyone should have experience dealing
with by now. I'll think about it.

BTW, I've also been thinking about buffering log entries, or batching
them, or otherwise dealing with more entries than the platform can
reasonably handle. This seems a bit platform-specific (as in the
reasons for bottlenecking differ from platform to platform), and is a
problem whose solution on that platform could be applied to things
other than the HTTP server.

	<mike
Received on Thursday, 29 June 1995 00:01:15 UTC