- From: Dave Beckett <D.J.Beckett@ukc.ac.uk>
- Date: Mon, 26 Jun 1995 11:44:25 +0100
- To: Andrew Payne <payne@openmarket.com>
- Cc: www-talk@www10.w3.org
>>>>> On Fri, 23 Jun 1995 20:43:02 -0400, Andrew Payne <payne@openmarket.com> said:

Andy> [Snip]
Andy> Has anyone else wrestled with designing better log formats?
Andy> Are there any other formats in use?

Yes, I have considered and designed around the problems in this area
and published a paper on it at WWW95 in Darmstadt.

I work with a large WWW site with multiple access methods and need to
summarise the logging information from the 5+ running methods. I thus
want to include all the data that they provide, as efficiently as
possible.

For my problem, I concluded that a fixed field log file was a suitable
answer. I considered the attribute/value format at one point, but that
would be slow to process and awkward to work with using standard tools.

My final result was a fixed field record format like this:

  <Type>\t<Operation>\t<Date-Time>\t<Name/Path>\t<Size>\t<User>\t<Site>\t<Email>

for example:

  ftp	txfile	1994-01-19-11:27:14	/ftp/pub/parallel/documents/inmos/archive-server/checkocc/test80xa.occ	58019	-	123.45.67.89	abcdef@ghijklmn.fr
  gopher	txfile	1994-01-19-11:27:39	/ftp/pub/parallel/parlib/butterfly/queens/bflyparqueens.c	4789	-	abc.def.Uni-ghijk.DE	-
  http	txfile	1994-01-19-11:27:54	/usr/l/lib/httpd/htdocs/parallel/home.html	961	-	unix.hensa.ac.uk	-

where the fields are separated by TABs.

Typical types are ftp, gopher, http, wais, archie, ...

Typical operations are transmit a file (txfile), transmit a directory
(txdir), receive a file (rxfile) and some proxy ones:
txfile/proxy/hit, ...

The name field is usually a file name but can be a URL (for proxies),
a search request (archie) or whatever is appropriate.

The problems with this are twofold:

(1) It is a fixed format record, so adding new fields is a problem.
    Examples are status/error/result or sub-command, which I came
    across too late to fix. I just appended the status field to the
    operation, e.g. for a proxy error it could be
    txfile/proxy/fail=404

(2) Since I use TABs to separate the fields, care needs to be taken
    about white space and quoting needs to be used. I work round this
    problem by deleting tabs from the fields.

BUT

It is pretty efficient, can be used by simple tools like grep, sort,
awk etc. and compresses (gzips) by >85% for the average log file.

Now, if I were designing it again, what would I add or change?

I still like the efficiency and simplicity of the fixed field format
and might just add lots more fields, or make the system pass through
fields it didn't understand. It would take a good deal of convincing
that the variant record form, although elegant and expressive, is
workable in the real world.

You can get the paper, the slides from the talk and the software
implementing it from here:

  <URL:http://www.hensa.ac.uk/tools/www/logtools/index.html>

or see my home page

  <URL:http://www.hensa.ac.uk/parallel/www/djb1.html>

Dave
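
To make the record layout above concrete, here is a minimal Python sketch of reading and writing one such line. It is an illustration only, not the logtools software mentioned above: the names Record, parse_record, format_record and split_status are hypothetical, and treating "-" as an empty field is an inference from the examples rather than something stated in the message.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Record(NamedTuple):
    """One log line: eight TAB-separated fields, in the order given above."""
    kind: str        # <Type>, e.g. ftp, gopher, http
    operation: str   # <Operation>, e.g. txfile or txfile/proxy/fail=404
    date_time: str   # <Date-Time>, kept as text, e.g. 1994-01-19-11:27:14
    name: str        # <Name/Path>: file name, URL or search request
    size: str        # <Size>, kept as text here
    user: str        # <User>; "-" appears to mean "not known" (assumption)
    site: str        # <Site>
    email: str       # <Email>; "-" appears to mean "not known" (assumption)


def parse_record(line: str) -> Record:
    """Split one log line on TABs and check the field count."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(Record._fields):
        raise ValueError(f"expected {len(Record._fields)} fields, got {len(parts)}")
    return Record(*parts)


def format_record(fields: Sequence[str]) -> str:
    """Join fields with TABs, deleting any TAB found inside a field."""
    if len(fields) != len(Record._fields):
        raise ValueError(f"expected {len(Record._fields)} fields, got {len(fields)}")
    return "\t".join(str(f).replace("\t", "") for f in fields)


def split_status(operation: str) -> Tuple[str, Optional[str]]:
    """Separate a status appended to the operation, e.g. fail=404."""
    base, sep, status = operation.partition("=")
    return base, (status if sep else None)
```

Because each record is a single line with a fixed number of fields, the same split-on-TAB step is what tools like awk (with TAB as the field separator) would do, which is the property the format is relying on.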
Received on Monday, 26 June 1995 06:44:58 UTC