- From: N.G.Smith <ngs@sesame.hensa.ac.uk>
- Date: Thu, 16 May 1996 13:51:20 +0100
- To: www-logging@w3.org
What are people's thoughts on standardising on a binary version of the
Extended Log File Format?
We perform a number of analyses on our log files and at 300MB a day
this would not be feasible unless we converted them to a binary format.
This conversion takes about 4 CPU hours each day. Having our server
dump a binary file in the first place would seem sensible.
The binary files that we produce have a number of advantages:
- The files are smaller
  - Some fields can be enumerated types
  - IP addresses are just 4 bytes
  - Repeated strings are held in a separate strings table
- They are faster to access
  - Binary data does not have to go through a conversion process
  - Timestamps in the file allow you to pin-point records
  - Searching, sorting and collating are orders of magnitude faster
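
To make the points above concrete, here is a minimal sketch in Python of what such a record might look like. It is not the format we actually use, and nothing here is proposed as the standard: the field layout, the METHODS enumeration, and the names RECORD, StringTable, pack_record, unpack_record and first_at_or_after are all assumptions made up for illustration. It shows an IPv4 address stored in 4 bytes, the request method as an enumerated type, URLs held once in a strings table, and fixed-size records that can be binary-searched by timestamp.

```python
import socket
import struct

# Hypothetical fixed-size record (assumed layout, network byte order, no padding):
# timestamp (4 bytes), IPv4 address (4), method enum (1), status (2), URL index (4)
RECORD = struct.Struct("!I4sBHI")

# Assumed enumeration of request methods; a real format would define its own.
METHOD_NAMES = ["GET", "HEAD", "POST"]
METHODS = {name: code for code, name in enumerate(METHOD_NAMES)}


class StringTable:
    """Stores each distinct string once and hands out integer indexes."""

    def __init__(self):
        self.index = {}
        self.strings = []

    def intern(self, s):
        if s not in self.index:
            self.index[s] = len(self.strings)
            self.strings.append(s)
        return self.index[s]


def pack_record(table, timestamp, ip, method, status, url):
    """Encode one log entry as a fixed-size binary record."""
    return RECORD.pack(timestamp, socket.inet_aton(ip),
                       METHODS[method], status, table.intern(url))


def unpack_record(table, data, n):
    """Decode the n-th record (0-based) from a buffer of packed records."""
    ts, ip, method, status, url_idx = RECORD.unpack_from(data, n * RECORD.size)
    return ts, socket.inet_ntoa(ip), METHOD_NAMES[method], status, table.strings[url_idx]


def first_at_or_after(data, timestamp):
    """Binary search for the first record whose timestamp is >= timestamp.

    Assumes records are stored in ascending timestamp order.
    """
    lo, hi = 0, len(data) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        (ts,) = struct.unpack_from("!I", data, mid * RECORD.size)
        if ts < timestamp:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

Because every record has the same size in this sketch, the n-th record sits at byte offset n * RECORD.size, which is what lets a lookup by timestamp avoid scanning the file from the start.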
The biggest disadvantage is the more complex logging procedure, but we
already have code to do that. Of course, they are not human-readable
either, but then who wants to read 300MB of log file each day?
Other sites must have similar problems with big logs. Although, as
a cache site, I don't anticipate that passing round compressed ASCII log
extracts will be a big problem, a binary standard would ensure that
munging tools remain interoperable.
Thoughts?
Neil.
Received on Thursday, 16 May 1996 08:53:20 UTC