Re: Standard log file format - binary version?

On Thu, 16 May 1996, N.G.Smith wrote:

> What are people's thoughts on standardising on a binary version of the
> Extended Log File Format?

I think this needs to be open enough that the log can be fed directly
into a database, preferably relational, and preferably one chosen by the
site rather than by the server vendor. Not being a database person, I
don't know whether the SQL "standard" specifies how data types are stored or
leaves that to the vendors. If SQL data types are already standardized, I
would vote for using them. Otherwise, I would favor some "meta" standard
that lets us use the database of our choice, plus an interchange standard
for comparing different sites.

> We perform a number of analyses on our log files and at 300MB a day
> this would not be feasible unless we converted them to a binary format.
> This conversion takes about 4 CPU hours each day. Having our server
> dump a binary file in the first place would seem sensible.
> The binary files that we produce have a number of advantages:
>     The files are smaller
>         Some fields can be enumerated types
>         IP addresses are just 4 bytes
>         Repeated strings are held in a separate strings table
>     They are faster to access
>         Binary data does not have to go through a conversion process
>         Timestamps in the file allow you to pin-point records
>         Searching, sorting and collating are orders of magnitude faster
> The biggest disadvantage is the more complex logging procedure, but we
> already have code to do that. Of course, they are not human-readable
> either, but then who wants to read 300MB of log file each day?
> Other sites must have similar problems with big logs, and although, as
> a cache, I don't anticipate that passing round compressed ASCII log
> extracts will be a big problem, a binary standard would ensure that
> munging tools remain interoperable.
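The record layout described in the quoted list can be sketched in Python. The field set, field widths, method enumeration and helper names below are hypothetical illustrations, not the format N.G.Smith's site actually uses. Each record is a fixed 19 bytes: the IPv4 address is packed into 4 bytes, the method is a one-byte enumerated value, and the URL is stored once in a separate string table and referenced by index. Because records are fixed-size and assumed to be written in time order, a binary search on the timestamp can pin-point the records for a given period without scanning the whole file.

```python
import ipaddress
import struct

# Hypothetical enumeration of request methods; not part of any standard.
METHODS = {"GET": 0, "HEAD": 1, "POST": 2}
METHOD_NAMES = {v: k for k, v in METHODS.items()}

# Hypothetical fixed-size record, little-endian, no padding (19 bytes):
# timestamp (uint32 seconds), IPv4 address (4 bytes), method (uint8 enum),
# status (uint16), bytes sent (uint32), URL index into string table (uint32).
RECORD = struct.Struct("<I4sBHII")


class StringTable:
    """Stores each distinct string once; records refer to it by index."""

    def __init__(self):
        self.strings = []
        self.index = {}

    def intern(self, s):
        if s not in self.index:
            self.index[s] = len(self.strings)
            self.strings.append(s)
        return self.index[s]


def pack_record(table, ts, ip, method, status, nbytes, url):
    """Encode one log entry as a fixed-size binary record."""
    return RECORD.pack(ts, ipaddress.IPv4Address(ip).packed,
                       METHODS[method], status, nbytes, table.intern(url))


def unpack_record(table, buf, i):
    """Decode the i-th record of buf back into readable fields."""
    ts, ip, m, status, nbytes, url = RECORD.unpack_from(buf, i * RECORD.size)
    return (ts, str(ipaddress.IPv4Address(ip)), METHOD_NAMES[m],
            status, nbytes, table.strings[url])


def first_at_or_after(buf, ts):
    """Binary-search a time-ordered log for the first record with timestamp >= ts."""
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        # The timestamp is the first field, so read just those 4 bytes.
        (mid_ts,) = struct.unpack_from("<I", buf, mid * RECORD.size)
        if mid_ts < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo


# Usage: three records, two of which share a URL stored once in the table.
table = StringTable()
log = b"".join([
    pack_record(table, 1000, "192.0.2.1", "GET", 200, 1024, "/index.html"),
    pack_record(table, 1005, "192.0.2.7", "GET", 200, 1024, "/index.html"),
    pack_record(table, 1060, "198.51.100.3", "POST", 404, 0, "/cgi-bin/x"),
])
print(RECORD.size)                                    # 19
print(len(table.strings))                             # 2
print(unpack_record(table, log, first_at_or_after(log, 1001)))
# (1005, '192.0.2.7', 'GET', 200, 1024, '/index.html')
```

A fixed record size is what makes the timestamp search cheap: record `i` always starts at byte `i * 19`, so no index file is needed to jump into the middle of the log.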

I think we're all headed in a similar direction, but we're looking at
converting our logs to a database (probably mSQL) on a nightly basis. This
will enable us to index the important fields and use ordinary reporting and
analysis tools on them. Our problem is not so much the size of the logs as
the time it takes to analyse them.
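A nightly load of the kind described above might look like the following sketch. It assumes the input is in the NCSA Common Log Format and uses SQLite through Python's standard `sqlite3` module as a stand-in for mSQL; the table name `hits`, its columns, and the choice of indexed fields are hypothetical.

```python
import re
import sqlite3
from datetime import datetime

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)')


def load_log(conn, lines):
    """Parse log lines into a hypothetical 'hits' table and index it."""
    conn.execute("""CREATE TABLE IF NOT EXISTS hits (
        host TEXT, ts INTEGER, request TEXT, status INTEGER, bytes INTEGER)""")
    rows = []
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines rather than abort the nightly run
        host, _, _, date, request, status, nbytes = m.groups()
        ts = int(datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z").timestamp())
        rows.append((host, ts, request, int(status),
                     0 if nbytes == "-" else int(nbytes)))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    # Index the fields that reports filter and sort on most often.
    conn.execute("CREATE INDEX IF NOT EXISTS hits_ts ON hits (ts)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_host ON hits (host)")
    conn.commit()
    return len(rows)


# Usage: load a few sample lines, then let SQL do the reporting.
conn = sqlite3.connect(":memory:")
sample = [
    '192.0.2.1 - - [16/May/1996:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 1024',
    '192.0.2.7 - - [16/May/1996:10:00:05 -0400] "GET /index.html HTTP/1.0" 200 1024',
    'garbage line',
    '198.51.100.3 - - [16/May/1996:10:01:00 -0400] "POST /cgi-bin/x HTTP/1.0" 404 -',
]
print(load_log(conn, sample))  # 3
for row in conn.execute("SELECT status, COUNT(*) FROM hits GROUP BY status ORDER BY status"):
    print(row)  # (200, 2), then (404, 1)
```

Once the data is in tables, the per-day analysis becomes ordinary SQL queries that can use the indexes, rather than a fresh pass over the text log.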

I agree that the days of reading the logs by visual inspection are gone. 

James Calloway, General Manager                     http://www.nando.net
Nando.net, a McClatchy New Media company
127 W. Hargett St., Suite 406, Raleigh, NC 27601-1351
Voice: (919) 836-2858  FAX: (919) 829-8924
