
Standard log file format - binary version?



What are people's thoughts on standardising on a binary version of the
Extended Log File Format?

We perform a number of analyses on our log files, and at 300MB a day
this would not be feasible unless we first converted them to a binary
format.
This conversion takes about 4 CPU hours each day. Having our server
dump a binary file in the first place would seem sensible.

The binary files that we produce have a number of advantages:

    The files are smaller

        Some fields can be enumerated types
        IP addresses are just 4 bytes
        Repeated strings are held in a separate strings table

    They are faster to access

        Binary data does not have to go through a conversion process
        Timestamps in the file allow you to pinpoint records by time
        Searching, sorting and collating are orders of magnitude faster
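
As a rough illustration of the points above, here is a minimal Python
sketch of a fixed-size binary record with an enumerated method field, a
4-byte IP address, a separate strings table, and a timestamp search.
The layout, the choice of fields and every name in it (RECORD, METHODS,
StringTable, pack_record, unpack_record, first_at_or_after) are
hypothetical, chosen for illustration only; it is not a proposed
standard.

```python
import socket
import struct

# Hypothetical record layout (an assumption, not part of any standard):
#   uint32 timestamp, 4-byte IPv4 address, uint8 method (enumerated),
#   uint16 status, uint32 bytes sent, uint32 index into the strings table
RECORD = struct.Struct("!I4sBHII")   # network byte order, no padding
METHODS = {"GET": 0, "HEAD": 1, "POST": 2}
METHOD_NAMES = {code: name for name, code in METHODS.items()}


class StringTable:
    """Holds each distinct string once and hands out integer indices."""

    def __init__(self):
        self.strings = []
        self._index = {}

    def intern(self, s):
        if s not in self._index:
            self._index[s] = len(self.strings)
            self.strings.append(s)
        return self._index[s]


def pack_record(table, ts, ip, method, status, size, url):
    """Encode one log entry as a fixed-size binary record."""
    return RECORD.pack(ts, socket.inet_aton(ip), METHODS[method],
                       status, size, table.intern(url))


def unpack_record(table, buf, i):
    """Decode the i-th record of buf back into readable values."""
    ts, ip, method, status, size, url = RECORD.unpack_from(
        buf, i * RECORD.size)
    return (ts, socket.inet_ntoa(ip), METHOD_NAMES[method],
            status, size, table.strings[url])


def first_at_or_after(buf, ts):
    """Binary-search for the first record with timestamp >= ts.

    Assumes records were written in non-decreasing timestamp order.
    """
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        (mid_ts,) = struct.unpack_from("!I", buf, mid * RECORD.size)
        if mid_ts < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

Because every record has the same size, the i-th record sits at byte
offset i * RECORD.size, which is what makes the timestamp search cheap:
it reads only a handful of timestamps rather than parsing every line.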

The biggest disadvantage is the more complex logging procedure, but we
already have code to do that. Of course, the files are not
human-readable either, but then who wants to read 300MB of log file
each day?

Other sites must have similar problems with big logs. Speaking as a
cache site, I don't anticipate that passing round compressed ASCII log
extracts will be a big problem, but a binary standard would ensure that
munging tools remain interoperable.

Thoughts?

Neil.

