- From: James E. Calloway <jcallowa@nando.net>
- Date: Thu, 16 May 1996 11:50:41 -0400 (EDT)
- To: www-logging@w3.org
On Thu, 16 May 1996, N.G.Smith wrote:

> What are people's thoughts on standardising on a binary version of the
> Extended Log File Format?

I think this needs to be sufficiently open so that the log can be fed
directly into a database, preferably relational, preferably chosen by
the site rather than by the server vendor. Not being a database person,
I don't know whether the SQL "standard" specifies how data types are
stored or leaves that to the vendors. If SQL data types are already
standardized, I would vote for that. Otherwise, I would want some "meta"
standard that would enable us to use the database of our choice, plus an
interchange standard for comparing different sites.

>
> We perform a number of analyses on our log files and at 300MB a day
> this would not be feasible unless we converted them to a binary format.
> This conversion takes about 4 CPU hours each day. Having our server
> dump a binary file in the first place would seem sensible.
>
> The binary files that we produce have a number of advantages:
>
> The files are smaller
>
>     Some fields can be enumerated types
>     IP addresses are just 4 bytes
>     Repeated strings are held in a separate strings table
>
> They are faster to access
>
>     Binary data does not have to go through a conversion process
>     Timestamps in the file allow you to pin-point records
>     Searching, sorting and collating are orders of magnitude faster
>
> The biggest disadvantage is the more complex logging procedure, but we
> already have code to do that. Of course, they are not human-readable
> either, but then who wants to read 300MB of log file each day.
>
> Other sites must have similar problems with big logs, and although, as
> a cache, I don't anticipate that passing round compressed ASCII log
> extracts will be a big problem, a binary standard would ensure that
> munging tools remain interoperable.

I think we're all headed in a similar direction, but we're looking at
converting our logs to a database (probably mSQL) on a nightly basis.
This will enable us to index the important fields and use ordinary
reporting and analysis tools to analyse them. Our problem is not so much
the size of the logs as the time it takes to analyse them. I agree that
the days of reading the logs by visual inspection are gone.

James Calloway, General Manager
http://www.nando.net
Nando.net, a McClatchy New Media company
127 W. Hargett St., Suite 406, Raleigh, NC 27601-1351
Voice: (919) 836-2858  FAX: (919) 829-8924
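
For illustration only, here is a minimal sketch of the kind of binary
record the quoted message describes: an enumerated field, a 4-byte IP
address, a timestamp, and repeated strings held in a separate strings
table. The field layout, the METHODS list and the names StringTable,
pack_record and unpack_record are all assumptions made up for this
example. They are not taken from N.G.Smith's code or from any proposed
standard.

```python
import ipaddress
import struct

# Hypothetical record layout (an assumption, not a standard), network byte order:
#   timestamp (uint32), IPv4 address (4 raw bytes), method (uint8 enum),
#   status (uint16), bytes sent (uint32), strings-table index (uint32)
RECORD = struct.Struct("!I4sBHII")

# Enumerated field: the method is stored as its position in this list.
METHODS = ["GET", "HEAD", "POST"]


class StringTable:
    """Stores each distinct string once and hands out integer indexes."""

    def __init__(self):
        self.strings = []
        self._index = {}

    def intern(self, s):
        # Reuse the existing index if the string has been seen before.
        if s not in self._index:
            self._index[s] = len(self.strings)
            self.strings.append(s)
        return self._index[s]


def pack_record(table, ts, ip, method, status, nbytes, url):
    """Encode one log entry as a fixed-width binary record."""
    return RECORD.pack(
        ts,
        ipaddress.IPv4Address(ip).packed,  # dotted quad -> 4 bytes
        METHODS.index(method),
        status,
        nbytes,
        table.intern(url),
    )


def unpack_record(table, data):
    """Decode a record produced by pack_record."""
    ts, ip, method, status, nbytes, url_id = RECORD.unpack(data)
    return (ts, str(ipaddress.IPv4Address(ip)), METHODS[method],
            status, nbytes, table.strings[url_id])
```

With this layout every record is RECORD.size bytes long, so a reader can
seek straight to record n at offset n * RECORD.size without parsing any
text. A real format would also need a way to write the strings table to
disk, and room for fields this sketch leaves out.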
Received on Thursday, 16 May 1996 11:50:33 UTC