W3C home > Mailing lists > Public > ietf-http-wg-old@w3.org > September to December 1996

Re: HTTP Log files

From: <hallam@ai.mit.edu>
Date: Mon, 21 Oct 96 14:17:30 -0400
Message-Id: <9610211817.AA32693@etna.ai.mit.edu>
To: mogul@pa.dec.com
Cc: hallam@ai.mit.edu, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com

>Not if you're referring to the one that Paul Leach and I have
>been working on.  We're working on a heavily revised draft, but
>it still won't discuss log formats.  Phill Hallam has issued
>some drafts on ways for servers to ask proxies for their logs,
>and I would imagine that he has considered the costs associated
>with retrieving tens or hundreds of megabytes of log info like
this; the hit-metering work that Paul and I are doing is designed
>to avoid this kind of thing.

I have indeed considered these problems, but I distinguish between
several different uses of log files. The main purpose of a spec
is to define a neutral interchange format that all servers can
be expected to implement.

The Extended Log File Format does provide for compression: a run of
consecutive entries with the same value can be collapsed into a single
line that begins with a number giving how many times the entry occurs.
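A minimal Python sketch of that run-length collapsing, assuming a hypothetical "COUNT ENTRY" line layout (the exact syntax in the draft may differ). For simplicity this sketch always writes the count, even for a run of one, so that expansion is unambiguous; the function names are illustrative only.

```python
from itertools import groupby


def collapse(entries: list[str]) -> list[str]:
    # Each run of identical consecutive entries becomes one "COUNT ENTRY" line.
    # Hypothetical layout: the count is always written, even when it is 1.
    return [f"{sum(1 for _ in run)} {entry}" for entry, run in groupby(entries)]


def expand(lines: list[str]) -> list[str]:
    # Inverse of collapse: repeat each entry COUNT times.
    out: list[str] = []
    for line in lines:
        count, entry = line.split(" ", 1)
        out.extend([entry] * int(count))
    return out


log = ["GET /index.html", "GET /index.html", "GET /index.html", "GET /logo.gif"]
print(collapse(log))  # ['3 GET /index.html', '1 GET /logo.gif']
```

Only consecutive duplicates are merged, so the log can still be written strictly by appending.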

I don't think you can fairly claim that I have a "megabytes of data"
problem without accepting that your scheme has a worse one. Rather
than beginning a communication for every hit, I parcel up the
information to be exchanged and pass it in a single communication,
at a cost of a single line of text per hit. In the "simple" exchange
protocol you have to create an entire message per hit, which is much
more expensive.
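Taking the comparison as characterized above (one batched message versus one message per hit), the difference can be put into a toy cost model. The parameters `line_bytes` and `message_overhead` and the sizes used below are hypothetical, not figures from either proposal.

```python
def batched_cost(hits: int, line_bytes: int, message_overhead: int) -> int:
    # One message whose body carries one line of text per hit.
    return message_overhead + hits * line_bytes


def per_hit_cost(hits: int, line_bytes: int, message_overhead: int) -> int:
    # A complete message (overhead plus payload) for every hit.
    return hits * (message_overhead + line_bytes)


# Hypothetical sizes: 100-byte log line, 300 bytes of per-message overhead.
print(batched_cost(1_000_000, 100, 300))  # 100000300
print(per_hit_cost(1_000_000, 100, 300))  # 400000000
```

Under this model the per-hit scheme always costs (hits - 1) x message_overhead more than the batched one, whatever sizes are chosen.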


I would imagine that a large national cache with traffic in the
tens of millions of hits per server per day would want to exchange
log information frequently. I would expect such a server to keep
an in-memory index to the cache, since without one it would be
unable to keep up with that level of load in any case.

Recording the number of hits would take a single slot in the index
structure. It's probably not even an additional slot, since I would
expect the cache maintenance algorithm to require the same data.
At a chosen time the proxy would simply traverse the list of servers
that had requested notification and walk down the tree of index
records for each one.
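A minimal sketch of the per-record hit counter and the report-time traversal. The class and method names are hypothetical, and a plain dict keyed by origin server stands in for the tree of index records a real proxy would keep.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IndexRecord:
    url: str
    hits: int = 0  # the single extra slot: a per-entry hit counter


class CacheIndex:
    def __init__(self) -> None:
        # origin server -> {url -> record}; a dict stands in for the tree
        self.by_server: dict[str, dict[str, IndexRecord]] = defaultdict(dict)
        self.subscribers: set[str] = set()

    def record_hit(self, server: str, url: str) -> None:
        # Create the record on first sight, then bump its counter.
        rec = self.by_server[server].setdefault(url, IndexRecord(url))
        rec.hits += 1

    def report(self) -> dict[str, dict[str, int]]:
        # Visit only servers that requested notification and walk
        # their index records, collecting the hit counts.
        result: dict[str, dict[str, int]] = {}
        for server in sorted(self.subscribers):
            records = self.by_server.get(server, {})
            result[server] = {url: records[url].hits for url in sorted(records)}
        return result


idx = CacheIndex()
idx.subscribers.add("www.example.com")
idx.record_hit("www.example.com", "/a")
idx.record_hit("www.example.com", "/a")
idx.record_hit("www.example.com", "/b")
idx.record_hit("www.example.org", "/c")  # not subscribed, so not reported
print(idx.report())  # {'www.example.com': {'/a': 2, '/b': 1}}
```

The counting cost per hit is one increment on a record the cache already has to look up; the traversal cost is paid only once per reporting interval.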

If a more comprehensive exchange of information were required, the
server would need to keep a per-hit record somewhere. I designed the
log file format to allow such a server to simply append
information to the end of the log. Such a server would keep an
additional, separate log for each subscribing server. The additional
effort required is no more than twice that of current
log-keeping methods.
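A sketch of that append-only arrangement, with `io.StringIO` standing in for log files; the class and method names are hypothetical. Each hit belongs to one origin server, so it is appended to the main log and, only if that server subscribed, to its per-subscriber log: at most one extra write per hit, which is where the "no more than twice" bound comes from.

```python
import io


class ProxyLogger:
    def __init__(self, subscribers: list[str]) -> None:
        self.main_log = io.StringIO()  # the usual server log
        # One additional append-only log per subscribing origin server.
        self.sub_logs = {server: io.StringIO() for server in subscribers}

    def log_hit(self, server: str, line: str) -> int:
        # Append to the main log, plus the origin's log if it subscribed.
        # Returns the number of writes made for this hit (1 or 2).
        self.main_log.write(line + "\n")
        sub = self.sub_logs.get(server)
        if sub is None:
            return 1
        sub.write(line + "\n")
        return 2


logger = ProxyLogger(["www.example.com"])
logger.log_hit("www.example.com", "GET /a")
logger.log_hit("www.example.org", "GET /b")
print(repr(logger.sub_logs["www.example.com"].getvalue()))  # 'GET /a\n'
```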

As to whether fast binary formats for logging should be supported, I
don't think they should be standardized at this time. The first
priority is a common interchange standard, so that there is some hope
of creating analysis tools. A binary format with features such as k-d
tree indices and full indexing would be nice, but I don't think the
Perl hackers are likely to be able to implement such a scheme, and the
commercial vendors are unlikely to be interested in it unless they can
design it themselves.


	Phill
Received on Monday, 21 October 1996 11:15:45 EDT
