- From: <hallam@ai.mit.edu>
- Date: Mon, 21 Oct 96 14:17:30 -0400
- To: mogul@pa.dec.com
- Cc: hallam@ai.mit.edu, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
>Not if you're referring to the one that Paul Leach and I have
>been working on. We're working on a heavily revised draft, but
>it still won't discuss log formats. Phill Hallam has issued
>some drafts on ways for servers to ask proxies for their logs,
>and I would imagine that he has considered the costs associated
>with retrieving tens or hundreds of megabytes of log info like
>this; the hit-metering work that Paul and I am doing is designed
>to avoid this kind of thing.

I have indeed considered these problems. But I distinguish between several different uses of log files. The main point for a spec is to have a format that is a neutral interchange medium that all servers might be expected to implement.

The Extended log file format does have provision for compression: a series of entries with the same value can be collapsed into a single line beginning with a number that gives the number of times the entry occurs.

I don't think that you can fairly claim that I have a "megabytes of data" problem without accepting that your scheme has a worse one. Rather than begin a communication for every hit, I parcel up the information to be exchanged and pass it in a single communication, at a cost of a single line of text per hit. In the "simple" exchange protocol you have to create an entire message per hit, which is much more expensive.

I would imagine that a large national cache with traffic in the tens of millions of hits per server per day would want to exchange log information frequently. I would expect such a server to be keeping an in-memory index to the cache, since without such an index it would be unable to keep up with that level of load in any case. Recording the number of hits would mean a single slot in the index structure. It's probably not even an additional slot, since I would expect the cache maintenance algorithm to require the same data.

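As a rough illustration of the hit counting described above, here is a minimal sketch in Python. The class name CacheIndex, its methods, and the "count, then URL" line layout are all assumptions made for the example; they are not taken from the Extended log file format draft or from any real proxy.

```python
from collections import defaultdict


class CacheIndex:
    """Hypothetical in-memory cache index: one hit-count slot per cached URL."""

    def __init__(self):
        # origin server -> URL -> number of hits served from the cache
        self._hits = defaultdict(lambda: defaultdict(int))

    def record_hit(self, server, url):
        # Counting a hit is a single increment on an existing index slot.
        self._hits[server][url] += 1

    def drain(self, server):
        # Remove the counts for one origin server and return them as one
        # line of text per URL, each line beginning with its hit count
        # (an illustrative layout, not the draft's actual syntax).
        counts = self._hits.pop(server, {})
        return [f"{n} {url}" for url, n in sorted(counts.items())]
```

Calling record_hit() costs one increment per request, and drain() produces a single batch per origin server rather than one message per hit.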
At a chosen time the proxy would simply traverse the list of servers which had requested notification and walk down the tree of index records for each one.

If a more comprehensive exchange of information were required, the proxy would need to keep a per-hit record somewhere. I designed the log file format so as to allow such a proxy to simply append information to the end of the log. It would keep an additional, separate log for each subscribing server. The additional effort required to do this is no more than twice the effort of current log keeping methods.

As to the issue of whether to support fast binary formats for logging, I don't think that these should be standardized at this moment in time. The first priority is to have a common interchange standard so that there can be a hope of creating analysis tools. A binary format with features such as k-d tree indices and full indexicality would be nice. I don't think that the perl hackers are likely to be able to implement such a scheme, and the commercial vendors are unlikely to be interested in it unless they can design it themselves.

Phill
Received on Monday, 21 October 1996 11:15:45 UTC