- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Tue, 20 Dec 94 16:00:32 PST
- To: Simon E Spero <ses@tipper.oit.unc.edu>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> Do you have some more info about the data set you ran the traces on? I
> seem to recall you mentioning that the traces were taken from the
> California Election server over a couple of days? Do you have any
> information on the size of the data set, and the relative access
> frequencies of the various documents?

Just before I received your message, I sent this correction to the minutes of the BOF. The minutes said:

> Digital have collected some 9 gigabytes of log data for requested URL
> and the duration the connection was kept open. A paper analysing this
> data was presented at the World Wide Web Fall Conference held in
> October 1994, and can be found in the online proceedings.

This is wrong in several ways:

(1) For the 1994 California Election server, we collected several different kinds of data. We have full logs (including connection durations with 1-msec resolution), but these are nowhere near 9 GB. We also have a few full days of tcpdump traces; these come to 9 GB after compression. (For privacy reasons, we cannot provide these logs or traces to others.)

(2) The paper given at the conference in October obviously could not have been based on this data, which was collected in early November! It is based on simple experiments performed with modified clients and servers.

To answer your questions a bit more specifically:

The data set size varied over time. Before 8pm (PST) on election night, it was mostly "static" pages, such as the text of ballot propositions, candidate statements, campaign finances, photos of faces, etc. After 8pm, we started updating the actual returns every 5 minutes. (The "returns" pages were online before that, but they were only stand-in pages, nothing useful.) I don't have the precise sizes handy, but I suppose I should get someone to do a "find" on that server now.

I haven't gotten around to calculating file size distributions. To do this right, I should probably break it down by file type, or even further. For example, in order to draw bar graphs showing the results, we generated 101 different .gifs representing values from 0% to 100%. (See http://www.election.digital.com/e/returns/returns/page.html for an example.) We sure hoped that most people browsing these pages would end up with most of the bar-graph .gifs cached before too long!

I looked at the logs for the busiest day, and the average number of bytes returned was about 2100-2200 (depending on whether you count 0-byte responses in the average). Of course, when we constructed these pages we knew to expect a heavy load, so we tried to keep things fairly compact.

FYI, not counting the tcpdump traces, we have about 1 gigabyte of log information, although this includes an awful lot of different things. Needless to say, we could be busy for years trying to figure out what it all means, although by then the Web will be quite different.

-Jeff
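The file-size distribution broken down by file type, which the message says had not yet been calculated, could be computed along these lines. A minimal illustrative sketch in Python, assuming a local copy of the document tree at a placeholder path; nothing here reflects the election server's actual layout:

    import os
    from collections import defaultdict

    def sizes_by_type(root="/election/docs"):   # placeholder path, not the real document root
        """Group file sizes (in bytes) by file extension."""
        dist = defaultdict(list)
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                ext = os.path.splitext(name)[1].lower() or "(none)"
                dist[ext].append(os.path.getsize(os.path.join(dirpath, name)))
        return dist

    for ext, sizes in sorted(sizes_by_type().items()):
        print("%-8s  files=%5d  total=%9d  mean=%7.0f bytes"
              % (ext, len(sizes), sum(sizes), sum(sizes) / len(sizes)))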
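The bar-graph trick described in the message keeps the set of images small and fixed (101 .gifs for 0% through 100%), so browsers can reuse cached copies across many returns pages. A sketch of how a page generator might reference them; the file-naming scheme below is a guess, not the one the election server actually used:

    def bar_image_tag(percent):
        """Map a result to one of 101 fixed bar images (bar000.gif .. bar100.gif)."""
        value = max(0, min(100, int(round(percent))))   # clamp to the 0..100 range
        return '<img src="/gifs/bar%03d.gif" alt="%d%%">' % (value, value)

    # One row of a hypothetical returns table:
    print("<tr><td>Candidate A</td><td>%s</td><td>37.4%%</td></tr>"
          % bar_image_tag(37.4))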
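The 2100-2200 byte figure comes from averaging the bytes-returned field of the busiest day's log, with and without 0-byte responses counted. A sketch of that calculation, assuming a Common Log Format-style access log and a placeholder file name (the message does not say what format the election server's logs actually use):

    def average_response_sizes(log_path="access.log"):   # placeholder file name
        """Mean bytes returned per request, with and without 0-byte responses."""
        sizes = []
        with open(log_path) as log:
            for line in log:
                field = line.split()[-1]          # last CLF field: response size in bytes
                sizes.append(0 if field == "-" else int(field))
        with_zero = sum(sizes) / len(sizes)
        nonzero = [s for s in sizes if s > 0]
        without_zero = sum(nonzero) / len(nonzero)
        return with_zero, without_zero

    incl, excl = average_response_sizes()
    print("average incl. 0-byte responses: %.0f bytes" % incl)
    print("average excl. 0-byte responses: %.0f bytes" % excl)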
Received on Tuesday, 20 December 1994 16:09:53 UTC