- From: Benjamin Franz <snowhare@netimages.com>
- Date: Tue, 22 Apr 1997 15:44:56 -0700 (PDT)
- To: http-wg@cuckoo.hpl.hp.com
On Tue, 22 Apr 1997, Henrik Frystyk Nielsen wrote:

> At 12:16 PM 4/22/97 -0700, Benjamin Franz wrote:
>
> >My figures on www.xmission.com (a large server with many different
> >commercial and non-commercial residents) from a sample of 27 gigabytes of
> >recent measured traffic indicates that only about 13% of the traffic is
> >text/*. This slashes the potential savings to a mere 13% x 57.5% = 7.5%
> >from compressing the text/* files. And this overlooks the fact that the
> >majority of people browsing are doing so over modem links that *already*
> >perform pretty good on the fly compression of the data flowing through
> >them - thus reducing the potential savings to the end user from
> >pre-compressing text/* to negligible.
>
> Figures showing (potentially lack of) savings using compression compared to
> all other data formats are all very good, but is in fact not what our data
> results are all about.
>
> 1) In typical browsing mode, the very first packet on a connection contains
> an HTML page - the images are not requested until the HTML has arrived and
> started being parsed. TCPs behavior over time is a non-linear function
> where the first packet is much more expensive than the last. Therefore, it
> is likely to be a win to concentrate our efforts on the first packet. This
> is exactly what compressing HTML gives us.
>
> 2) Modem compression has on several occasions been indicated to have
> "pretty good" performance. Our data show otherwise - but not explicitly. I
> just made some simple tests of modem compression with and without deflated
> data and the figures are compelling - gaining about 2/3 in both time and
> packets when using deflate. Look at
>
> http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Compression/PPP.html
>
> for details.

"Note that we only download the HTML page and not any of the following
images. The size of the uncompressed HTML page is 42K HTML and the
compressed was 11K. This means that we decrease the overall payload
with about 31K or 73.8%."

That is an *exceptionally large* HTML document - about 10 times the size
of the average HTML document based on the results from our webcrawling
robot here (N ~= 5,000 HTML documents found by webcrawling). Very few
web designers would put that much on a single page because they are
aiming for a target of 30-50K TOTAL for a page - including graphics.

As noted: deflate and other compression schemes do much better on large
text/* documents than small ones. Using an overly large document gives
deflate a 'running start', which makes for a misleading comparison
against the short window compression that modems perform. You should do
the comparison using typical HTML documents: the whole test document
should be only 3-5K uncompressed and 1-2K compressed.

> Compression also helps on a LAN - see the figures at
>
> http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Compression/LAN.html
>
> 3) On the more speculative side, I don't consider the current composition
> of data formats in caches being constant. The paper describes the potential
> benefits of using style sheets and other data formats than the more
> traditional gif and jpeg. Style sheets are just starting to be deployed and
> it may change the contents significantly over the next 6 months. CSS1 style
> style sheets compress just as well as HTML, so there is yet another point
> counting for compression.

Again, the document used was around 10 times the size of the typical
HTML document. This should be re-done with more typical test documents.
In fact, it would probably be a good idea to test multiple sizes of
documents as well as realistic mixes of text/* and image/* to understand
how document size and mix affect the results of compression and
pipelining.

> So, the _actual_data_ that we have now for the effect of compression seems
> to indicate with little doubt that it is worth doing!

No - it only indicates that it may be worth doing. Or may not. Your PPP
and LAN tests were done using atypical input data - their results may be
(probably are) atypical as well.

--
Benjamin Franz
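
A rough sketch of the multi-size test proposed in the message, for anyone
who wants to try it. It assumes Python's standard zlib module as a stand-in
for deflate (zlib.compress adds a small header and checksum around the
deflate stream) and uses synthetic, randomly generated HTML-like text. The
word list, the make_html generator and the chosen sizes are all hypothetical,
and none of this reproduces the W3C test pages or the xmission.com traffic
sample. The last two helpers only restate the arithmetic quoted in the
thread (42K to 11K, and 13% x 57.5%).

```python
import random
import zlib

# Hypothetical vocabulary for synthetic pages; real HTML will compress differently.
WORDS = ["server", "client", "modem", "packet", "cache", "image", "style",
         "sheet", "header", "request", "response", "browser", "document",
         "network", "latency", "pipeline", "traffic", "content", "page", "link"]


def make_html(target_bytes, seed=0):
    """Build a synthetic HTML-like page of at least target_bytes bytes."""
    rng = random.Random(seed)
    parts = ["<html><head><title>Test page</title></head><body>\n"]
    size = len(parts[0])
    while size < target_bytes:
        words = [rng.choice(WORDS) for _ in range(rng.randint(5, 12))]
        para = "<p>" + " ".join(words) + ".</p>\n"
        parts.append(para)
        size += len(para)
    parts.append("</body></html>\n")
    return "".join(parts).encode("ascii")


def savings(raw_size, compressed_size):
    """Fraction of the payload removed by compression."""
    return 1.0 - compressed_size / raw_size


def deflate_savings(data, level=9):
    """Savings from zlib-wrapped deflate on one document."""
    return savings(len(data), len(zlib.compress(data, level)))


def savings_by_size(sizes=(1024, 4096, 16384, 43008)):
    """Map each (assumed) document size in bytes to its deflate savings."""
    return {n: deflate_savings(make_html(n)) for n in sizes}


def overall_savings(text_share, text_savings):
    """Share of total traffic saved when only text/* is compressed."""
    return text_share * text_savings


if __name__ == "__main__":
    for n, s in savings_by_size().items():
        print(f"{n:6d} bytes raw: {s:.1%} saved by deflate")
    print(f"42K -> 11K page: {savings(42, 11):.1%} saved")
    print(f"13% text share x 57.5% savings: "
          f"{overall_savings(0.13, 0.575):.1%} of all traffic")
```

Random word salad is not real markup, so the absolute percentages mean
little. The only thing worth reading off is how the savings change as the
document grows, and a proper test would swap make_html for a set of real
pages of each size.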
Received on Tuesday, 22 April 1997 15:46:30 UTC