Re: Pipelining and compression effect on HTTP/1.1 proxies

On Tue, 22 Apr 1997, Henrik Frystyk Nielsen wrote:

> At 12:16 PM 4/22/97 -0700, Benjamin Franz wrote:
> 
> >My figures on www.xmission.com (a large server with many different
> >commercial and non-commercial residents) from a sample of 27 gigabytes of
> >recent measured traffic indicates that only about 13% of the traffic is
> >text/*.  This slashes the potential savings to a mere 13% x 57.5% = 7.5%
> >from compressing the text/* files. And this overlooks the fact that the
> >majority of people browsing are doing so over modem links that *already*
> >perform pretty good on the fly compression of the data flowing through
> >them - thus reducing the potential savings to the end user from
> >pre-compressing text/* to negligible.
> 
> Figures showing (potentially lack of) savings using compression compared to
> all other data formats are all very good, but is in fact not what our data
> results are all about.
> 
> 1) In typical browsing mode, the very first packet on a connection contains
> an HTML page - the images are not requested until the HTML has arrived and
> started being parsed. TCPs behavior over time is a non-linear function
> where the first packet is much more expensive than the last. Therefore, it
> is likely to be a win to concentrate our efforts on the first packet. This
> is exactly what compressing HTML gives us.
> 
> 2) Modem compression has on several occasions been indicated to have
> "pretty good" performance. Our data show otherwise - but not explicitly. I
> just made some simple tests of modem compression with and without deflated
> data and the figures are compelling - gaining about 2/3 in both time and
> packets when using deflate. Look at
> 
>    http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Compression/PPP.html
> 
> for details.

   "Note that we only download  the HTML page and not any of the following
    images. The size of the uncompressed HTML page is 42K HTML and the
    compressed was 11K. This means that we decrease the overall payload
    with about 31K or 73.8%. "

That is an *exceptionally large* HTML document - about 10 times the size
of the average HTML document, based on the results from our webcrawling
robot here (N ~= 5,000 HTML documents). Very few web designers would put
that much on a single page, because they are aiming for a target of
30-50K TOTAL for a page - including graphics.

As noted: deflate and other compression schemes do much better on large
text/* documents than on small ones. Using an overly large document gives
a misleading comparison against the short-window compression that modems
perform, because it basically allows deflate a 'running start'. You should
do the comparison using 3-4K HTML documents: the whole test document
should be only 3-5K uncompressed and 1-2K compressed.
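
A minimal sketch of the size effect described above, assuming Python's
standard zlib module as the deflate implementation. The word list and the
synthetic_html() helper are made up for illustration and are not taken
from any real page, so the absolute ratios mean little; only the trend
across sizes is of interest.

```python
import random
import zlib

# Hypothetical vocabulary for building HTML-like filler text.
WORDS = ("<p>", "</p>", "<a href=", "</a>", "<b>", "</b>", "the", "of",
         "and", "server", "page", "image", "proxy", "cache", "request",
         "modem", "link", "document", "traffic", "compress")


def synthetic_html(size, seed=0):
    """Build roughly `size` bytes of seeded, HTML-like text."""
    rng = random.Random(seed)
    parts = []
    length = 0
    while length < size:
        word = rng.choice(WORDS)
        parts.append(word)
        length += len(word) + 1  # +1 for the joining space
    return " ".join(parts).encode("ascii")[:size]


def deflate_ratio(data):
    """Compressed size as a fraction of the original (lower is better).

    zlib.compress() adds a small zlib header and checksum around the
    deflate stream, which weighs more heavily on small inputs.
    """
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # 4K is near the suggested test size; 42K matches the W3C test page.
    for size in (1024, 4096, 42 * 1024):
        print(size, round(deflate_ratio(synthetic_html(size)), 3))
```

Real HTML pages of each size would be a better input than synthetic
text; the point is only that the ratio should be measured at the 3-5K
end as well as at 42K.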

> Compression also helps on a LAN - see the figures at
> 
>    http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Compression/LAN.html
> 
> 3) On the more speculative side, I don't consider the current composition
> of data formats in caches being constant. The paper describes the potential
> benefits of using style sheets and other data formats than the more
> traditional gif and jpeg. Style sheets are just starting to be deployed and
> it may change the contents significantly over the next 6 months. CSS1 style
> style sheets compress just as well as HTML, so there is yet another point
> counting for compression.

Again, the document used was around 10 times the size of the typical HTML
document. This should be re-done with more typical test documents. In
fact, it would probably be a good idea to test multiple sizes of documents
as well as realistic mixes of text/* and image/* to understand how
document size and mix affect the results of compression and pipelining.
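
As a rough sketch of why the mix matters, the overall saving can be
modelled as the text/* share of traffic times the saving on text/*. This
assumes that everything else (gif, jpeg) gains nothing from deflate, which
is a simplification. The 13% and 57.5% figures are the ones quoted earlier
in this thread; the function names and the other fractions are only
illustrative.

```python
def overall_savings(text_fraction, text_savings):
    """Fraction of total bytes saved if only text/* is compressed.

    Assumes non-text traffic (already-compressed images) saves nothing.
    """
    return text_fraction * text_savings


def savings_by_mix(text_savings, text_fractions):
    """Overall saving for several hypothetical text/* shares of traffic."""
    return {f: overall_savings(f, text_savings) for f in text_fractions}


if __name__ == "__main__":
    # 13% text/* at 57.5% saving is the xmission.com case from this thread.
    for fraction, saved in savings_by_mix(0.575, (0.13, 0.25, 0.50)).items():
        print(f"text/* share {fraction:.0%}: overall saving {saved:.1%}")
```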

> So, the _actual_data_ that we have now for the effect of compression seems
> to indicate with little doubt that it is worth doing!

No - it only indicates that it may be worth doing. Or may not. Your PPP
and LAN tests were done using atypical input data - their results may be
(probably are) atypical as well.

-- 
Benjamin Franz

Received on Tuesday, 22 April 1997 15:46:30 UTC