Re: Pipelining and compression effect on HTTP/1.1 proxies from Luigi Rizzo on 1997-04-22 (ietf-http-wg@w3.org from April to June 1997)

From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Date: Tue, 22 Apr 1997 15:58:56 +0200 (MET DST)
To: Fred Douglis <douglis@research.att.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <199704221358.PAA19237@labinfo.iet.unipi.it>

> > I have done a quick test on the content of our proxy cache: for each
...
> > which is not a very rigorous test (since files in the cache contain the
...
> > with a saving, due to compression, of approximately 13%. I suspect the
> > actual use of compression would result in lower performance since
> > most files are short and headers compress a lot, thus biasing my result
> > toward better performance. These results can be explained by the fact
> > that large material is generally in compressed form at the source
...
> Another way to look at this is that not only is "large" textual data, such as 
> PostScript, often compressed, but images are inherently compressed.  Can you 

Of course. Potentially large material is compressed most of the
time (because of its native format, or because the provider is trying
to save bandwidth).

> tell us what fraction of files in your cache are content-type image/*
> (and the like) as opposed to text?  

Counting them now, image/* and the like are about 73% of 17000 files
(2/3 of the cache, which is rather small).

> An aside: does anyone know what the difference in compression will be between
> 	cat * | gzip
> and
> 	for i in *; do gzip "$i"; done   ?
> 
> My guess is that by glomming everything together you are getting better 
> compression than you would in practice, when each file is compressed 
> distinctly, due to the adaptive algorithms -- here you may use data from file 
> X to do a better job compressing Y.
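The question can be checked directly. Below is a minimal sketch in Python using the standard gzip module. It compares the total compressed size when everything is concatenated (like `cat * | gzip`) against compressing each object separately (like the loop). The cache objects are synthetic and hypothetical: small HTML pages sharing near-identical headers, and bodies of random bytes standing in for already-compressed images with the same headers. The helper name `gzip_sizes` and the header values are illustrative only.

```python
import gzip
import os

def gzip_sizes(blobs):
    """Compare `cat * | gzip` against gzip applied to each file.

    Returns (concatenated, per_file): total compressed bytes in each case.
    """
    concatenated = len(gzip.compress(b"".join(blobs)))
    per_file = sum(len(gzip.compress(b)) for b in blobs)
    return concatenated, per_file

# Hypothetical header stored with every cached body.
HEADER = (b"HTTP/1.0 200 OK\r\n"
          b"Server: Apache/1.2b8\r\n"
          b"Content-Type: text/html\r\n\r\n")

# Hypothetical small text objects: short HTML pages with the shared header.
text_objects = [HEADER + b"<html><body>page %d</body></html>\n" % i
                for i in range(100)]

# Hypothetical already-compressed objects: random bytes do not compress.
binary_objects = [HEADER + os.urandom(4000) for _ in range(100)]

for name, blobs in [("text", text_objects), ("binary", binary_objects)]:
    together, separate = gzip_sizes(blobs)
    raw = sum(len(b) for b in blobs)
    print(f"{name}: raw={raw} concatenated={together} per-file={separate}")
```

With the random bodies, the concatenated total should stay close to the raw size. Any gain from concatenation then comes mostly from the repeated headers and from paying the per-file gzip header and trailer only once.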

Generally speaking, this is correct.  In this specific case, however,
I suspect that the advantage is achieved only on the HTTP headers
(which are stored with the body), since a large amount of the data does
not really compress. And for compressing headers there are probably
more efficient ways (using tokens for the keywords, binary
representations of dates, times and numbers, etc.).
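Below is a minimal sketch, in Python, of the kind of header encoding suggested above. It assumes a hypothetical token table that is not part of any HTTP specification. Well-known header names become one-byte tokens. `Date` and `Last-Modified` become 4-byte seconds since the epoch, and `Content-Length` becomes a 4-byte integer. Anything else stays as length-prefixed text. The token values, field sets and the helper name `encode_headers` are illustrative only.

```python
import struct
from email.utils import parsedate_to_datetime

# Hypothetical token table: one byte per well-known header name (0 = unknown).
# Names are matched with exact case here for simplicity.
TOKENS = {b"Date": 1, b"Server": 2, b"Content-Type": 3,
          b"Content-Length": 4, b"Last-Modified": 5}
DATE_FIELDS = {b"Date", b"Last-Modified"}
NUMERIC_FIELDS = {b"Content-Length"}

def encode_headers(raw: bytes) -> bytes:
    """Encode CRLF-separated 'Name: value' header lines into a compact binary form."""
    out = bytearray()
    for line in raw.split(b"\r\n"):
        if not line:
            continue  # skip the blank line that ends the header block
        name, _, value = line.partition(b":")
        value = value.strip()
        token = TOKENS.get(name)
        if token is None:
            # Unknown header: token 0, 2-byte length, then the original line verbatim.
            out += b"\x00" + struct.pack("!H", len(line)) + line
        elif name in DATE_FIELDS:
            # HTTP date string -> 4-byte unsigned seconds since the Unix epoch.
            when = parsedate_to_datetime(value.decode("ascii"))
            out += bytes([token]) + struct.pack("!I", int(when.timestamp()))
        elif name in NUMERIC_FIELDS:
            # Decimal string -> 4-byte unsigned integer.
            out += bytes([token]) + struct.pack("!I", int(value))
        else:
            # Other known header: token, 2-byte length, value text.
            out += bytes([token]) + struct.pack("!H", len(value)) + value
    return bytes(out)

raw = (b"Date: Tue, 22 Apr 1997 13:58:56 GMT\r\n"
       b"Server: Apache/1.2b8\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 2048\r\n"
       b"X-Cache: HIT\r\n\r\n")
encoded = encode_headers(raw)
print(len(raw), "bytes as text,", len(encoded), "bytes encoded")
```

A real scheme would also need a decoder and an agreed token table on both ends. This sketch only shows where the savings come from.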

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

Received on Tuesday, 22 April 1997 07:49:01 UTC