- From: Fred Douglis <douglis@research.att.com>
- Date: Tue, 22 Apr 1997 09:58:39 -0400
- To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
- Cc: http-wg@cuckoo.hpl.hp.com
> I have done a quick test on the content of our proxy cache: for each
> directory, I have compared the output of
>
>     cat * | wc
> and
>     cat * | gzip | wc
>
> which is not a very rigorous test (since files in the cache contain the
> HTTP header as well, and merging files before compression changes
> the results a little bit) but gives the idea.
>
> The total byte count is as follows:
>
>     Uncompressed: 316.407.346
>     Compressed:   274.892.797
>
> with a saving, due to compression, of approximately 13%. I suspect the
> actual use of compression would result in lower performance since
> most files are short and headers compress a lot, thus biasing my result
> toward better performance. These results can be explained with the fact
> that large material is generally in compressed form at the source
> hence the additional compression is ineffective.

Another way to look at this is that not only is "large" textual data,
such as postscript, often compressed, but images are inherently
compressed. Can you tell us what fraction of files in your cache are
content-type image/* (and the like) as opposed to text?

In any case, I agree with your conclusion, in the sense that no matter
what the cause of the poor compression is, the end result is that
compression will only do so much.

An aside: does anyone know what the difference in compression will be
between

    cat * | gzip
and
    for i in *; do gzip "$i"; done

? My guess is that by glomming everything together you are getting
better compression than you would in practice, when each file is
compressed distinctly, due to the adaptive algorithms -- here you may
use data from file X to do a better job compressing Y.

--
Fred Douglis    MIME accepted    douglis@research.att.com
AT&T Labs - Research             908 582-3633 (office)
600 Mountain Ave., Rm. 2B-105    908 582-3063 (fax)
Murray Hill, NJ 07974
http://www.research.att.com/~douglis/

As of 6/1/97:
AT&T Labs - Research
180 Park Ave, Room A181
Florham Park, NJ 07932-0971
973-360-8775 (office)
973-360-8871 (fax)
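
One way to probe the aside about `cat * | gzip` versus compressing each file on its own is the minimal sketch below. It is not part of the original measurement: the helper name `separate_vs_concatenated` and the sample data are hypothetical, and the sample is deliberately redundant (many short responses with near-identical headers), so it only illustrates the direction of the effect, not its size on a real cache.

```python
import gzip


def separate_vs_concatenated(blobs):
    """Return (total size when each blob is gzipped alone,
    size when all blobs are concatenated and gzipped once)."""
    separate = sum(len(gzip.compress(b)) for b in blobs)
    together = len(gzip.compress(b"".join(blobs)))
    return separate, together


# Hypothetical stand-ins for short cached responses that share header text.
sample = [
    b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
    b"<html><body>page %d</body></html>" % i
    for i in range(50)
]

separate, together = separate_vs_concatenated(sample)
print("compressed separately:", separate, "bytes")
print("compressed together:  ", together, "bytes")
```

On data like this the concatenated stream comes out smaller, both because each separate gzip stream pays its own header and trailer and because later files can reuse strings seen in earlier ones. To repeat the comparison on a real cache directory, the same function could be fed the contents of each file read in binary mode.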
Received on Tuesday, 22 April 1997 07:02:34 UTC