- From: Benjamin Franz <snowhare@netimages.com>
- Date: Wed, 23 Apr 1997 10:04:58 -0700 (PDT)
- To: Henrik Frystyk Nielsen <frystyk@w3.org>
- Cc: http-wg@cuckoo.hpl.hp.com
On Wed, 23 Apr 1997, Henrik Frystyk Nielsen wrote:

> At 03:44 PM 4/22/97 -0700, Benjamin Franz wrote:
>
> >That is an *exceptionally large* HTML document - about 10 times the size
> >of the average HTML document based on the results from our webcrawling
> >robot here (N ~= 5,000 HTML documents found by webcrawling). Very few web
> >designers would put that much on a single page because they are aiming for
> >a target of 30-50K TOTAL for a page - including graphics.
>
> It would be interesting to elaborate a bit on getting a better impression
> on what the distribution of web pages is. A sample of 5000 is not big
> enough to put *'s around your conclusions. I know that there are many cache
> maintainers and maybe even indexers on this mailing list. Benjamin, what if
> you tried getting these people to take a snapshot of their caches and get
> the sizes of the HTML pages? It would be very useful information to a lot
> of us!

Dammit, you are just begging for me to create a Squid store.log analysis
tool. :) (A rough sketch of one is at the end of this message.)

> >As noted: deflate and other compression schemes do much better on large
> >text/* documents than small ones. Using an overly large document gives a
> >misleading comparison against the short window compression that modems
> >perform by basically allowing deflate a 'running start'. You should do the
> >comparison using 3-4K HTML documents: The whole test document should be
> >only 3-5K uncompressed and 1-2K compressed.
>
> I tried to do this with the page
>
> http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Compression/PPP.html
>
> which is 4312 uncompressed and 1759 compressed. It still gives a 30%
> increase in speed and a 35% gain in packets. Below that size, the number of
> TCP packets begin to be the same and therefore little difference is to be
> expected.

But remember this is _only_ on the text/* document portion of the
traffic - which is itself only around 13% of the total traffic. So,
basically, you save 30% in time on 13% of the traffic - or about a net
3.9% savings. By your own figures this is even worse than the estimate
I gave, which put the net savings at 7.5% (based on a much higher
compression estimate of 57% for text/*).

> Note, this is using default compression _including_ the dictionary.
> Intelligent tricks can be played by making a pre-defined HTML-aware
> dictionary in which case the win will be bigger.

Even if it doubles the compression efficiency, you would not crack 8%
net savings.
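To spell the arithmetic out (treating the 13% text/* share as fixed and
assuming compression does nothing for the remaining 87% of the traffic,
which is mostly image/*):

```
# Net effect on total traffic when only the text/* share improves.
# The 0.13 share and the 30%/57% figures are the numbers quoted above;
# everything else is assumed to be untouched by compression.

TEXT_SHARE = 0.13

def net_saving(text_saving):
    return TEXT_SHARE * text_saving

print(net_saving(0.30))  # 30% win on text/*          -> 0.039 (~3.9% net)
print(net_saving(0.57))  # my earlier 57% estimate    -> 0.074 (~7.5% net)
print(net_saving(0.60))  # even doubling the 30% win  -> 0.078 (still < 8%)
```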
> >Again, the document used was around 10 times the size of the typical HTML
> >document. This should be re-done with more typical test documents. In
> >fact, it would probably be a good idea to test multiple sizes of documents
> >as well as realistic mixes of text/* and image/* to understand how
> >document size and mix affect the results of compression and pipelining.
>
> My point here was that the size may not be that bad after all - considering
> the effect of style sheets. As style sheets may be included in the HTML
> document this may cause the overall size of HTML documents to increase.
> Likewise, it will make a lot of graphics go away, as it gets replaced by
> style sheets.

I doubt it. The graphics load is not determined by things that
stylesheets will ultimately affect: designers would put more and
higher-quality graphics in than they do today if it wouldn't slow the
load to unacceptable levels. Byte-hungry designers will implement
*external* stylesheets and scripting to get the cache win. They will
then use the freed bytes to add *more* graphics and multi-media to do
things stylesheets still can't. From their perspective, stylesheets are
not an opportunity to reduce the overall byte count but rather a chance
to include more things they couldn't before because the byte count
would have been too high to be acceptable.

If anything, you may actually see the HTML docs *shrink* while becoming
harder to compress, because of the ability to toss styling and
scripting into separate re-usable documents. This will reduce the win
from compressing text/* even more.

The process of web page design is usually a balance between the graphic
artist - who wants to cry when he is told that his machine has less
than 128 meg of memory and 'only' 2 gig of hard drive space - and the
web designer, who must brutalize the artist's work until it fits into
less than 30K. Make no mistake: the usual design process is forcing the
size *down* to 30K, not *up* to it.
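Since I have more or less volunteered myself above, here is roughly the
kind of store.log analysis I have in mind. This is only a sketch: the
column positions for the content type and the object size are
assumptions about how my Squid writes store.log, so check a few lines
of your own log and adjust TYPE_FIELD and SIZE_FIELD before trusting
the output.

```
#!/usr/bin/env python
# Sketch of a store.log size-distribution tool.  TYPE_FIELD and SIZE_FIELD
# are *assumptions* about which whitespace-separated columns hold the
# Content-Type and the "expected-length/actual-length" pair; adjust them
# to match your Squid before believing the numbers.

import sys

TYPE_FIELD = 6   # assumed column holding the MIME type
SIZE_FIELD = 7   # assumed column holding "expected/actual" byte counts

sizes = []
for line in sys.stdin:
    fields = line.split()
    if len(fields) <= max(TYPE_FIELD, SIZE_FIELD):
        continue
    if not fields[TYPE_FIELD].startswith("text/html"):
        continue
    actual = fields[SIZE_FIELD].split("/")[-1]
    if actual.isdigit():
        sizes.append(int(actual))

if sizes:
    sizes.sort()
    n = len(sizes)
    print("text/html objects: %d" % n)
    print("mean size:   %d bytes" % (sum(sizes) // n))
    print("median size: %d bytes" % sizes[n // 2])
else:
    print("no text/html entries found - check the field assumptions")
```

Feed it a store.log on stdin and it prints the count, mean, and median
sizes of the HTML objects the cache has seen.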
--
Benjamin Franz

Received on Wednesday, 23 April 1997 10:06:24 UTC