
RE: Performance implications of Bundling and Minification on HTTP/1.1

From: Henrik Frystyk Nielsen <henrikn@microsoft.com>
Date: Mon, 25 Jun 2012 23:03:57 +0000
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: HTTP Working Group <ietf-http-wg@w3.org>, Howard Dierking <howard@microsoft.com>
Message-ID: <3605BA99C081B54EA9B65B3E33316AF7346F2282@SN2PRD0310MB396.namprd03.prod.outlook.com>

You are correct regarding the wording of where the scripts were from. What I should have said is that we looked at content that was from the same "site". This was to clarify that we were not trying to bundle scripts from many different sites as this likely would not be practical. The goal was to use a version of bundling and minification that could be applied immediately.

The compressed size only includes the content size, not any HTTP overhead. The way we got the data was serving the files through IIS with Fiddler as the client sending two requests: one without accept-encoding and one with accept-encoding. We then got the size as the content-lengths of the responses. That way, we were guaranteed to use the same compression algorithm in all cases.
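The two-request measurement described above can be approximated with a short script. This is only a minimal sketch, not the IIS and Fiddler setup mentioned in the message: the function names `build_request` and `content_size` are hypothetical, Python's `urllib` stands in for Fiddler, and the uncompressed request sends `Accept-Encoding: identity` explicitly instead of omitting the header.

```python
import urllib.request


def build_request(url: str, compressed: bool) -> urllib.request.Request:
    """Build a GET request that asks either for gzip or for no content coding."""
    # Assumption: "identity" stands in for "no Accept-Encoding header".
    encoding = "gzip" if compressed else "identity"
    return urllib.request.Request(url, headers={"Accept-Encoding": encoding})


def content_size(url: str, compressed: bool) -> int:
    """Return the size in bytes of the response body as sent by the server."""
    with urllib.request.urlopen(build_request(url, compressed)) as resp:
        # urllib does not decode gzip, so this is the body as transferred.
        body = resp.read()
        declared = resp.headers.get("Content-Length")
        # Prefer the declared Content-Length; fall back to the bytes received
        # (for example when the server uses chunked transfer coding).
        return int(declared) if declared is not None else len(body)
```

Calling `content_size(url, compressed=False)` and `content_size(url, compressed=True)` for the same URL would give the two sizes to compare, with HTTP header overhead excluded in both cases.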

As for comparing minification with plain bundling (concatenation), here is the data when we simply concatenate the CSS content from huffingtonpost.com and then compress it as one document. The size difference between the concatenated files and the minified bundle is still noticeable (75230 - 69803 = 5427 bytes):

Compressing 6 CSS individually:         77495 bytes
Compressing the minified CSS bundle:    69803 bytes
Compressing the 6 concatenated files:   75230 bytes
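
The comparison between compressing files individually and compressing their concatenation can be reproduced locally along these lines. This is a hedged sketch rather than the method used for the numbers above: it assumes gzip at the highest compression level (the server's actual settings are not stated), and `gzip_size` and `compare_bundling` are hypothetical helper names.

```python
import gzip


def gzip_size(data: bytes) -> int:
    """Size in bytes of data after gzip compression (level 9 is an assumption)."""
    return len(gzip.compress(data, 9))


def compare_bundling(files):
    """Compare compressing each file on its own with compressing one bundle.

    files is a list of bytes objects, one per stylesheet or script.
    """
    individual = sum(gzip_size(f) for f in files)
    # Plain concatenation: join the files with a single newline between them.
    bundled = gzip_size(b"\n".join(files))
    return {
        "individual": individual,
        "bundled": bundled,
        "saved": individual - bundled,
    }
```

The bundle usually compresses better because the files share one compression dictionary and the per-file gzip header and trailer are paid only once. Minification would be a separate step applied before `gzip_size`.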

We'll be giving out the raw data in a day or two so that you can have a look for yourself.

As to your general point, these are relatively high-traffic public sites that have clearly gone through some level of optimization yet still leave a lot of room for more to be done. While I do agree that tooling support is evolving, I don't think it takes away from the larger point: we have to take both protocol and content into consideration when considering performance, and there are obvious things that can help a great deal simply by using existing infrastructure.


-----Original Message-----
From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net] 
Sent: Monday, June 25, 2012 3:00 PM
To: Henrik Frystyk Nielsen
Cc: HTTP Working Group; Howard Dierking
Subject: Re: Performance implications of Bundling and Minification on HTTP/1.1

* Henrik Frystyk Nielsen wrote:

"We only looked at CSS and JS coming from the same DNS domain (i.e. for digg.com we looked at anything under *.digg.com." As far as I can tell, http://digg.com/ does not load any scripts from digg.com, most digg-specific scripts seem to come from *.diggstatic.com instead. The same goes for many of the other examples, http://www.bbc.co.uk/ loads from *.bbcimg.co.uk and *.static.bbci.co.uk, http://www.huffingtonpost.com/ loads from *.huffpost.com (and a single one from *.huffingtonpost.com), and so on. At least for the BBC I am quite sure this has not changed in recent weeks, so it seems a different measure was used than the above.

I am not sure why the bundles are several kilobytes bigger than the sum of the individual sizes; it would seem to take about one byte per file to concatenate them (e.g., bloomberg.com has "js size (kb)" 408 and "js bundle size (kb)" 410, which would make for a kilobyte of difference even if there are rounding problems). If headers were counted, the size should go down, as the bundle comes with less header overhead.

"In addition to removing white space, minification typically shortens variable names and other identifiers and removes pieces that are not used." I do not think changing variable names is "typical" for minification. I am not sure what to look at due to the first issue above, but the minification savings seem unreasonable; as far as comment removal and white space normalization goes, the sites seem to mostly use minified scripts already, but they might not try to shorten variable names, though the referenced blog posting explaining minification also doesn't really mention that as an optimization. For CSS the results are closer, but for, say, http://www.bbc.co.uk/ I get more like 13% minification savings than 26% as in the blog posting (results vary with browsers and other things, there are, for instance, "conditional comments" that hide some references to external content from some browsers).

So overall the trend seems right, but this isn't very reproducible. It's also somewhat disappointing to see that, for instance, digg.com loads an old version of jQuery and various jQuery extensions ("jScrollPane", "AJAX Upload", "In-Field Label", and so on) with full comments and unminified when loading the front page. Last year I wrote a tool (http://bjoern.hoehrmann.de/pngwolf/) that strips several kilobytes off google.com, but I guess collectively we are not quite at the point where we'd care about that.

(My general impression is that these kinds of optimizations are underdeveloped because they are quite hard to automate. With HTTP compression, early on there had been many bugs in browsers and intermediaries, and in the caches of both, and there were concerns about drive space costs and CPU consumption. Nowadays, to pick the example of PNG optimization, there is a lack of "cooperation": tools that outperform my `pngwolf` tend to be scripts that use multiple optimization tools. `pngwolf` selects the best scanline filters, kzip/pngout has the best Deflate implementation as far as size is concerned, various tools compete in the area of selecting between palette images and RGB/A images for lossless recompression, and there are tools to optimize Huffman tables that work very well but happen to be unfree and closed source, so I can't link them into `pngwolf`, and so on. And, more importantly, the tools go largely unused. Google's mod_pagespeed for instance includes OptiPNG, but runs it at settings that favour compression throughput over compression ratio, mainly, I suspect, because amortizing the cost in form of some cache is hard to implement, with proper cache invalidation and whatever else might be needed. In my case of `pngwolf`, I was surprised how trivial it was to come up with a scanline filter selection that outperforms the only known one, the one offered in the specification, and how even expensive commercial tools fail to use even trivial heuristics. Overall, there is a lack of tools to tell when your site is much slower/bigger/... than necessary, a problem that starts with HTML and CSS being hard to parse properly.)
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Monday, 25 June 2012 23:05:09 GMT
