Packaging resources in HTML files from Yaron Golan on 2009-07-04 (public-html@w3.org from July 2009)

From: Yaron Golan <yarong@xennexinc.com>
Date: Sat, 4 Jul 2009 17:50:38 +0800
To: <public-html@w3.org>
Message-ID: <002d01c9fc8c$e5daea00$b190be00$@com>

One of the most annoying problems I’ve had to deal with (other than
cross-browser compatibility) with web-based product development has been
page load time optimization. When we first completed the development of our
homepage it was full of references to an additional 60 resources which
included JavaScript, CSS, and image files. All in all it took about 7-8
seconds to load on screen (on a good day), and we were pretty unhappy. 7
seconds, I was told by our SEO expert, was your average window closing time
– basically that’s how fast a user closes the window if he or she fails to
see anything interesting or, in our case, anything at all.

I don’t quite know what brought me to thinking about this yesterday, but
somehow packaging came to mind. A significant draw on the load time had to
do with the browser having only a limited number of threads for downloading
resources (this is IE6 and 7, and Firefox 3, so it may have improved at this
point), with each resource requiring a full HTTP request cycle. With JavaScript and
CSS files, there was also a matter of compression (for those of us who
refuse to use mod_gzip, etc.). All we really had to do here was to set up an
incrementally deflating resource package that the browser would load from
the server – which would include all of this junk, and you’d basically be
looking at a couple of HTTP requests at most, both of which could be run in
parallel.
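
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model: every resource costs one full request cycle, and the browser keeps a fixed number of downloads in flight. The function name and the connection counts in the usage note are illustrative assumptions, not measurements.

```python
import math


def request_rounds(resource_count, parallel_downloads):
    """Number of sequential request cycles needed to fetch everything.

    Simplified model (an assumption): each resource takes one full
    request cycle, and at most `parallel_downloads` run at a time.
    """
    if parallel_downloads < 1:
        raise ValueError("parallel_downloads must be at least 1")
    return math.ceil(resource_count / parallel_downloads)
```

Under this model, request_rounds(60, 2) gives 30 rounds and request_rounds(60, 6) gives 10, while a page plus one package, fetched in parallel, is request_rounds(2, 2), a single round.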

A package file could be designed to contain the following:
1. The index file – containing the list of URLs which this package file
resolves (what files are in here and need not be loaded directly)
2. JavaScript files – in order of loading
3. CSS files – in order of loading
4. Imagery
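
As a rough illustration of the layout listed above, here is a minimal Python sketch of a package builder. Everything concrete in it is an assumption made for the example: the byte layout (a 4-byte length, a JSON index, then the file bodies), the names build_package and read_index, and the use of zlib's deflate as a stand-in for whatever compression the real format would pick.

```python
import json
import struct
import zlib

# Write order for resource kinds, following the list above:
# JavaScript first, then CSS, then imagery.
KIND_ORDER = {"js": 0, "css": 1, "image": 2}


def build_package(resources):
    """Pack (url, kind, data) triples into one compressed blob.

    Hypothetical layout: 4-byte big-endian index length, a JSON index,
    then the resource bodies back to back; the whole blob is deflated.
    """
    # sorted() is stable, so load order within each kind is preserved.
    ordered = sorted(resources, key=lambda r: KIND_ORDER[r[1]])

    index, offset = [], 0
    for url, kind, data in ordered:
        index.append(
            {"url": url, "kind": kind, "offset": offset, "length": len(data)}
        )
        offset += len(data)

    index_bytes = json.dumps(index).encode("utf-8")
    body = b"".join(data for _, _, data in ordered)
    raw = struct.pack(">I", len(index_bytes)) + index_bytes + body
    return zlib.compress(raw)


def read_index(package):
    """Return the index (a list of dicts) stored at the front of a package."""
    raw = zlib.decompress(package)
    (index_len,) = struct.unpack(">I", raw[:4])
    return json.loads(raw[4:4 + index_len].decode("utf-8"))
```

Putting the index first means a reader can learn which URLs the package covers before any of the file bodies have been decoded.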

Now, with a simple tag like <resource src="…" />, which could easily be added
to your whole site, you could basically resolve most of the requests against
it. Once the browser encounters this tag under <head> it would start loading
the package (while the page itself is still loading) and could cache the
unpacked files just as it does with files it grabs today. Existing tags (like
img, link and script)
could be resolved by pretty much matching the URL with the index file. A
simple application could build the package file for you, and one could use
LZH for the incremental compression. It’s really as simple as that!
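
The URL matching described above could be sketched roughly as follows in Python. This is only an illustration of the lookup, not how any browser works: the class name PackageResolver, the fetch callback, and the choice to normalise URLs with urljoin are all assumptions made for the example.

```python
from urllib.parse import urljoin


class PackageResolver:
    """Serve resources from an unpacked package, else fall back to the network."""

    def __init__(self, page_url, packaged):
        # `packaged` maps each URL listed in the package index to its bytes.
        # URLs are made absolute so that "/js/a.js" and "../js/a.js" match.
        self.page_url = page_url
        self.packaged = {urljoin(page_url, u): d for u, d in packaged.items()}
        self.network_requests = []

    def resolve(self, src, fetch):
        """Return the bytes for an img/link/script URL found on the page."""
        absolute = urljoin(self.page_url, src)
        if absolute in self.packaged:
            return self.packaged[absolute]  # no HTTP request needed
        # Not in the package: load it the normal way.
        self.network_requests.append(absolute)
        return fetch(absolute)
```

Anything the index does not list simply falls through to a normal request, which is what keeps existing pages working unchanged.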

Now, why is this good for the world? Here are several ideas:
• Browsers could load all resources in one quick request. One DNS
resolution, one socket open, one HTTP request. Easy.
• Files would always be compressed – No need for special settings on server
side, like mod_gzip or mod_deflate.
• Servers and proxies would have a significantly lower load, and could
easily cache one file here.
• Even if the server doesn’t support it (read: no resource tag), the mobile
industry could still reduce its bandwidth costs. Mobile browsers could gather
all resource URL requests into one, which they would send to their mobile
provider (already saving some bandwidth). The provider would grab the
original set of resources, compress them on the fly, and send them over the
air. Images could be adapted to the mobile device’s screen size, and all
script and CSS files would be reduced in size!
• Resource packages can be built specifically for a client – once the server
has identified the client, it could pick another version of the package (you
could have one for mobile and one for a normal browser, for instance) and
return that instead!
• No need to change your existing codebase — and this is a big one! Other
than adding the resource tag, everything else would work and this would
comfortably be backwards compatible (All your existing HTML are belong to
you!).
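
The per-client packages mentioned in the list above could be chosen on the server along these lines. This Python sketch assumes the client is identified by its User-Agent header; the function name, the marker strings, and the package paths in the usage note are hypothetical.

```python
# Substrings treated as a sign of a mobile client (illustrative assumption).
MOBILE_MARKERS = ("mobile", "iphone")


def choose_package(user_agent, packages, default="desktop"):
    """Pick the package URL to return for a given User-Agent string.

    `packages` maps a variant name ("desktop", "mobile", ...) to a URL.
    Falls back to the default variant when no better match exists.
    """
    ua = user_agent.lower()
    if "mobile" in packages and any(m in ua for m in MOBILE_MARKERS):
        return packages["mobile"]
    return packages[default]
```

For example, with packages = {"desktop": "/pkg/site.pkg", "mobile": "/pkg/site-mobile.pkg"}, a User-Agent containing "iPhone" would get the mobile package and everything else would get the desktop one.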

Some questions and answers:
• Do we only have one resource file? – No, you can have multiple, but keep
the number small or you’ll hit the same brick wall of too many requests again.
• What do I do with external imagery or dynamic sites? – You can have
several package files for the “basics” (your site template or frequently
used resources) and have everything else loaded normally. If the browser
supports it, there’s really no reason not to!
• Do we need to store all resources in the file? – No, you can have some of
them in there and some of them accessed directly by your browser.

Thoughts? Ideas?

For quick reference, here are some links that discuss the issue with
multiple HTTP requests and optimization:
• 14 rules for fast web pages – http://www.skrenta.com/2007/05/14_rules_for_fast_web_pages_by_1.html
• Optimizing Page Load Time – http://www.die.net/musings/page_load_time/
• Combine Images to Save HTTP Requests – http://www.websiteoptimization.com/speed/tweak/combine/

– Yaron
Received on Sunday, 5 July 2009 20:43:44 UTC