[whatwg] HTML resource packages from Brett Zamir on 2010-08-04 (public-whatwg-archive@w3.org from August 2010)

From: Brett Zamir <brettz9@yahoo.com>
Date: Wed, 04 Aug 2010 10:48:53 +0800
Message-ID: <4C58D515.90002@yahoo.com>
  This is and was a great idea. A few points/questions:

1) I think it would be nice to see explicit confirmation in the spec 
that this works with offline caching.

2) Could data files such as .txt, .json, or .xml files be used as part 
of such a package as well?

3) Can XMLHttpRequest be made to reference such files and get them from 
the cache, and if so, when referencing only a zip in the packages 
attribute, can XMLHttpRequest access files in the zip not spelled out by 
a tag like <link/>? I think this would be quite powerful and would avoid 
duplication, even if it adds functionality (like other HTML5 features) 
that would not be available to older browsers.
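
To make question 3 concrete, here is a rough sketch (in Python rather than browser code) of the lookup being asked about: a requested path is served from the package when the zip contains it, and otherwise falls back to a normal request. The names `build_package`, `resolve`, and the `fetch` callback are hypothetical and are not taken from the resource package spec.

```python
import io
import zipfile


def build_package(files):
    """Build an in-memory zip standing in for a resource package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def resolve(path, package_bytes, fetch):
    """Return (source, body): from the package if present, else via fetch()."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        if path in zf.namelist():
            return "package", zf.read(path)
    # Not in the package: fall back to an ordinary request (assumed callback).
    return "network", fetch(path)
```

Under this sketch a data file such as a .json file would be served from the zip even though no tag in the page names it, which is the behaviour the question is about.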

4) Could such a protocol also be made to accommodate profiles of 
packages, e.g., by allowing a namespace to be declared for each package?

Thus, if a package is specified as say being under the XProc (XML 
Pipelining) namespace profile, the browser would know it could 
confidently look for a manifest file with a given name and act 
accordingly if the profile were eventually formalized through future 
specifications or implemented by general purpose scripting libraries or 
browser extensions, etc.
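
As an illustration of the profile idea, the following is a minimal sketch of how a script library might map a package's declared namespace to a well-known manifest name and look for it in the zip. The namespace URIs, the manifest filenames, and the `find_manifest` helper are all made up for the example; no such registry exists in the proposal.

```python
import io
import zipfile

# Hypothetical registry: profile namespace URI -> expected manifest filename.
PROFILE_MANIFESTS = {
    "http://example.org/profiles/xproc": "pipeline.xpl",
    "http://example.org/profiles/mets": "mets.xml",
}


def find_manifest(package_bytes, profile):
    """Return the manifest bytes for a known profile, or None."""
    name = PROFILE_MANIFESTS.get(profile)
    if name is None:
        return None  # unknown profile: treat as a plain package
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        try:
            return zf.read(name)
        except KeyError:
            return None  # profile declared, but the manifest is missing
```

A package with an unrecognized profile is simply treated as an ordinary package here, which matches the intent that nothing breaks for consumers that do not know the profile.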

Another example: a page could reference a package that includes, along 
with a set of files, a manifest in a format like METS describing a 
sitemap for the package. The files could perhaps be added immediately 
to the user's IndexedDB database, navigated Gopher-like, etc., and then 
made navigable online or offline if they were included in the zip. A 
single HTTP request could thus download a whole site (e.g., if a site 
offered a collection of books).

And manifest files might be made to specify which files should be 
updated at a specific time independently of the package (e.g., checking 
periodically for an updated manifest file outside of a zip which could 
point to newer versions).
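
A rough sketch of that update check, assuming (purely for illustration) that a manifest can be reduced to a mapping from file path to a version string: compare the manifest shipped inside the zip with a newer one fetched separately, and list the files that would need refreshing. The `stale_files` helper and the manifest shape are hypothetical.

```python
def stale_files(packaged, latest):
    """List paths that are new or whose version differs in the latest manifest."""
    return sorted(
        path
        for path, version in latest.items()
        if packaged.get(path) != version
    )
```

For example, with a packaged manifest of `{"a.html": "1", "b.css": "1"}` and a newer external manifest of `{"a.html": "2", "b.css": "1", "c.js": "1"}`, only `a.html` and `c.js` would need to be fetched outside the package.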

Note: the above is not asking browsers to implement any such additional 
complex functionality here and now; rather, it is just to allow for the 
possibility of automated discovery of package files having a particular 
structure (e.g., with specifically named manifest files to indicate how 
to interpret the package contents) by providing a programmatically 
accessible namespace for each package which could be unique per 
application and interpreted in particular ways, including by general 
purpose JavaScript libraries. This is not about adding namespaces to 
HTML itself, but rather about specifying package profiles.

Such extensibility would, as far as I can see, allow for some very 
powerful declarative styles of programming in relation to handling of 
multiple files (whether resource files, data files, or complete pages), 
while piggybacking on the proposal's ability to minimize the HTTP 
requests needed to get them.

best wishes,
Brett


On 8/4/2010 8:31 AM, Justin Lebar wrote:
> We at Mozilla are hoping to ship HTML resource packages in Firefox 4,
> and we wanted to get the WhatWG's feedback on the feature.
>
> For the impatient, the spec is here:
>
>      http://people.mozilla.org/~jlebar/respkg/
>
> and the bug (complete with builds you can try and some preliminary
> performance numbers) is here:
>
>      https://bugzilla.mozilla.org/show_bug.cgi?id=529208
>
>
> You can think of resource packages as image spriting 2.0.  A page
> indicates in its <html> element that it uses one or more resource
> packages (which are just zip files).  Then when that page requests a
> resource (be it an image, a css file, a script, or whatever), the
> browser first checks whether one of the packages contains the
> requested resource.  If so, the browser uses the resource out of the
> package instead of making a separate HTTP request for the resource.
>
> There's more detail than that, of course.  Hopefully it's
> (mostly) clear in the spec.
>
> I envision two classes of users of resource packages.  I'll call the
> first "resource-constrained developers".  These developers care about
> how fast their page is (who doesn't?), but can't spend weeks speeding
> up their page.  For these developers, resource packages are an easy
> way to make their pages faster without going through the pain of
> spriting their images and packaging their js/css.
>
> The other class of users is the resource-unconstrained developers;
> think Google or Facebook.  These developers have already put a huge
> amount of effort into making their pages fast, and a naive application
> of resource packages is unlikely to make them any faster.  But these
> developers may be able to use resource packages cleverly to gain
> speedups.  In particular, nobody (to my knowledge) currently sprites
> content images, such as the results of an image search.  A determined
> set of developers should be able to construct resource packages for
> image search results on the fly and save some HTTP requests.
>
>
> So that we can avoid rehashing the common objections to resource
> packages here, this is a brief overview of the arguments I've heard
> against the feature and my responses.
>
> * Argument: Packaging isn't the way forward.  When you change one
> resource in a package you have to change the whole package and so the
> user has to re-download all the bits when most of what was in their
> cache would have been fine.
>
> This is of course correct, but we don't think it eliminates the
> utility of resource packages.  The resource-constrained developer is
> probably happy with anything which speeds up page loads, even if it's
> not optimal when one part of the page changes.  And the
> resource-unconstrained developer probably won't find resource packages
> too useful for non-dynamic content, so caching isn't an issue in that
> case.
>
> * Argument: We can already package things pretty well.  Mozilla should
> instead be focusing on improving caching (or something else).
>
> I'd contend that we don't package particularly well in general.  The
> Facebook homepage loads 100 separate resources on a cold cache, and
> they certainly care about speed.  But anyway, this is just one
> project.  We're also looking at caching.  :)
>
> * Argument: Isn't this subsumed by HTTP pipelining?
>
> Mostly.  But we can't turn on HTTP pipelining because transparent
> proxies break it.
>
> Resource packages have the further benefit that they allow page
> authors to explicitly set the order in which the UA will download the
> resources -- with pipelining, an important resource might get stuck
> behind a large, unimportant resource, while with resource packages,
> the UA always downloads resources in the order they appear in the zip
> file.
>
> Last, my understanding is that the HTTP pipeline isn't particularly
> deep, so perhaps resource packages fill the TCP pipe better on
> high-latency connections.  I haven't looked into this, though.
>
> * Argument: What about SPDY?
>
> I think SPDY should subsume resource packages.  But its deployment
> will require changes to both web clients and servers, so it will
> probably take a while after it's released before it's available on all
> web servers.  And we have no idea when to expect SPDY to be ready for
> production.  Resource packages, in contrast, are something we can have
> Right Now.
>
> Additionally, since resource packages are backwards-compatible -- a
> page which specifies resource packages should display just fine in a
> browser which doesn't support them -- we should be able to turn off
> resource packages in the future if we decide we don't want them
> anymore.
>
>
> We'd love to hear what you think of the specification and our implementation.
>
> -Justin
>
Received on Tuesday, 3 August 2010 19:48:53 UTC