
[whatwg] Zip archives as first-class citizens

From: Anne van Kesteren <annevk@annevk.nl>
Date: Wed, 28 Aug 2013 14:32:49 +0100
Message-ID: <CADnb78jksF5RdfPK4Brcib=N-jYcf0YrO=o=fFz4ZLUqTgLUMg@mail.gmail.com>
To: WHATWG <whatwg@whatwg.org>
Cc: Jake Archibald <jakearchibald@google.com>, Yehuda Katz <wycats@gmail.com>, Sam Tobin-Hochstadt <samth@ccs.neu.edu>, David Herman <dherman@mozilla.com>, Alex Russell <slightlyoff@google.com>, Andrea Marchesini <baku@mozilla.com>, Jason Orendorff <jorendorff@mozilla.com>, Jonas Sicking <jonas@sicking.cc>
A couple of us have been toying around with the idea of making zip
archives first-class citizens on the web. What we want to support:

* Group a bunch of JavaScript files together in a single resource and
refer to them individually for upcoming JavaScript modules.
* Package a bunch of related resources together for a game or
application (e.g. icons).
* Support self-contained packages, like Flash-ads or Flash-based games.

Using zip archives for this makes sense as it has broad tooling
support. To lower adoption cost no special configuration should be
needed. Existing zip archives should be able to fit right in.


The above means we need URLs for zip archives. That is:

  <img src="... test.zip ... image.gif">

should work. As well as

  <iframe src="... test.zip ... test.html"></iframe>

and test.html should be able to contain URLs that reference other
resources inside the zip archive.


We have thought of three approaches for zip URL design thus far:

* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.
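
To make the fragment drawback concrete, here is a rough Python sketch. It
is not part of any proposal; split_fragment_form is a hypothetical helper,
and the "path" and "id" field names are simply the ones used in the example
above. It splits the fragment form apart and then shows what an ordinary
relative-URL resolver does with such a base URL.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def split_fragment_form(url):
    """Split the media-fragment form into (archive_url, member, inner_id).

    Hypothetical helper: 'path' names the archive member and 'id' names
    the element inside it, as in #path=test.html&id=test.
    """
    parts = urlsplit(url)
    fields = parse_qs(parts.fragment)
    archive_url = parts._replace(fragment="").geturl()
    member = fields.get("path", [None])[0]
    inner_id = fields.get("id", [None])[0]
    return archive_url, member, inner_id


# The document's own fragment has to share space with the archive member.
print(split_fragment_form("http://www.example.org/zip#path=test.html&id=test"))

# Standard relative resolution ignores the fragment, so a relative URL
# inside test.html resolves next to the archive rather than inside it.
print(urljoin("http://www.example.org/zip#path=test.html", "image.gif"))
```

The second call yields http://www.example.org/image.gif, which is outside
the archive. That is the relative URL problem in a nutshell.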

I'm not sure we need to consider sub-scheme if zip-path can work, as
sub-scheme is more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

zip-path makes zip archive packaging by far the easiest. If we use %!
as separator that would cause a network error in some existing
browsers (due to an illegal %), which means it's extensible there,
though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.
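
A minimal Python sketch of that parsing and resolution step, under some
assumptions of mine: split_zip_url and resolve are hypothetical names,
query strings and fragments are ignored, only path-relative references are
handled, and ".." is clamped at the root of the archive.

```python
import posixpath
from urllib.parse import urljoin

SEPARATOR = "%!"  # the separator discussed in this message


def split_zip_url(url):
    """Return (path_url, zip_path); zip_path is None without a separator."""
    outer, sep, inner = url.partition(SEPARATOR)
    return (outer, inner) if sep else (url, None)


def resolve(base, ref):
    """Resolve a path-relative reference against base."""
    outer, zip_path = split_zip_url(base)
    if zip_path is None:
        # No zip-path: fall back to ordinary resolution against the path.
        return urljoin(base, ref)
    # Treat the zip-path as rooted at "/" so ".." cannot climb out of it.
    base_dir = posixpath.dirname("/" + zip_path)
    joined = posixpath.normpath(posixpath.join(base_dir, ref))
    return outer + SEPARATOR + joined.lstrip("/")


print(resolve("http://www.example.org/zip%!dir/test.html", "image.gif"))
print(resolve("http://www.example.org/zip%!test.html", "../image.gif"))
print(resolve("http://www.example.org/a/b.html", "c.gif"))
```

The first two calls stay inside the archive (zip%!dir/image.gif and
zip%!image.gif); the third has no zip-path and resolves against the path
as usual.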

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.
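
Roughly, in Python, the check after fetching could look like the sketch
below. It takes the response body as bytes rather than doing a real fetch.
NetworkError and extract_member are hypothetical names, and treating a
missing archive member as a network error is my assumption; the text above
only covers the "not a zip archive" case.

```python
import io
import zipfile


class NetworkError(Exception):
    """Stand-in for the network error described above."""


def extract_member(body, zip_path):
    """Return the bytes a URL refers to, given the fetched body of its path."""
    if zip_path is None:
        return body  # plain URL: the fetched resource is the result
    if not zipfile.is_zipfile(io.BytesIO(body)):
        raise NetworkError("zip-path present but resource is not a zip archive")
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        try:
            return archive.read(zip_path)
        except KeyError:
            # Assumption: a missing member is also reported as a network error.
            raise NetworkError("no such member: " + zip_path)


# Build a small archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("image.gif", b"GIF89a")
print(extract_member(buffer.getvalue(), "image.gif"))
```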


As for nested zip archives, Andrea suggested we should support them,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support it in a way that

  %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip. And "../../image.gif" in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.
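
One possible reading of this, sketched in Python. The representation (a
list of segment lists, one per archive level) and the names parse_zip_path
and resolve_nested are my assumptions, as is the choice that ".." from the
root of a nested archive lands in the directory of the enclosing archive
that holds it.

```python
def parse_zip_path(zip_path):
    """'test.zip!dir/test.html' -> [['test.zip'], ['dir', 'test.html']]"""
    return [level.split("/") for level in zip_path.split("!")]


def resolve_nested(zip_path, ref):
    """Resolve a path-relative reference against a (nested) zip-path."""
    levels = parse_zip_path(zip_path)
    levels[-1].pop()  # drop the file name of the base document
    for segment in ref.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if levels[-1]:
                levels[-1].pop()  # up one directory inside this archive
            elif len(levels) > 1:
                levels.pop()      # leave the nested archive...
                levels[-1].pop()  # ...and land beside it in the enclosing one
            # Otherwise we are at the root of the top-most archive: clamp.
        else:
            levels[-1].append(segment)
    return "!".join("/".join(level) for level in levels)


print(resolve_nested("test.zip!test.html", "image.gif"))
print(resolve_nested("test.zip!test.html", "../image.gif"))
print(resolve_nested("test.zip!test.html", "../../image.gif"))
```

The first call stays in the nested archive (test.zip!image.gif); the other
two both give image.gif in the enclosing archive, since the extra ".." has
nowhere left to go.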


-- 
http://annevankesteren.nl/
Received on Wednesday, 28 August 2013 13:33:16 UTC
