
Re: [whatwg] Zip archives as first-class citizens

From: Alexandre Morgaut <Alexandre.Morgaut@4d.com>
Date: Thu, 29 Aug 2013 14:15:19 +0200
To: Anne van Kesteren <annevk@annevk.nl>, WHATWG <whatwg@whatwg.org>
Message-ID: <DE55B4C4-6983-44DD-8C44-5E552259D8A4@4d.com>
Regarding fallback:
- when zip files are blocked by a firewall
- when lightweight browsers don't have enough resources to load a big zip file
- when a browser doesn't support a specific package format (jar, tgz...)

In the solution mentioned below, with a <link> tag and a "pack:"-like scheme:

<link id="pack1" rel="package" src="/pack1.zip" type="application/zip" title="My app global resources">

...

<img src="pack:pack1/graph1.png">


The browser could probably try an alternative URL for each resource originally meant to be packed, like

GET /pack1.zip/graph1.png
Accept: image/*

It would be up to the server using packages to make this alternative link available
And it would still work with media fragments using the query form

GET /pack1.zip/graph1.png?xywh=160,120,320,240
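
A minimal sketch of how such a fallback URL could be derived from a "pack:" URL. The "pack:" scheme is only a proposal in this thread, and the names `PACKAGES` and `fallback_url` are hypothetical; the registry stands in for whatever the browser would collect from the <link rel="package"> tags.

```python
from urllib.parse import urlsplit

# Hypothetical registry built from <link rel="package"> tags: link id -> archive URL.
PACKAGES = {"pack1": "/pack1.zip"}


def fallback_url(pack_url, packages):
    """Map 'pack:<id>/<path>[#fragment]' to '<archive-url>/<path>[?fragment]'."""
    parts = urlsplit(pack_url)
    if parts.scheme != "pack":
        raise ValueError("not a pack: URL: %r" % pack_url)
    # The first path segment is the <link> id, the rest is the path inside the package.
    package_id, _, inner_path = parts.path.partition("/")
    if package_id not in packages or not inner_path:
        raise ValueError("unknown package or empty path: %r" % pack_url)
    url = packages[package_id] + "/" + inner_path
    if parts.fragment:
        # Media fragments move to the query form so that they reach the server.
        url += "?" + parts.fragment
    return url


print(fallback_url("pack:pack1/graph1.png#xywh=160,120,320,240", PACKAGES))
```

The fragment is moved to the query string because a fragment is never sent to the server, while the query form of media fragments is.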




On 29 August 2013, at 12:02, Alexandre Morgaut wrote:

> This concept reminds me of something I saw a long time ago:
> - http://limi.net/articles/resource-packages/
> - http://unscriptable.com/2010/08/03/firefoxs-proposed-resource-packages-spec-sucks/
> - http://people.mozilla.com/~jlebar/respkg/
>
> I liked the possibility of declaring link resources to refer to: you could potentially load multiple archives depending on the layer logic of the app/site, the same way you would with image sprites or concatenated JS files
> ex:
> - framework layer
> - app / site layer
> - interface / page layer
> In my opinion, the "packages" attribute kept this granularity possible but made it harder
> With <link> tags you could put distinct ones in HTML templates, which you can't do with the "packages" attribute
>
>
> Regarding internal links, MHTML already uses two existing schemes: "cid:" and "mid:"
> - http://tools.ietf.org/html/rfc2557
> - http://tools.ietf.org/html/rfc2392
> (note: schemes to consider for the URL API?)
> But they may not be the best ones to use
>
> I also looked at Microsoft's "res:" scheme, but its use of "#" to target a sub-resource is an issue, as it prevents good support of media fragments
> It also has a bad history of security issues
> - http://msdn.microsoft.com/en-us/library/aa767740(v=vs.85).aspx
> - http://support.microsoft.com/kb/220830
> - http://ha.ckers.org/blog/20070721/res-protocol-local-file-enumeration/
>
> Something I'd find interesting would be
>
> to have a "package" or "resources" link type able to link to a few standard archive formats (zip, jar, tar, tgz...), to be decided (gz is already standard in HTTP)
>
> ex:
>
> <link id="pack1" rel="package" src="/pack1.zip" type="application/zip" title="My app global resources">
>
> and then be able to refer to any file in the package via the link tag id, using a global package-specific scheme
>
> ex:
> <img src="pack:pack1/graph1.png">
>
> Media fragments should still work on such URLs, but it might be required, or at least recommended, to specify the MIME type in the tag, as it won't be provided by HTTP. Of course, if it is not specified, the browser can still infer it from the file extension, as an HTTP server does
>
> <img src="pack:pack1/graph1.png#xywh=160,120,320,240">
>
> or a bit safer
>
> <img src="pack:pack1/graph1.png#xywh=160,120,320,240" type="image/png">
>
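A small sketch of that type selection, assuming the rule is "an explicit type attribute wins, otherwise guess from the extension". The function name `resource_type` and the "application/octet-stream" default are assumptions of this example, not part of the proposal.

```python
import mimetypes


def resource_type(inner_path, declared_type=None):
    """Pick a MIME type for a packaged file that has no HTTP Content-Type."""
    if declared_type:
        # The author supplied type="..." on the tag: trust it.
        return declared_type
    # Otherwise infer from the file extension, as a static HTTP server would.
    guessed, _encoding = mimetypes.guess_type(inner_path)
    # Assumed default for unknown extensions.
    return guessed or "application/octet-stream"


print(resource_type("graph1.png"))
print(resource_type("graph1.png", "image/png"))
```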
> If we don't want to create a new scheme,
> - the "cid:" one may be used considering the Content-ID as defined by the link id + the sub resource path
>
> <img src="cid:pack1/graph1.png">
>
> - the "mid:" one may be used considering the Message-ID as defined by the link id and the Content-ID as the sub resource path
>
> <img src="mid:pack1/graph1.png">
>
> It would mean writing an RFC that updates RFC 2392 instead of creating a new one from scratch
> -> one issue is that "cid:" and "mid:" don't expect fragments, and I think nothing prevents a Content-ID from containing a "#" in a MIME mail or MHTML file, so the URL API may need slightly more complex checks to correctly fill the URL object properties (current context, existing package link in header...)
>
> For such a solution, the link "rel" attribute value and the URL scheme name would need broad adoption. I think we should try to keep them closely related to each other if possible
>
>
> Another point I found interesting to think about:
>
> Just as debuggers like Web Inspector allow inspecting cookies and web storage, they should be able to list packages, keyed by id, and show:
> - their size
> - their archive format
> - their description (from the "title" attribute)
> - the list of their files (preferably supporting internal folder hierarchy)
>
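As a rough illustration of the data such an inspector panel could show, here is a sketch that reads those fields from a zip archive held in memory. The function names and the returned dictionary layout are hypothetical; the demo archive contains placeholder bytes, not real assets.

```python
import io
import zipfile


def describe_package(package_id, title, archive_bytes):
    """Collect the fields listed above for one zip package."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Full paths are kept, so the internal folder hierarchy can be rebuilt.
        files = sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )
    return {
        "id": package_id,
        "size": len(archive_bytes),
        "format": "zip",
        "description": title,
        "files": files,
    }


def build_demo_archive():
    """Create a tiny zip in memory so the example is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("images/graph1.png", b"placeholder, not a real PNG")
        archive.writestr("scripts/app.js", b"// placeholder")
    return buffer.getvalue()


summary = describe_package("pack1", "My app global resources", build_demo_archive())
for key, value in summary.items():
    print(key, "=", value)
```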
>
>
>
> On 28 August 2013, at 15:32, Anne van Kesteren wrote:
>
>> A couple of us have been toying around with the idea of making zip
>> archives first-class citizens on the web. What we want to support:
>>
>> * Group a bunch of JavaScript files together in a single resource and
>> refer to them individually for upcoming JavaScript modules.
>> * Package a bunch of related resources together for a game or
>> applications (e.g. icons).
>> * Support self-contained packages, like Flash-ads or Flash-based games.
>>
>> Using zip archives for this makes sense as it has broad tooling
>> support. To lower adoption cost no special configuration should be
>> needed. Existing zip archives should be able to fit right in.
>>
>>
>> The above means we need URLs for zip archives. That is:
>>
>> <img src="... test.zip ... image.gif">
>>
>> should work. As well as
>>
>> <iframe src="... test.zip ... test.html"></iframe>
>>
>> and test.html should be able to contain URLs that reference other
>> resources inside the zip archive.
>>
>>
>> We have thought of three approaches for zip URL design thus far:
>>
>> * Using a sub-scheme (zip) with a zip-path (after !):
>> zip:http://www.example.org/zip!image.gif
>> * Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
>> * Using media fragments: http://www.example.org/zip#path=image.gif
>>
>> High-level drawbacks:
>>
>> * Sub-scheme: requires changing the URL syntax with both sub-scheme
>> and zip-path.
>> * Zip-path: requires changing the URL syntax.
>> * Fragments: fail to work well for URLs relative to a zip archive.
>>
>> Fragments are conceptually the cleanest as the only part of a URL
>> that's supposed to depend on the Content-Type is the fragment.
>> However, if you want to link to an ID inside an HTML resource you'd
>> have to do #path=test.html&id=test which would require adding
>> knowledge to the HTML resource that it is contained in a zip archive
>> and have special processing based on that. And not just HTML, same
>> goes for CSS or JavaScript.
>>
>> I'm not sure we need to consider sub-scheme if zip-path can work as
>> it's more complex and not very well thought out. E.g. imagine
>> view-source:zip:http://www.example.org/zip!test.html. (I hope we never
>> need to standardize view-source and that it can be restricted to the
>> address bar in browsers.)
>>
>> zip-path makes zip archive packaging by far the easiest. If we use %!
>> as separator that would cause a network error in some existing
>> browsers (due to an illegal %), which means it's extensible there,
>> though not backwards compatible.
>>
>> We'd adjust the URL parser to build a zip-path once %! is encountered.
>> And relative URLs would first look if there's a zip-path and work
>> against that, and use path otherwise.
>>
>> Fetching would always use the path. If there's a zip-path and the
>> returned resource is not a zip archive it would cause a network error.
>>
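The parsing and relative-URL steps described above can be sketched as follows. This is only one reading of the zip-path idea, not a specified algorithm: the function names are hypothetical, and clamping ".." at the archive root is an assumption of this example.

```python
import posixpath

SEPARATOR = "%!"  # proposed zip-path separator


def split_zip_url(url):
    """Return (fetch_url, zip_path); zip_path is None when no separator is present."""
    fetch_url, separator, zip_path = url.partition(SEPARATOR)
    if not separator:
        return url, None
    return fetch_url, zip_path


def resolve_in_zip(base_url, relative):
    """Resolve a relative reference against the zip-path, never against the path."""
    fetch_url, zip_path = split_zip_url(base_url)
    if zip_path is None:
        raise ValueError("base URL has no zip-path: %r" % base_url)
    base_dir = posixpath.dirname(zip_path)
    # Anchoring at "/" makes normpath drop any ".." that would leave the archive.
    joined = posixpath.normpath(posixpath.join("/", base_dir, relative))
    return fetch_url + SEPARATOR + joined.lstrip("/")


print(split_zip_url("http://www.example.org/zip%!test.html"))
print(resolve_in_zip("http://www.example.org/zip%!pages/test.html", "../image.gif"))
```

Only the first element returned by `split_zip_url` would be sent to the network; the zip-path is applied afterwards to the fetched archive.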
>>
>> As for nested zip archives. Andrea suggested we should support this,
>> but that would require zip-path to be a sequence of paths. I think we
>> never want to allow relative URLs to escape the top-most zip archive.
>> But I suppose we could support it in a way that
>>
>> %!test.zip!test.html
>>
>> goes one level deeper. And "../image.gif" in test.html looks in the
>> enclosing zip. And "../../image.gif" in test.html looks in the
>> enclosing zip as well because it cannot ever be relative to the path,
>> only the zip-path.
>>
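One possible reading of the nested case, as a sketch: the zip-path is treated as a sequence of paths separated by "!", a ".." at the root of an inner archive steps out into the enclosing archive, and a ".." at the root of the top-most archive is ignored. The function name `resolve_nested` and this exact behavior are assumptions of the example.

```python
def resolve_nested(zip_path, relative):
    """Resolve `relative` against a nested zip-path such as 'test.zip!test.html'."""
    # One list of path components per archive level, outermost first.
    levels = [segment.split("/") for segment in zip_path.split("!")]
    levels[-1].pop()  # drop the file name of the referring document
    for component in relative.split("/"):
        if component in ("", "."):
            continue
        if component == "..":
            if levels[-1]:
                levels[-1].pop()  # go up one folder inside the current archive
            elif len(levels) > 1:
                levels.pop()      # leave the inner archive...
                levels[-1].pop()  # ...and land in the folder that contains it
            # Otherwise we are at the root of the top-most archive: stay there.
        else:
            levels[-1].append(component)
    return "!".join("/".join(level) for level in levels)


print(resolve_nested("test.zip!test.html", "../image.gif"))
print(resolve_nested("test.zip!test.html", "../../image.gif"))
```

Both calls resolve to the same file in the enclosing archive, matching the behavior described above for "../image.gif" and "../../image.gif".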
>>
>> --
>> http://annevankesteren.nl/

>





Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :    Alexandre.Morgaut@4d.com
Web :      www.4D.com


Received on Thursday, 29 August 2013 12:21:12 UTC
