Re: Multipart or TAR archive/package support for all APIs (Performance and scalability) from Gregg Tavares on 2009-08-05 (public-webapps@w3.org from July to September 2009)

From: Gregg Tavares <gman@google.com>
Date: Wed, 5 Aug 2009 02:18:02 -0700
To: Sebastian Markbåge <sebastian@calyptus.eu>
Cc: public-webapps@w3.org
Message-ID: <de4bd3190908050218x6e25e055sb77597ee59f428ff@mail.gmail.com>
On Tue, Aug 4, 2009 at 12:15 PM, Sebastian Markbåge
<sebastian@calyptus.eu> wrote:

> There has been some talk about supporting packages/archives in web APIs.
>
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/021586.html
> http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0460.html
>
> --------------
>
> Why?
>
> The main purpose is performance because of overhead in opening several
> connections. While this could potentially be solved using HTTP pipelining
> there are several advantages to working with packages in single requests.
>
> - HTTP pipelining has various bugs in several servers and proxies.
> Therefore, it's disabled by default in most(?) current browsers and several
> proxies. If it's going to be usable it needs several specification changes
> and updates across the board.
>
> - Even if HTTP pipelining worked as expected, Keep-Alive connections
> require that servers keep connections open for a certain timeout period.
> That can be detrimental to high performance servers. The solution is to set
> the timeout so low that the client may timeout during page load - making it
> worse than no pipelining.
>
> - By packaging small files as a single unit, you can gzip the entire
> package using Content-Encoding. That can have major bandwidth benefits
> compared to gzipping each file individually. (.tar.gz vs .gz.tar)
>
> - High performance servers can easily handle packaged data. It's quicker to
> read a large file as a single consecutive read than making lots of look ups
> and seeks to find lots of small files on disk.
>
> - Clients can cache the package as a single unit, giving clients the same
> boost on disk seeks, if a simple caching mechanism is used.
>
> - If it's ubiquitous - it's easier for authors to package and deploy
> widgets and client-side tools as single files.
>
> --------------
>
> How?
>
> My suggestion would be to define the fragment part of the URI for a certain
> multipart type. The fragment identifier denotes a certain file within the
> package. E.g. http://domain/archive#filename This is similar to fragment's
> use for rows in text/plain (rfc5147 <http://tools.ietf.org/html/rfc5147>),
> anchors in text/html (rfc2854 <http://www.ietf.org/rfc/rfc2854.txt>), etc.
>
> The idea is that you could reference a single file within an archive in any
> other web API. The UA would download the archive and load the file when it
> reaches a file with said identifier within that archive.
>
> The packaging format could be any existing format: application/tar (using
> filenames), multipart/form-data (using the name attribute in
> Content-Disposition part-header) or multipart/related (using Content-ID
> part-header). But it's probably good to settle on one.
>
> The identifier fragment can itself have an additional fragment when the
> inner mime type defines a special usage: <a
> href="archive#file.html#anchorname"> or any other place where you need a
> fragment to define behavior (SVG, XBL, etc). Multiple # should be fine
> according to the generic uri syntax (rfc3986<http://tools.ietf.org/html/rfc3986>).
> Does it break any other existing specs or implementations?
>
> --------------
>
> Compatibility?
>
> Additionally you could add an additional attribute to HTML5 and CSS for
> archive URLs. That way, compatible UAs can use the package, if supported,
> otherwise fall back to regular files. Perhaps you could use media types using
> nested mimes: <audio src="archive#audiofile" type="multipart/related;
> fragmenttype=audio/ogg" />
>
> Example usage:
>
> <img src="file.jpg" msrc="archive.tar#file.jpg" />
>
> {
> background-image: url(file.jpg);
> background-image: murl(archive.tar#file.jpg);
> }
>
> <script src="file.js" msrc="archive.tar#file.js" type="text/javascript" />
>
> var img = new Image();
> img.msrc = "archive.tar#file.png";
>
> xhr.open("GET", "archive.tar#file.xml", true);
>
> -----------------
>
> The purpose of this suggestion is that it is a rather easy specification.
> It's a minor tweak that would open up many possibilities using existing
> tools. It may not be so minor for implementations though. I'd love to hear
> other suggestions on how best to address this issue.
>

This is a neat idea, but it doesn't appear to cover the following use cases,
which I see as fairly common.

#1) I'm making a version of a medium-to-heavy Flash app, but using only
HTML5 standards: audio, video, canvas. I need lots of assets. I want to
start my app immediately and put up a loading progress bar while I download
the assets.

How does

<img src="archive#img1.jpg">
<img src="archive#img2.jpg">
<img src="archive#img3.jpg">

get me info for a progress bar?
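
To illustrate what a progress bar needs, here is a minimal sketch. It assumes
the archive is fetched through a single programmatic request that reports
progress events carrying lengthComputable, loaded and total fields (as
XMLHttpRequest progress events do). The helper names progressFraction and
renderBar are hypothetical and not part of any proposal in this thread.

```javascript
// Hypothetical helper: turns a progress event ({lengthComputable, loaded, total})
// into a fraction in [0, 1], or null when the total size is unknown.
function progressFraction(event) {
  if (!event.lengthComputable || event.total <= 0) {
    return null;
  }
  return Math.min(1, event.loaded / event.total);
}

// Hypothetical helper: renders a text progress bar of the given width.
function renderBar(fraction, width) {
  if (fraction === null) {
    return "[" + "?".repeat(width) + "] size unknown";
  }
  const filled = Math.round(fraction * width);
  return "[" + "#".repeat(filled) + ".".repeat(width - filled) + "] " +
         Math.round(fraction * 100) + "%";
}

// In a browser this could be attached to one request for the whole archive:
//   xhr.onprogress = function (e) {
//     label.textContent = renderBar(progressFraction(e), 20);
//   };
```

With separate <img> elements there is no comparable single place to observe
how many bytes of the archive have arrived.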

#2) I'm making a game where I want to download user content. The user makes
a character using some editor, online or offline, the character is put in an
archive with the user's images and other data.

How would the above suggestion let me download this archive and query what's
inside so I can use this user's data?
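
As a sketch of the kind of query this use case asks for, the function below
lists the members of an uncompressed tar archive that is already in memory as
a Uint8Array. It assumes plain ustar-style 512-byte headers (name at offset 0,
octal size at offset 124) and does not verify checksums or handle long-name
extensions. The names readField and listTarEntries are hypothetical.

```javascript
// Reads a NUL-terminated ASCII field from a tar header block.
function readField(bytes, start, length) {
  let text = "";
  for (let i = start; i < start + length && bytes[i] !== 0; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Lists the entries of an uncompressed tar archive held in a Uint8Array.
// Assumption: simple ustar headers; checksums are not validated.
function listTarEntries(bytes) {
  const BLOCK = 512;
  const entries = [];
  let pos = 0;
  while (pos + BLOCK <= bytes.length) {
    const name = readField(bytes, pos, 100);
    if (name === "") {
      break; // an empty name is treated as the zero-filled end-of-archive marker
    }
    // The size field is an octal number stored as ASCII text.
    const size = parseInt(readField(bytes, pos + 124, 12).trim(), 8) || 0;
    entries.push({ name: name, size: size, offset: pos + BLOCK });
    // Member data is padded to a whole number of 512-byte blocks.
    pos += BLOCK + Math.ceil(size / BLOCK) * BLOCK;
  }
  return entries;
}
```

Each returned entry gives the member's name, its size, and the byte offset of
its data, which is enough to discover what a user-supplied archive contains
before deciding how to use it.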

#3) I'm making WorldOfSpaceCraft in WebGL. Knowing that I need to download
LOTS of assets I make an archive file with low-poly lods and low-res
textures at the front of the archive and progressively more detailed ones
toward the end. As the archive is downloaded I need access to the files as
they become available without having to wait for the entire archive to
download.
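
The progressive access described here can be sketched as an incremental
reader that is fed chunks as they arrive and reports each member once it is
complete. This is only an illustration under stated assumptions: an
uncompressed tar with plain ustar-style headers, no checksum validation, and
a hypothetical TarStream object that is not an existing browser API.

```javascript
// Reads a NUL-terminated ASCII field from a tar header block.
function readTarField(bytes, start, length) {
  let text = "";
  for (let i = start; i < start + length && bytes[i] !== 0; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Hypothetical incremental tar reader: calls onFile(name, data) for each
// member as soon as that member (including its padding) has been received.
function TarStream(onFile) {
  this.onFile = onFile;
  this.buffer = new Uint8Array(0);
  this.done = false;
}

TarStream.prototype.push = function (chunk) {
  if (this.done) {
    return; // end-of-archive already seen; ignore trailing padding
  }
  // Append the new chunk to the bytes left over from earlier chunks.
  const merged = new Uint8Array(this.buffer.length + chunk.length);
  merged.set(this.buffer, 0);
  merged.set(chunk, this.buffer.length);
  this.buffer = merged;

  const BLOCK = 512;
  while (this.buffer.length >= BLOCK) {
    const name = readTarField(this.buffer, 0, 100);
    if (name === "") {
      this.done = true; // empty name: treated as the end-of-archive marker
      break;
    }
    const size = parseInt(readTarField(this.buffer, 124, 12).trim(), 8) || 0;
    const total = BLOCK + Math.ceil(size / BLOCK) * BLOCK;
    if (this.buffer.length < total) {
      break; // this member has not fully arrived yet
    }
    this.onFile(name, this.buffer.slice(BLOCK, BLOCK + size));
    this.buffer = this.buffer.slice(total);
  }
};
```

Because members are reported in archive order, placing low-detail assets at
the front of the archive means they become usable first, while the detailed
ones near the end are still downloading.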

That's why I made the previous proposal for a more programmatic interface.
It's not mutually exclusive with this one, but it does seem to have different
uses that this one does not cover.
Received on Wednesday, 5 August 2009 09:18:47 UTC