Re: ZIP archive API? from Glenn Maynard on 2013-05-07 (public-webapps@w3.org from April to June 2013)

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 6 May 2013 19:15:26 -0500
To: Eric U <ericu@google.com>
Cc: Robin Berjon <robin@w3.org>, Florian Bösch <pyalot@gmail.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <CABirCh_KGjgBKxaOSscoSPT_0kkhMkWZwJM+eYEzTsF3nY=THQ@mail.gmail.com>

On Mon, May 6, 2013 at 1:11 PM, Eric U <ericu@google.com> wrote:

> This came up a few years ago; Gregg Tavares explained in [1] that only
> /some/ zipfiles are streamable, and you don't know whether yours are
> or not until you've seen the whole file.
>
>      Eric
>
> [1]
> http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0362.html
>

The file format is streamable.  You can create files that follow the spec
that will fail when streaming, but you can also create files that follow
the spec that will fail when not streaming.  (The end of central directory
record sometimes has data after it, so you have to do a search; there's no
spec defining how far you have to search, so if you put too much data there
it'll start to fail.)  Those are both problems with the spec that would
have to be addressed.  I don't think there's any reason to support tar (and
it would significantly complicate the API, since tar *only* supports
streaming).
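
To make the non-streaming failure case concrete, here is a rough sketch of the backwards search a reader does to find the end of central directory record. The function name and the search limit are assumptions for illustration only. The limit used here (the fixed 22-byte record plus the longest comment its 16-bit length field allows) is one plausible choice, not something the appnote requires.

```javascript
// Sketch: locate the ZIP end of central directory (EOCD) record by scanning
// backwards from the end of the file for its signature.
const EOCD_SIGNATURE = 0x06054b50; // bytes "PK\x05\x06", read little-endian
const EOCD_MIN_SIZE = 22;          // fixed-size part of the record
// Assumption: give up after the fixed record plus the longest possible comment.
const MAX_SEARCH = EOCD_MIN_SIZE + 0xffff;

function findEndOfCentralDirectory(bytes, maxSearch = MAX_SEARCH) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const last = bytes.byteLength - EOCD_MIN_SIZE;           // latest possible start
  const first = Math.max(0, bytes.byteLength - maxSearch); // earliest offset we try
  for (let pos = last; pos >= first; pos--) {
    if (view.getUint32(pos, true) === EOCD_SIGNATURE) {
      return pos; // offset of the record within `bytes`
    }
  }
  return -1; // not found within the search window
}
```

A file with more trailing data than the chosen limit returns -1 here even though nothing in the format forbids it, which is exactly the ambiguity a spec would have to settle.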

The bigger point here is that the ZIP appnote isn't enough.  It doesn't
define parsers or error handling.  This means that defining an API to
expose ZIPs isn't only a matter of defining an API; somebody will need to
spec the file format itself.  Also, the appnote isn't free, so this would
probably need to be a clean-room spec.  (However, it wouldn't need to
specify all of the features of the format, a huge number of which are never
used, only how to parse past them and ignore them.)
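
As one illustration of "parse past them and ignore them", here is a minimal sketch of walking an entry's extra field block, where each record is a 2-byte header ID, a 2-byte data size, and then that many bytes of data. The function name, the `knownIds` parameter, and the choice to stop at a truncated record are all assumptions. The error handling in particular is the kind of thing a real spec would have to define.

```javascript
// Sketch: walk a ZIP "extra field" block, keeping only records we understand
// and skipping everything else by its declared size (all values little-endian).
function parseExtraField(bytes, knownIds) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const records = [];
  let pos = 0;
  while (pos + 4 <= bytes.byteLength) {
    const id = view.getUint16(pos, true);
    const size = view.getUint16(pos + 2, true);
    const end = pos + 4 + size;
    // Assumption: a record that runs past the block ends parsing quietly.
    if (end > bytes.byteLength) break;
    if (knownIds.has(id)) {
      records.push({ id, data: bytes.subarray(pos + 4, end) });
    }
    pos = end; // unknown IDs are skipped without being interpreted
  }
  return records;
}
```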

On Mon, May 6, 2013 at 1:42 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> > Another question to take into account here is whether this should only be
> > about zip. One of the limitations of zip archives is that they aren't
> > streamable. Without boiling the ocean, adding support for a streamable
> > format (which I don't think needs be more complex than tar) would be a big
> > plus.
>
> Indeed. This is IMO an argument for relying on libraries.
>

It's not.  ZIP has been around longer than PNG and JPEG; its only real
competitors are tar.gz (which isn't useful here) and RAR (proprietary).  It's
not going away and there's no indication of a sudden influx of competing
file formats, any more than image formats.

That said, I don't know if a ZIP API is worthwhile.  I'd start lower level
here, and think about supporting inflating blobs.  That's the same
functionality any ZIP API will want, and it's the main part of the ZIP
format that you really don't want to have to do in script.  The surface
area is also far simpler: new InflatedBlob(compressedBlob)

> I'm still hoping to see some performance numbers from the people
> arguing that we should add this to the platform. Without that I see
> little hope of getting enough browser vendors behind this.
>

I'm not aware of any optimized inflate implementation in JS to compare
against, and it's a complex algorithm, so nobody is likely to jump forward
to spend a lot of time implementing and heavily optimizing it just to show
how slow it is.  I've seen an implementation around somewhere, but it
didn't use typed arrays so it would need a lot of reworking to have any
meaning.

Every browser already has native inflate, though.

-- 
Glenn Maynard

Received on Tuesday, 7 May 2013 00:15:57 UTC