Re: [whatwg] Archive API - proposal from Andrea Marchesini on 2012-08-15 (public-whatwg-archive@w3.org from August 2012)

From: Andrea Marchesini <baku@mozilla.com>
Date: Wed, 15 Aug 2012 04:24:29 -0700 (PDT)
To: whatwg@whatwg.org
Message-ID: <1907766875.1749697.1345029869173.JavaMail.root@mozilla.com>

Thanks for your feedback.

When I was implementing the Archive API, my idea was to have a generic Archive API and not just a ZIP API.
Of course, the current implementation supports only ZIP, but in the future we could add support for more formats.

> This interface is problematic. Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage. This API requires
> that filenames round-trip uniquely, or else files aren't accessible
> at all. For example, if you have two filenames in CP932, "日" and "本",
> but the encoding isn't determined correctly, you may end up with two
> files both with a filename of "??". Either you can't open either
> file, or you can only open one of them. This isn't theoretical; I
> hit ZIP files like this in the wild regularly.

I agree. I was thinking that the default encoding for filenames would be UTF-8.
If a filename is not a valid UTF-8 string, we can fall back to the caller-supplied encoding:

var reader = new ArchiveReader(blob, "Windows-1252");

If that also fails, the file is excluded from the results.
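
A minimal sketch of that fallback order, for illustration only. The helper name decodeFilename is hypothetical and is not part of ArchiveReader; the sketch assumes an environment that provides TextDecoder with the "fatal" option, which may differ from how the implementation decodes names internally.

```javascript
// Hypothetical helper, not part of ArchiveReader: decode the raw filename
// bytes of an archive entry, trying strict UTF-8 first and then the
// caller-supplied encoding (if any).
function decodeFilename(bytes, fallbackEncoding) {
  var encodings = ["utf-8"];
  if (fallbackEncoding) {
    encodings.push(fallbackEncoding);
  }
  for (var i = 0; i < encodings.length; i++) {
    try {
      // fatal: true makes decode() throw on invalid input instead of
      // silently substituting U+FFFD.
      return new TextDecoder(encodings[i], { fatal: true }).decode(bytes);
    } catch (e) {
      // Invalid byte sequence or unsupported label: try the next encoding.
    }
  }
  // Nothing worked: the caller would exclude this entry from the results.
  return null;
}
```

One caveat with this approach: a single-byte encoding such as Windows-1252, as defined by the Encoding Standard, assigns a character to every byte value, so the second step would never fail for it. The exclusion case mostly matters for multi-byte encodings.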

> It should be possible to get the CRC32 of files, which ZIP stores in
> the central directory. This both allows the user to perform checksum
> verification himself if wanted, and all the other variously useful
> things about being able to get a file's checksum without having to
> read the whole file.

Can we have a 'generic' archive API that supports CRC32?
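
For reference, the user-side checksum verification mentioned in the quoted text could look roughly like the sketch below. It computes the standard CRC-32 (reflected polynomial 0xEDB88320) that ZIP uses. The names crc32 and matchesStoredCrc are hypothetical, and how an API would expose the stored value is exactly the open question, so storedCrc is an assumed input here.

```javascript
// Build the 256-entry lookup table for the reflected polynomial 0xEDB88320.
var CRC_TABLE = (function () {
  var table = new Uint32Array(256);
  for (var n = 0; n < 256; n++) {
    var c = n;
    for (var k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Compute the CRC-32 of a Uint8Array as an unsigned 32-bit number.
function crc32(bytes) {
  var crc = 0xFFFFFFFF;
  for (var i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical check: storedCrc stands for the value recorded in the
// archive (for ZIP, in the central directory).
function matchesStoredCrc(bytes, storedCrc) {
  return crc32(bytes) === (storedCrc >>> 0);
}
```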

Andrea

Received on Wednesday, 15 August 2012 11:24:55 UTC