- From: Glenn Maynard <glenn@zewt.org>
- Date: Tue, 14 Aug 2012 15:20:48 -0500
- To: Andrea Marchesini <baku@mozilla.com>
- Cc: whatwg@whatwg.org
(I've reordered my responses to give a more logical progression.)

On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku@mozilla.com> wrote:

> // The getFilenames handler receives a list of DOMString:
> var handle = this.reader.getFile(this.result[i]);

This interface is problematic. Since ZIP files don't have a standard filename encoding, filenames in ZIPs are often garbage. This API requires that filenames round-trip uniquely, or else files aren't accessible at all. For example, if you have two filenames in CP932, "日" and "本", but the encoding isn't determined correctly, you may end up with two files both with a filename of "??". Either you can't open either file, or you can only open one of them. This isn't theoretical; I hit ZIP files like this in the wild regularly.

Instead, I'd recommend that the primary API simply returns File objects directly from the ZIP. For example:

    var reader = archive.getFiles();
    reader.onsuccess = function(result) {
        // result = [File, File, File, File...];
        console.log(result[0].name);
        // read the file
        var fileReader = new FileReader();
        fileReader.readAsArrayBuffer(result[0]);
    }

This allows opening files without any dependency on the filename. Since File objects are by design lightweight--no decompression should happen until you actually read from the file--this isn't expensive and won't perform any extra I/O. All the information you need to expose a File object is in the central directory (filename, mtime, decompressed size).

> I would like to receive feedback about this.. In particular:
> . Do you think it can be useful?
> . Do you see any limitation, any feature missing?

It should be possible to get the CRC32 of files, which ZIP stores in the central directory. This lets users verify checksums themselves if they want to, and it provides all the other benefits of being able to get a file's checksum without having to read the whole file.
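As a rough sketch of what user-side verification might look like once the stored CRC32 is exposed: the code below computes the standard CRC-32 (the reflected 0xEDB88320 polynomial that ZIP uses) over a file's decompressed bytes and compares it against the value from the central directory. The names crc32 and matchesExpectedCrc are hypothetical helpers for illustration, not part of any proposed API, and it assumes the bytes have already been read into a Uint8Array.

```javascript
// Build the standard CRC-32 lookup table (reflected polynomial 0xEDB88320,
// the variant used by ZIP).
function makeCrcTable() {
    var table = new Uint32Array(256);
    for (var n = 0; n < 256; n++) {
        var c = n;
        for (var k = 0; k < 8; k++) {
            c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
        }
        table[n] = c >>> 0;
    }
    return table;
}

var CRC_TABLE = makeCrcTable();

// Compute the CRC-32 of a Uint8Array, returned as an unsigned 32-bit number.
function crc32(bytes) {
    var crc = 0xFFFFFFFF;
    for (var i = 0; i < bytes.length; i++) {
        crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
    }
    return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper: check decompressed bytes against the CRC that the
// archive's central directory claims for them.
function matchesExpectedCrc(bytes, expectedCRC) {
    return crc32(bytes) === (expectedCRC >>> 0);
}
```

A caller would only run this after reading the whole file, which is the point of keeping the check in the user's hands rather than doing it implicitly.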
(I don't think CRC32 checks should be performed automatically, since it's too hard for that to make sense when random access is involved.)

> // The ArchiveReader object works with Blob objects:
> var archiveReader = new ArchiveReader(file);
>
> // Any request is asynchronous:

The only operation that needs to be asynchronous is creating the ArchiveReader itself. It should parse the ZIP central directory before returning a result. Once you've done that you can do the rest synchronously, because no further I/O is necessary until you actually read data from a file.

This gives the following, simpler interface:

    var opener = new ZipOpener(file);
    opener.onerror = function() { console.error("Loading failed"); }
    opener.onsuccess = function(zipFile) {
        // .files is a FileList, representing each file in the archive.
        if(zipFile.files.length == 0) { console.error("ZIP file is empty"); return; }

        var example_file = zipFile.files[0];
        console.log("The first filename is", example_file.name, "with an expected CRC of", example_file.expectedCRC);

        // Read from the file:
        var reader = new FileReader();
        reader.readAsArrayBuffer(example_file);

        // For convenience, add "getter File? (DOMString name)" to FileList, to find a file by name. This is equivalent
        // to iterating through files[] and comparing .name. If no match is found, return null. This could be a function
        // instead of a getter.
        var example_file2 = zipFile.files["file.txt"];
        if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; }
    }

(To fit expectedCRC in there, it would actually need to use a subclass of File, not File itself.)

This also eliminates an error condition (no getFile error callback), and since .files looks just like HTMLInputElement.files, it can be used directly with code written for it. For example, if you have a function "uploadAllFiles(files)", you can pass in either an <input type=file multiple>'s .files or a zipFile.files, and they'll both work.

-- 
Glenn Maynard
Received on Tuesday, 14 August 2012 20:21:22 UTC