- From: Tobie Langel <tobie.langel@gmail.com>
- Date: Tue, 14 Aug 2012 22:05:27 +0100
- To: Glenn Maynard <glenn@zewt.org>
- Cc: "whatwg@whatwg.org" <whatwg@whatwg.org>, Andrea Marchesini <baku@mozilla.com>
On Aug 14, 2012, at 21:21, Glenn Maynard <glenn@zewt.org> wrote:

> (I've reordered my responses to give a more logical progression.)
>
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku@mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>
> This interface is problematic. Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage. This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all. For
> example, if you have two filenames in CP932, "日" and "本", but the encoding
> isn't determined correctly, you may end up with two files both with a
> filename of "??". Either you can't open either file, or you can only open
> one of them. This isn't theoretical; I hit ZIP files like this in the wild
> regularly.
>
> Instead, I'd recommend that the primary API simply returns File objects
> directly from the ZIP. For example:
>
> var reader = archive.getFiles();
> reader.onsuccess = function(result) {
>     // result = [File, File, File, File...];
>
>     console.log(result[0].name);
>     // read the file
>     new FileReader(result[0]);
> }
>
> This allows opening files without any dependency on the filename. Since
> File objects are by design lightweight--no decompression should happen
> until you actually read from the file--this isn't expensive and won't
> perform any extra I/O. All the information you need to expose a File
> object is in the central directory (filename, mtime, decompressed size).
>
>> I would like to receive feedback about this. In particular:
>> . Do you think it can be useful?
>> . Do you see any limitation, any feature missing?
>
> It should be possible to get the CRC32 of files, which ZIP stores in the
> central directory. This both allows the user to perform checksum
> verification himself if wanted, and all the other variously useful things
> about being able to get a file's checksum without having to read the whole
> file.
>
> (I don't think CRC32 checks should be performed automatically, since it's
> too hard for that to make sense when random access is involved.)
>
>> // The ArchiveReader object works with Blob objects:
>> var archiveReader = new ArchiveReader(file);
>>
>> // Any request is asynchronous:
>
> The only operation that needs to be asynchronous is creating the
> ArchiveReader itself. It should parse the ZIP central record before
> returning a result. Once you've done that you can do the rest
> synchronously, because no further I/O is necessary until you actually read
> data from a file.
>
> This gives the following, simpler interface:
>
> var opener = new ZipOpener(file);
> opener.onerror = function() { console.error("Loading failed"); }
> opener.onsuccess = function(zipFile)
> {
>     // .files is a FileList, representing each file in the archive.
>     if(zipFile.files.length == 0) { console.error("ZIP file is empty"); return; }
>
>     var example_file = zipFile.files[0];
>     console.log("The first filename is", example_file.name,
>                 "with an expected CRC of", example_file.expectedCRC);
>
>     // Read from the file:
>     var reader = new FileReader(example_file);
>
>     // For convenience, add "getter File? (DOMString name)" to FileList, to
>     // find a file by name. This is equivalent to iterating through files[]
>     // and comparing .name. If no match is found, return null. This could be
>     // a function instead of a getter.
>     var example_file2 = zipFile.files["file.txt"];
>     if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; }
> }
>
> (To fit expectedCRC in there, it would actually need to use a subclass of
> File, not File itself.)
>
> This also eliminates an error condition (no getFile error callback), and
> since .files looks just like HTMLInputElement.files, it can be used
> directly with code written for it. For example, if you have a function
> "uploadAllFiles(files)", you can pass in both an <input type=file
> multiple>'s .files or a zipFile.files, and they'll both work.

How are nested directories handled in your counter proposal?

--tobie
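
For reference, here is a minimal sketch of the by-name lookup described in the quoted message ("iterating through files[] and comparing .name", returning null when nothing matches). It also shows why two entries whose names both decode to "??" can't both be reached by name, while index access still reaches them. The helper name findFileByName and the plain objects standing in for File objects are assumptions made for illustration; they are not part of either proposal.

```javascript
// Hypothetical helper, not part of any proposed API.
// Scans the list in order and returns the first entry whose .name matches,
// or null if there is no match.
function findFileByName(files, name) {
  for (var i = 0; i < files.length; i++) {
    if (files[i].name === name) {
      return files[i];
    }
  }
  return null;
}

// Plain objects standing in for File objects built from a ZIP central
// directory. The first two names are assumed to have been decoded with the
// wrong encoding and collapsed to the same string.
var files = [
  { name: "??", size: 10 },
  { name: "??", size: 20 },
  { name: "file.txt", size: 30 }
];

var byName = findFileByName(files, "??");        // only ever the first "??"
var missing = findFileByName(files, "nope.txt"); // no such entry
var second = files[1];                           // index access reaches the second "??"
```

With a name-only API the second "??" entry is unreachable; with a list of File objects it is simply files[1].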
Received on Tuesday, 14 August 2012 21:09:05 UTC