- From: Jonas Sicking <jonas@sicking.cc>
- Date: Wed, 15 Aug 2012 23:22:41 -0700
- To: Glenn Maynard <glenn@zewt.org>
- Cc: whatwg@whatwg.org, Andrea Marchesini <baku@mozilla.com>
On Wed, Aug 15, 2012 at 9:38 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Wed, Aug 15, 2012 at 10:10 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> Though I still think that we should support reading out specific files
>> using a filename as a key. I think a common use-case for ArchiveReader
>> is going to be web developers wanting to download a set of resources
>> from their own website and wanting to use a .zip file as a way to get
>> compression and packaging. In that case they can easily either ensure
>> to stick with ASCII filenames, or encode the names in UTF8.
>
> That's what this was for:
>
> // For convenience, add "getter File? (DOMString name)" to FileList, to
> // find a file by name. This is equivalent to iterating through files[]
> // and comparing .name. If no match is found, return null. This could be
> // a function instead of a getter.
> var example_file2 = zipFile.files["file.txt"];
> if(example_file2 == null) { console.error("file.txt not found in ZIP");
> return; }
>
> I suppose a named getter isn't a great idea--you might have a filename
> "length"--so a "zipFile.files.find('file.txt')" function is probably better.

I definitely wouldn't want to use a getter. That runs into all sorts of
problems and the syntactical wins are pretty small.

>> One way we could support this would be to have a method which allows
>> getting a list of meta-data about each entry. Probably together with
>> the File object itself. So we could return an array of objects like:
>>
>> [ {
>>     rawName: <UInt8Array>,
>>     file: <File object>,
>>     crc32: <UInt8Array>
>>   },
>>   {
>>     rawName: <UInt8Array>,
>>     file: <File object>,
>>     crc32: <UInt8Array>
>>   },
>>   ...
>> ]
>>
>> That way we can also leave out the crc from archive types that don't
>> support it.
>
> This means exposing two objects per file. I'd prefer a single File-subclass
> object per file, with any extra metadata put on the subclass.

First of all, we'd be talking about 5 vs. 6 objects per file entry: two
ArrayBuffers, two ArrayBufferViews, one File and potentially one
JS-object. Actually, in Gecko it's more like 8 vs. 9 objects once you
start counting the C++ objects and their JS-wrappers.

Second, at least in the Gecko engine, allocating the first 5 objects
takes about three orders of magnitude more time than allocating the
JS-object.

I'm also not a fan of sticking the crc32 on the File object itself,
since we don't actually know that that's the correct crc32 value.

>> But I like this approach a lot if we can make it work. The main thing
>> I'd be worried about, apart from the IO performance above, is if we
>> can make it work for a larger set of archive formats. Like, can we
>> make it work for .tar and .tar.gz? I think we couldn't but we would
>> need to verify.
>
> It wouldn't handle it very well, but the original API wouldn't, either. In
> both cases, the only way to find filenames in a TAR--whether it's to search
> for one or to construct a list--is to scan through the whole file (and
> decompress it all, for .tgz). Simply retrieving a list of filenames from a
> large .tgz would thrash the user's disk and chew CPU.
>
> I don't think there's much use in supporting .tar, anyway. Even if you want
> true streaming (which would be a different API anyway, since we're reading
> from a Blob here), ZIP can do that too, by using the local file headers
> instead of the central directory.

The main argument that I could see is that the initial proposal
allowed extracting files from a .tar.gz while only decompressing up to
the point of finding the file-to-be-extracted, as long as .getFileNames
wasn't called. Which I'll grant isn't a huge benefit.

/ Jonas
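
For illustration only, here is a minimal TypeScript sketch of the two ideas
discussed above: a per-entry metadata record and a find-by-name function in
place of a named getter. Every name in it (FileLike, ArchiveEntry, findEntry)
is hypothetical and not part of any proposed or specified API; FileLike is a
stand-in for File so the sketch runs outside a browser, and the crc32 bytes
are placeholders, not real checksums.

```typescript
// Hypothetical shapes for illustration; not taken from any specification.
interface FileLike {
  name: string; // stand-in for the .name of a real File object
}

interface ArchiveEntry {
  rawName: Uint8Array; // filename bytes as stored in the archive
  file: FileLike;
  crc32?: Uint8Array;  // left out for archive types that do not store a CRC
}

// Equivalent to iterating through the entries and comparing .name.
// Returns null when no entry matches.
function findEntry(entries: ArchiveEntry[], name: string): ArchiveEntry | null {
  for (const entry of entries) {
    if (entry.file.name === name) {
      return entry;
    }
  }
  return null;
}

const entries: ArchiveEntry[] = [
  {
    rawName: new Uint8Array([97, 46, 116, 120, 116]), // "a.txt" in ASCII
    file: { name: "a.txt" },
    crc32: new Uint8Array([0, 0, 0, 0]), // placeholder bytes
  },
  {
    rawName: new Uint8Array([108, 101, 110, 103, 116, 104]), // "length" in ASCII
    file: { name: "length" },
    // no crc32: this entry models a format without one
  },
];

const lengthEntry = findEntry(entries, "length");
const missingEntry = findEntry(entries, "missing.txt");
```

Because the lookup is a plain function call, a file that happens to be named
"length" is found like any other entry instead of colliding with a property
of the list, which is the problem a named getter would run into.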
Received on Thursday, 16 August 2012 06:23:41 UTC