- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 15 Aug 2012 14:14:15 +0300
- To: Glenn Maynard <glenn@zewt.org>
- Cc: whatwg@whatwg.org, Andrea Marchesini <baku@mozilla.com>
On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku@mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>
> This interface is problematic. Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage. This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all.

Indeed, in the case of zip files, file names themselves are dangerous as handles that get passed back and forth, so it seems like a good idea to be able to extract the contents of a file inside the archive without having to address the file by name.

As for the filenames, after an off-list discussion, I think the best solution is that UTF-8 is tried first, but the ArchiveReader constructor takes an optional second argument that names a character encoding from the Encoding Standard. This will be known as the fallback encoding. If the caller of the constructor provides no fallback encoding, "Windows-1252" is used as the fallback encoding.

When ArchiveReader processes a filename from the zip archive, it first tests whether the byte string is valid UTF-8. If it is, the byte string is interpreted as UTF-8 when converting to UTF-16. If the filename is not valid UTF-8, it is decoded into UTF-16 using the fallback encoding.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
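The filename rule proposed above (try UTF-8 first, otherwise use a fallback encoding defaulting to windows-1252) can be sketched with the Encoding Standard's `TextDecoder`. This is a minimal illustration, not ArchiveReader's actual implementation: `decodeZipFilename` is a hypothetical helper, and the sketch assumes a runtime whose `TextDecoder` supports the `windows-1252` label (browsers, or Node.js built with full ICU). The `fatal: true` option makes the UTF-8 decoder throw on malformed input instead of substituting U+FFFD, which doubles as the validity test.

```javascript
// Hypothetical helper illustrating the proposed rule; not a shipped API.
function decodeZipFilename(bytes, fallbackEncoding = "windows-1252") {
  try {
    // fatal: true -> decode() throws on malformed UTF-8.
    // ignoreBOM: true -> a leading EF BB BF is kept as U+FEFF, not stripped.
    return new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(bytes);
  } catch (e) {
    // Not valid UTF-8: decode the same bytes with the fallback encoding.
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}

// "caf" + C3 A9 is valid UTF-8; "caf" + E9 is not and falls back to windows-1252.
const utf8Name = decodeZipFilename(new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]));
const legacyName = decodeZipFilename(new Uint8Array([0x63, 0x61, 0x66, 0xe9]));
console.log(utf8Name === legacyName); // true: both decode to "café"
```

The last lines also illustrate Glenn's round-tripping concern: two different raw filenames in the same archive can decode to the same string. This is why addressing entries by something other than their decoded name is attractive.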
Received on Wednesday, 15 August 2012 11:14:44 UTC