Re: [whatwg] Archive API - proposal from Henri Sivonen on 2012-08-15 (public-whatwg-archive@w3.org from August 2012)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 15 Aug 2012 14:14:15 +0300
To: Glenn Maynard <glenn@zewt.org>
Cc: whatwg@whatwg.org, Andrea Marchesini <baku@mozilla.com>
Message-ID: <CAJQvAufH5NkORcSpMQ=18sMaNRqU-2K4Kj5Cf-PxFkH2VR_ktw@mail.gmail.com>

On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku@mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>
> This interface is problematic.  Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage.  This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all.

Indeed, in the case of zip files, file names themselves are dangerous
as handles that get passed back and forth, so it seems like a
good idea to be able to extract the contents of a file inside the
archive without having to address the file by name.

As for the filenames, after an off-list discussion, I think the best
solution is that UTF-8 is tried first but the ArchiveReader
constructor takes an optional second argument that names a character
encoding from the Encoding Standard. This will be known as the
fallback encoding. If no fallback encoding is provided by the caller
of the constructor, "Windows-1252" is set as the fallback encoding.
When ArchiveReader processes a filename from the zip archive, it
first tests if the byte string is a valid UTF-8 string. If it is, the
byte string is interpreted as UTF-8 when converting to UTF-16. If the
filename is not a valid UTF-8 string, it is decoded into UTF-16 using
the fallback encoding.
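
As a rough sketch of that decoding step (an illustration only, not
proposed spec text): the function name decodeZipFilename is made up
here and is not part of the ArchiveReader proposal, and the sketch
assumes a TextDecoder as defined by the Encoding Standard is
available, with support for the fallback label being used.

```javascript
// Hypothetical helper, not part of the proposed ArchiveReader API.
// `bytes` is a Uint8Array holding the raw filename bytes from the zip entry.
// `fallbackEncoding` is an Encoding Standard label; the default mirrors
// the "Windows-1252" default described above.
function decodeZipFilename(bytes, fallbackEncoding = "windows-1252") {
  try {
    // With fatal: true, decode() throws a TypeError on any invalid UTF-8
    // sequence instead of inserting U+FFFD, so this doubles as the
    // "is it valid UTF-8?" test.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Not valid UTF-8: decode the same bytes with the fallback encoding.
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}
```

For example, the bytes 63 61 66 C3 A9 are valid UTF-8 and decode as
"café" on the first attempt, while 63 61 66 E9 is not valid UTF-8 and
only comes out as "café" via the windows-1252 fallback.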

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 15 August 2012 11:14:44 UTC