
Re: [widgets] Unicode Zip Paths (fully decomposed canonical form?)

From: Marcos Caceres <marcosscaceres@gmail.com>
Date: Sat, 6 Dec 2008 00:40:24 +0000
Message-ID: <b21a10670812051640x75cfba38v923e69fd0dcdb7cc@mail.gmail.com>
To: public-webapps <public-webapps@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Whoops, by "fully decomposed canonical form" I think I meant
"Normalization Form D (NFD)" as defined in:

On Sat, Dec 6, 2008 at 12:31 AM, Marcos Caceres
<marcosscaceres@gmail.com> wrote:
> Hi, I'm trying to put the final touches on the zip section of the
> widget packaging spec [1] before we go to LC by the 10th and I've run
> into an i18n problem related to character encodings. I'm wondering if
> anyone would be kind enough to give me some guidance as to what is
> going on, encoding-wise, in MacOS with regard to the encoding of
> file names in Zip files?
> When I create a zip file with one file entry called "ñ", inside the
> zip file, the file name gets decomposed to the following (hex) byte
> sequence:
> ñ -> 0x6E 0xCC
> 6E is the letter "n" in Unicode, so there is obviously some
> decomposition going on there. But 0xCC in Unicode maps to Ì (LATIN
> CAPITAL LETTER I WITH GRAVE)? So I'm not sure what encoding the zip
> file is using.
> The reason I ask is because I'm not sure what to put into the widget
> spec with regard to recommending the use of canonical decomposition for
> Unicode file names. Or even if that is a good idea!?
> Should I put the following into the spec?:
> "It is recommended that the file name field be encoded using [UTF-8]
> in fully decomposed canonical form."
> OR just:
> "It is recommended that the file name field be encoded using [UTF-8]."
> This seems important for when I go from MacOS to any other platform, as
> file names get mangled when files are extracted on any other
> platform. We obviously don't want that to happen, so widget engines
> need to be prepared to deal with these encoding issues.
> I looked at the Zip spec [2], but I don't see any real guidance with
> regards to this. However, for those who know more about encoding, it
> would be helpful if you could also take a look at the Zip spec.
> Any help would be greatly appreciated,
> Marcos
> [1] http://dev.w3.org/2006/waf/widgets/#zip-relative
> [2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
> --
> Marcos Caceres
> http://datadriven.com.au
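
For anyone wanting to check the bytes above, here is a rough sketch in Python using the standard unicodedata module. It assumes the zip tool stored the name as UTF-8 after NFD decomposition, which I have not confirmed. Under that assumption, 0xCC would not be a character on its own but the first byte of the two-byte UTF-8 sequence for U+0303 (COMBINING TILDE), so the full sequence would be 0x6E 0xCC 0x83. The helper names hex_bytes and same_name are made up for illustration.

```python
import unicodedata

# "ñ" as one precomposed code point, U+00F1 (this is already NFC).
name = "\u00f1"

# NFD splits it into "n" (U+006E) followed by COMBINING TILDE (U+0303).
nfd = unicodedata.normalize("NFD", name)
nfc = unicodedata.normalize("NFC", nfd)


def hex_bytes(text):
    """Return the UTF-8 bytes of text as a space-separated hex string."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


def same_name(a, b):
    """Compare two file names after normalizing both to the same form.

    NFC is an arbitrary choice here; any single form works as long as
    both sides use it.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


nfc_bytes = hex_bytes(nfc)  # precomposed form
nfd_bytes = hex_bytes(nfd)  # decomposed form

print("NFC:", nfc_bytes)
print("NFD:", nfd_bytes)
print("equal after normalization:", same_name(nfc, nfd))
```

If that assumption holds, a widget engine could compare names the way same_name does, rather than comparing raw bytes, and so tolerate either form.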

Marcos Caceres
Received on Saturday, 6 December 2008 00:41:00 UTC
