Re: [widgets] Unicode Zip Paths (fully decomposed canonical form?)

Whoops, by fully decomposed canonical form I think I meant
"Normalization Form D (NFD)" as defined in:

On Sat, Dec 6, 2008 at 12:31 AM, Marcos Caceres wrote:
> Hi, I'm trying to put the final touches on the zip section of the
> widget packaging spec [1] before we go to LC by the 10th and I've run
> into an i18n problem related to character encodings. I'm wondering if
> anyone would be kind enough to give me some guidance as to what is
> going on, encoding-wise, in MacOS with regard to the encoding of
> file names in Zip files?
> When I create a zip file with one file entry called "ñ", inside the
> zip file, the file name gets decomposed to the following (hex) byte
> sequence:
> ñ -> 0x6E 0xCC
> 6E is the letter "n" in Unicode, so there is obviously some
> decomposition going on there. But 0xCC in Unicode maps to Ì (LATIN
> CAPITAL LETTER I WITH GRAVE)? So I'm not sure what encoding the zip
> file is using.
> The reason I ask is because I'm not sure what to put into the widget
> spec with regard to recommending the use of canonical decomposition for
> Unicode file names. Or even if that is a good idea!?
> Should I put the following into the spec?:
> "It is recommended that the file name field be encoded using [UTF-8]
> in fully decomposed canonical form."
> OR just:
> "It is recommended that the file name field be encoded using [UTF-8]."
> This seems important for when I go from MacOS to any other platform as
> file names get all mangled when files are extracted on any other
> platform. We obviously don't want that to happen so widget engines
> need to be prepared to deal with these encoding issues.
> I looked at the Zip spec [2], but I don't see any real guidance with
> regards to this. However, for those who know more about encoding, it
> would be helpful if you could also take a look at the Zip spec.
> Any help would be greatly appreciated,
> Marcos
> [1]
> [2]
> --
> Marcos Caceres
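
For reference, a minimal sketch (Python, standard library only) of what
NFD does to "ñ" once it is encoded as UTF-8. The helper names utf8_hex
and same_name are made up for illustration and come from neither spec.
The output suggests, though it does not prove, that the 0x6E 0xCC seen
above is the start of a three-byte sequence: "n" followed by U+0303
COMBINING TILDE, which UTF-8 encodes as 0xCC 0x83.

```python
import unicodedata

def utf8_hex(text, form):
    """Normalize text to the given Unicode form and return its UTF-8 bytes as hex."""
    normalized = unicodedata.normalize(form, text)
    return " ".join(f"0x{b:02X}" for b in normalized.encode("utf-8"))

def same_name(a, b):
    """Hypothetical helper: compare two file names after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

name = "\u00f1"  # ñ as a single precomposed code point

nfd_hex = utf8_hex(name, "NFD")  # decomposed: n + combining tilde
nfc_hex = utf8_hex(name, "NFC")  # precomposed: single code point

print("NFD:", nfd_hex)  # NFD: 0x6E 0xCC 0x83
print("NFC:", nfc_hex)  # NFC: 0xC3 0xB1

# The two spellings differ byte for byte but compare equal once normalized.
print(same_name("\u00f1", "n\u0303"))  # True
```

Comparing names only after normalizing both sides is one way a widget
engine could cope with archives created on different platforms; whether
the spec should require that is a separate question.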

Marcos Caceres

Received on Saturday, 6 December 2008 00:41:07 UTC