
Re: [widgets] Unicode Zip Paths (fully decomposed canonical form?)

From: Marcos Caceres <marcosscaceres@gmail.com>
Date: Sat, 6 Dec 2008 00:40:24 +0000
Message-ID: <b21a10670812051640x75cfba38v923e69fd0dcdb7cc@mail.gmail.com>
To: public-webapps <public-webapps@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Whoops, by fully decomposed canonical form I think I meant
"Normalization Form D (NFD)" as defined in:
http://www.unicode.org/reports/tr15/#Decomposition
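
As a rough illustration of what NFD does to the "ñ" file name discussed in
the quoted message, here is a minimal sketch using Python's standard
unicodedata module. The variable names are my own and nothing here comes
from the widget or Zip specs; it only shows the code points and UTF-8 bytes
for the decomposed and precomposed forms.

```python
import unicodedata

# "ñ" as a single precomposed code point, U+00F1 (this is already NFC).
name = "\u00f1"

# NFD splits it into a base letter plus a combining mark.
nfd = unicodedata.normalize("NFD", name)
# NFC recomposes the pair back into the single code point.
nfc = unicodedata.normalize("NFC", nfd)

nfd_bytes = nfd.encode("utf-8")
nfc_bytes = nfc.encode("utf-8")

print([hex(ord(c)) for c in nfd])  # ['0x6e', '0x303']
print(nfd_bytes.hex(" "))          # 6e cc 83
print(nfc_bytes.hex(" "))          # c3 b1
```

If the zip file name is UTF-8 in NFD, the 0x6E 0xCC reported below would be
the first two bytes of the three-byte sequence 0x6E 0xCC 0x83, where
0xCC 0x83 is the UTF-8 encoding of U+0303 COMBINING TILDE rather than a
standalone Ì. That is an inference from the bytes, not something I have
confirmed against the actual archive.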

On Sat, Dec 6, 2008 at 12:31 AM, Marcos Caceres
<marcosscaceres@gmail.com> wrote:
> Hi, I'm trying to put the final touches on the zip section of the
> widget packaging spec [1] before we go to LC by the 10th and I've run
> into an i18n problem related to character encodings. I'm wondering if
> anyone would be kind enough to give me some guidance as to what is
> going on, encoding-wise, in MacOS with regard to the encoding of
> file names in Zip files?
>
> When I create a zip file with one file entry called "ñ", inside the
> zip file, the file name gets decomposed to the following (hex) byte
> sequence:
>
> ñ -> 0x6E 0xCC
>
> 6E is the letter "n" in Unicode, so there is obviously some
> decomposition going on there. But 0xCC in Unicode maps to Ì (LATIN
> CAPITAL LETTER I WITH GRAVE)? So I'm not sure what encoding the zip
> file is using.
>
> The reason I ask is because I'm not sure what to put into the widget
> spec with regard to recommending the use of canonical decomposition for
> Unicode file names. Or even if that is a good idea!?
>
> Should I put the following into the spec?:
> "It is recommended that the file name field be encoded using [UTF-8]
> in fully decomposed canonical form."
>
> OR just:
> "It is recommended that the file name field be encoded using [UTF-8]."
>
> This seems important for when I go from MacOS to any other platform, as
> file names get all mangled when files are extracted on any other
> platform. We obviously don't want that to happen so widget engines
> need to be prepared to deal with these encoding issues.
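
One way a widget engine might cope with this is to normalize file names
before comparing them. The following is only a sketch under stated
assumptions: the helper name normalize_zip_path is hypothetical, it assumes
the file name field really is UTF-8, and the choice of NFC as the comparison
form is mine, not something the widget or Zip specs mandate.

```python
import unicodedata


def normalize_zip_path(raw: bytes) -> str:
    """Decode a zip file name field as UTF-8 and normalize it to NFC.

    Hypothetical helper: assumes the field is UTF-8 and raises
    UnicodeDecodeError if it is not.
    """
    return unicodedata.normalize("NFC", raw.decode("utf-8"))


# The same visible name, stored decomposed (n + U+0303) and precomposed (U+00F1).
mac_name = b"n\xcc\x83.txt"
other_name = b"\xc3\xb1.txt"

# The raw bytes differ, but the normalized strings compare equal.
same = normalize_zip_path(mac_name) == normalize_zip_path(other_name)
print(mac_name == other_name, same)
```

Normalizing only for comparison leaves the stored bytes untouched, so it
does not by itself settle which form the spec should recommend.
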
>
> I looked at the Zip spec [2], but I don't see any real guidance with
> regard to this. However, for those who know more about encoding, it
> would be helpful if you could also take a look at the Zip spec.
>
> Any help would be greatly appreciated,
> Marcos
>
> [1] http://dev.w3.org/2006/waf/widgets/#zip-relative
> [2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
> --
> Marcos Caceres
> http://datadriven.com.au
>



-- 
Marcos Caceres
http://datadriven.com.au
Received on Saturday, 6 December 2008 00:41:00 GMT
