- From: Marcos Caceres <marcosscaceres@gmail.com>
- Date: Sat, 6 Dec 2008 00:31:16 +0000
- To: public-webapps <public-webapps@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Hi,

I'm trying to put the final touches on the Zip section of the widget packaging spec [1] before we go to LC by the 10th, and I've run into an i18n problem related to character encodings. Would anyone be kind enough to give me some guidance on what is going on, encoding-wise, in MacOS with regard to the encoding of file names in Zip files?

When I create a Zip file containing one file entry called "ñ", the file name inside the Zip file gets decomposed into the following (hex) byte sequence:

  ñ -> 0x6E 0xCC

0x6E is the letter "n" in Unicode, so there is obviously some decomposition going on. But 0xCC in Unicode maps to Ì (LATIN CAPITAL LETTER I WITH GRAVE), so I'm not sure what encoding the Zip file is using.

I ask because I'm not sure what to put into the widget spec about recommending canonical decomposition for Unicode file names, or even whether that is a good idea. Should I put the following into the spec?

  "It is recommended that the file name field be encoded using [UTF-8] in fully decomposed canonical form."

Or just:

  "It is recommended that the file name field be encoded using [UTF-8]."

This seems important when moving widgets from MacOS to any other platform, because the file names get mangled when the files are extracted there. We obviously don't want that to happen, so widget engines need to be prepared to deal with these encoding issues.

I looked at the Zip spec [2], but I don't see any real guidance on this. For those who know more about encoding, it would be helpful if you could also take a look at the Zip spec.

Any help would be greatly appreciated,
Marcos

[1] http://dev.w3.org/2006/waf/widgets/#zip-relative
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT

--
Marcos Caceres
http://datadriven.com.au
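A short Python sketch, not part of the original message, illustrating the decomposition described in the message. It assumes the archiver stored the name as UTF-8 in canonical decomposed form (NFD). Under that assumption the full byte sequence for "ñ" is 0x6E 0xCC 0x83. Here 0xCC is not the code point U+00CC. It is the lead byte of the two-byte UTF-8 encoding of U+0303 COMBINING TILDE. The helper `same_file_name` is hypothetical and shows just one way an engine could compare names regardless of which normalization form produced them.

```python
import unicodedata

# "ñ" as a single precomposed code point, U+00F1 (the NFC form)
name = "\u00f1"

# Canonical decomposition (NFD): "n" (U+006E) followed by U+0303 COMBINING TILDE
decomposed = unicodedata.normalize("NFD", name)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+006E', 'U+0303']

# UTF-8 bytes of each form: 0xCC is only the first byte of U+0303's encoding
print(name.encode("utf-8").hex(" "))        # c3 b1
print(decomposed.encode("utf-8").hex(" "))  # 6e cc 83


def same_file_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_file_name(name, decomposed))  # True
```

Normalizing both sides to a single form before comparing means a reference to "ñ" in precomposed form would still match an archive entry stored in decomposed form, and vice versa.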
Received on Saturday, 6 December 2008 00:32:02 UTC