- From: Marcos Caceres <marcosscaceres@gmail.com>
- Date: Fri, 30 Nov 2007 22:29:16 +1000
- To: "Bjoern Hoehrmann" <derhoermi@gmx.net>
- Cc: www-international@w3.org, public-appformats@w3.org
Hi Bjoern,

On Nov 30, 2007 10:14 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Marcos Caceres wrote:
> >The WAF Working group is seeking assistance with an i18n problem we
> >are having with our Widgets 1.0 specification [1]. The issue we are
> >having is to do with determining the encoding of file names within Zip
> >archives. Here is an overview of the problem:
>
> This is really an issue with the "ZIP" specification and deployed
> software, not with the "Widgets" specification. It does not seem useful to
> say anything about this in the Widgets specification beyond saying the
> archive should be created in accordance with the ZIP specification and
> that there may be interoperability issues with using non-ASCII names,
> so those should be avoided, which should be quite normal for authors.

I'm totally ok with doing that, as long as it won't raise any issues
later because we didn't really provide a solution to the problem.
Would this be ok with the i18n community (i.e., make it the
Zip/implementer's problem)?

> >The main problem is that there is no way, AFAIK, to determine the
> >encoding of a file name inside a Zip archive when you hit any bytes
> >that are beyond the ASCII range (could be either cp437 or UTF-8?).
>
> I would not be surprised if there are actually more options than this,
> but it's fairly easy to distinguish these two encodings for file names
> since it is rather difficult to create a sequence of octets that is
> valid UTF-8 and represents a reasonable file name in UTF-8 and CP437.
> A heuristic could simply go like this:
>
>   if decode_cp437(input) is a reasonable file name or
>      input is not valid utf-8, then use cp437;
>   else use utf8;
>
> Reasonable file names do not include box drawings, unassigned code
> points, or mathematical symbols outside the ASCII range. But as above,
> it's not really an issue for the Widgets specification, and authors
> are best off if they avoid non-ASCII names.

Ok, thanks for the tip.
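For illustration, the heuristic quoted above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the function names (guess_zip_name_encoding, is_reasonable_name, is_valid_utf8) are hypothetical, and "reasonable" is approximated here as "no characters from the Box Drawing or Block Elements ranges, no unassigned code points, and no non-ASCII math symbols". Including Block Elements is my own assumption, not part of the original suggestion.

```python
import unicodedata


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_reasonable_name(name: str) -> bool:
    """Approximate 'reasonable file name' (assumed definition, see lead-in)."""
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            continue  # ASCII is always accepted
        # Box Drawing (U+2500..U+257F) and Block Elements (U+2580..U+259F)
        if 0x2500 <= cp <= 0x259F:
            return False
        # Cn = unassigned code point, Sm = math symbol
        if unicodedata.category(ch) in ("Cn", "Sm"):
            return False
    return True


def guess_zip_name_encoding(raw: bytes) -> str:
    """Pick 'cp437' or 'utf-8' for a raw Zip file name using the heuristic."""
    # Python's cp437 codec maps every byte value, so this decode cannot fail.
    if is_reasonable_name(raw.decode("cp437")) or not is_valid_utf8(raw):
        return "cp437"
    return "utf-8"
```

Pure ASCII names come out as "cp437", which is harmless since both encodings decode them identically. The heuristic can still misjudge some inputs (some UTF-8 sequences decode to plausible-looking CP437 letters), so avoiding non-ASCII names remains the safer advice.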
Kind regards,
Marcos
--
Marcos Caceres
http://datadriven.com.au
Received on Friday, 30 November 2007 12:29:48 UTC