Re: I18N issues for Widgets Spec [Was: Re: [Widgets] ASCII File names - request for comments] from Uma Umamaheswaran on 2007-12-03 (www-international@w3.org from October to December 2007)

From: Uma Umamaheswaran <umavs@ca.ibm.com>
Date: Mon, 3 Dec 2007 16:15:00 -0500
To: "Marcos Caceres" <marcosscaceres@gmail.com>
Cc: "Arthur Barstow" <art.barstow@nokia.com>, "Richard Ishida" <ishida@w3.org>, "public-appformats@w3.org" <public-appformats@w3.org>, public-i18n-core@w3.org, "Thomas Roessler" <tlr@w3.org>, www-international@w3.org, www-international-request@w3.org
Message-ID: <OF7290DF76.A1BB903B-ON852573A6.00746AE1-852573A6.0074BADE@ca.ibm.com>

<Feedback from one of the PC experts at IBM, Ken Borgendale
(kwb@us.ibm.com)>

It seems to me that the problem here is that MacOS has a non-conforming
implementation of zip.  My first suggestion would be to fix that problem.

On the other hand, the UTF-8 encoding carries a large amount of
redundancy, so if you only need to distinguish between Cp437 and UTF-8
you can determine the encoding correctly in almost all cases.  Any valid
UTF-8 string that is not pure 7-bit ASCII contains at least two adjacent
bytes > 0x7F: a lead byte > 0xBF followed by one or more continuation
bytes in the range 0x80-0xBF.  The simple rule would be: if the string
is valid UTF-8, process it as UTF-8, otherwise as Cp437.
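
A minimal sketch of that rule in Python, for illustration only.  The
function name decode_zip_name is hypothetical and not taken from any zip
library; the sketch assumes the raw file name bytes are already in hand
and relies on Python's built-in "utf-8" and "cp437" codecs.

```python
def decode_zip_name(raw: bytes) -> str:
    """Decode a zip entry name: UTF-8 if the bytes are valid UTF-8, else Cp437.

    Hypothetical helper illustrating the heuristic described above.
    """
    try:
        # Strict decoding rejects malformed sequences, e.g. a byte > 0x7F
        # that is not part of a lead-byte/continuation-byte group.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's cp437 codec maps every byte value, so this fallback
        # does not raise.
        return raw.decode("cp437")


if __name__ == "__main__":
    print(decode_zip_name("é.txt".encode("utf-8")))  # valid UTF-8 path
    print(decode_zip_name(b"\x82.txt"))              # 0x82 alone is not valid UTF-8
```

The "almost all cases" caveat still applies: a Cp437 name whose high
bytes happen to form a well-formed UTF-8 sequence (for example 0xC3
followed by 0xA9) would be read as UTF-8 under this rule.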
========

Best regards, Uma
V.S. UMAmaheswaran, Ph.D.
Globalization Centre of Competency, IBM Toronto Lab
A2/SZ8, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7; +1 905 413 3474;
Fax:905 413 4682; TieLine 313-3474; email: umavs@ca.ibm.com

Received on Monday, 3 December 2007 21:15:23 UTC