- From: Uma Umamaheswaran <umavs@ca.ibm.com>
- Date: Mon, 3 Dec 2007 16:15:00 -0500
- To: "Marcos Caceres" <marcosscaceres@gmail.com>
- Cc: "Arthur Barstow" <art.barstow@nokia.com>, "Richard Ishida" <ishida@w3.org>, "public-appformats@w3.org" <public-appformats@w3.org>, public-i18n-core@w3.org, "Thomas Roessler" <tlr@w3.org>, www-international@w3.org, www-international-request@w3.org
<Feedback from one of the PC experts in IBM: Ken Borgendale, kwb@us.ibm.com>

It seems to me that the problem here is that MacOS has a non-conforming implementation of zip. My first suggestion would be to fix that problem.

On the other hand, there is a large amount of redundancy in the UTF-8 encoding, and if you only need to distinguish between Cp437 and UTF-8 you could determine the encoding correctly in almost all cases. Any valid UTF-8 sequence which is not ASCII7 has at least two adjacent bytes > 0x7F: a lead byte > 0xBF followed by one or more continuation bytes in the range 0x80 to 0xBF. The simple rule would be: if the string is valid UTF-8, process it as UTF-8, otherwise as Cp437.

========

Best regards,
Uma V.S. UMAmaheswaran, Ph.D.
Globalization Centre of Competency, IBM Toronto Lab
A2/SZ8, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7
+1 905 413 3474; Fax: 905 413 4682; TieLine 313-3474
email: umavs@ca.ibm.com
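
The rule suggested in the message above (treat the name as UTF-8 if it is valid UTF-8, otherwise as Cp437) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `decode_zip_filename` is hypothetical, and the sketch relies on Python's built-in `utf-8` and `cp437` codecs rather than on any particular zip library.

```python
def decode_zip_filename(raw: bytes) -> str:
    """Decode a zip entry name: UTF-8 if it validates, otherwise Cp437.

    Hypothetical helper illustrating the heuristic from the message;
    it is not part of any zip specification or library.
    """
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, and overlong forms, so success means valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's cp437 codec maps every byte value, so this cannot fail.
        return raw.decode("cp437")


if __name__ == "__main__":
    print(decode_zip_filename(b"readme.txt"))
    print(decode_zip_filename("é.txt".encode("utf-8")))
    print(decode_zip_filename(b"\x82.txt"))  # 0x82 alone is not valid UTF-8
```

Pure ASCII names decode identically either way. The "almost all cases" caveat remains: a Cp437 name whose high bytes happen to form a valid UTF-8 sequence would be read as UTF-8.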
Received on Monday, 3 December 2007 21:15:25 UTC