- From: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
- Date: Wed, 14 Sep 2005 16:41:57 +0100
- To: "Tex Texin" <tex@xencraft.com>, "Frank Yung-Fong Tang" <franktang@gmail.com>
- Cc: "Jony Rosenne" <rosennej@qsm.co.il>, <www-international@w3.org>
I agree that it would be useful for Jony & Frank to submit FAQs in these areas: their expertise would be really beneficial, &, to be honest, our team hasn't actually encountered these issues (maybe we're missing something...).

Also, FAQs are supposed to be an overview. I feel that this level of detail would be off-putting &, dare I say, possibly unnecessary to teams considering upgrading to UTF-8, which is surely something to encourage?

Thank you for your comments & apologies for the delay in replying.

Deborah

-----Original Message-----
From: www-international-request@w3.org on behalf of Tex Texin
Sent: Wed 8/24/2005 12:58
To: Frank Yung-Fong Tang
Cc: Jony Rosenne; www-international@w3.org
Subject: Re: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

I was going to make more or less the same comment, which is that conversion from legacy encodings to Unicode is a difficult but necessary subject. It is large, so it should be a separate FAQ or FAQs, and it should cover many encodings, not just bidi.

Any minute now, Richard is going to pipe up suggesting Jony submit a FAQ for Hebrew and Frank one for double-byte encoding conversions, so I'll preempt him and suggest that as well. ;-)

Although we could use a treatise on these issues, I wonder if it would be better to identify libraries or tools that do the job right and give users appropriate choices. I muck around with iconv, ICU, Perl, etc., and it is very hard to know which tools will do the entire job correctly, and which do the minimum or are several versions behind. For example, a converter written for Unicode 2.0 would not take advantage of the characters added in Unicode 4.x; it is correct in some sense and incorrect in other ways. Also, a pure encoding converter would not take into account the needs of the Web, nor issues such as conversion to bidi markup. And which tools offer a choice when it comes to converting backslash to yen, won, etc. when it is used as a currency symbol?
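[To make the mislabeling problem concrete, here is a minimal sketch, not from the original thread, using Python's built-in codecs rather than any tool discussed above: the same bytes decode to different characters depending on whether they are treated as Windows-1252 or ISO 8859-1.]

```python
# Minimal illustration (hypothetical example, not from this thread):
# the same byte stream yields different text depending on which
# legacy encoding is assumed.
data = b'\x93smart quotes\x94'  # 0x93/0x94 are common in Windows-produced text

# Windows-1252 maps 0x93/0x94 to curly quotation marks.
as_cp1252 = data.decode('cp1252')

# ISO 8859-1 maps the same bytes to invisible C1 control characters,
# which is one reason data labeled iso-8859-1 is often really Windows-1252.
as_latin1 = data.decode('latin-1')

print(repr(as_cp1252))  # '\u201csmart quotes\u201d'
print(repr(as_latin1))  # '\x93smart quotes\x94'
```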
Many users are confused about which conversions to use: for example, when to use Windows-1252 instead of ISO 8859-1, or Big5-HKSCS instead of Big5, since data is often mislabeled. I think the tools view, or roadmap, may be more important than the character encoding details. But yes, it is a topic that definitely needs expansion.

--
-------------------------------------------------------------
Tex Texin                            cell: +1 781 789 1898
mailto:Tex@XenCraft.com
Xen Master                           http://www.i18nGuy.com
XenCraft                             http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------

http://www.bbc.co.uk/

This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
Received on Wednesday, 14 September 2005 15:42:32 UTC