RE: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

From: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
Date: Wed, 14 Sep 2005 16:41:57 +0100
Message-ID: <418B7E44473AC34488C9E730D09FF3CF0517927C@bbcxue204.national.core.bbc.co.uk>
To: "Tex Texin" <tex@xencraft.com>, "Frank Yung-Fong Tang" <franktang@gmail.com>
Cc: "Jony Rosenne" <rosennej@qsm.co.il>, <www-international@w3.org>

I agree that it would be useful for Jony & Frank to submit FAQs in these areas: their expertise would be really beneficial, & to be honest, in our team we haven't actually encountered these issues (maybe we're missing something..)

Also, FAQs are supposed to be an overview. I feel that this level of detail would be off-putting & possibly, dare I say, unnecessary to teams considering upgrading to UTF-8, which is surely something to encourage?

Thank you for your comments & apologies for the delay in replying.


-----Original Message-----
From:	www-international-request@w3.org on behalf of Tex Texin
Sent:	Wed 8/24/2005 12:58
To:	Frank Yung-Fong Tang
Cc:	Jony Rosenne; www-international@w3.org
Subject:	Re: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

I was going to make more or less the same comment, which is that conversion
from legacy encodings to Unicode is a difficult but necessary subject.
It is large, so it should be a separate FAQ or FAQs, and should cover many
encodings, not just bidi.

Any minute now, Richard is going to pipe up suggesting Jony submit a FAQ for
Hebrew and Frank one for double-byte encoding conversions, so I'll preempt
him and suggest that as well. ;-)

Although we could use a treatise on these issues, I wonder if it would be
better to identify libraries or tools that do the job right and give users
appropriate choices. I muck around with iconv, ICU, perl, etc. and it is
very hard to know which tools will do the entire job correctly, and which do
the minimum, or are several versions behind.

For example, a converter written for Unicode 2.0 would not take advantage of
the characters in Unicode 4.x.
Such a converter is correct in some sense and incorrect in others. Also, a pure
encoding converter would not take into account the needs of the Web, and
perhaps issues of conversion to the bidi markup.

And which tools offer a choice when it comes to converting backslash to yen,
won, etc. when used as currency?
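
To make the question concrete, here is a minimal Python sketch of what such
a choice could look like. It is only an illustration: the helper name, the
as_currency flag and the small currency table are hypothetical, not taken
from any of the tools mentioned above. It assumes Python's cp932 and cp949
codecs, which decode byte 0x5C to U+005C (backslash), and leaves the decision
to substitute a currency sign to the caller.

```python
# Hypothetical table: which currency sign byte 0x5C often stands for
# in data from each legacy code page (an assumption for illustration).
CURRENCY_FOR_0X5C = {
    "cp932": "\u00a5",  # YEN SIGN, Japanese data
    "cp949": "\u20a9",  # WON SIGN, Korean data
}


def decode_with_currency_choice(data, encoding, as_currency=False):
    """Decode legacy bytes; optionally turn backslashes into a currency sign.

    The codec itself maps 0x5C to U+005C. The substitution runs on the
    decoded text, so 0x5C trail bytes inside double-byte characters are
    never touched.
    """
    text = data.decode(encoding)
    if as_currency and encoding in CURRENCY_FOR_0X5C:
        text = text.replace("\\", CURRENCY_FOR_0X5C[encoding])
    return text


if __name__ == "__main__":
    # A price list wants the currency sign; a file path does not.
    print(decode_with_currency_choice(b"\\1,000", "cp932", as_currency=True))
    print(decode_with_currency_choice(b"C:\\temp", "cp932"))
```

The flag applies to the whole input, so a caller would have to decide per
field or per document whether the data is a price or, say, a file path.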

Many users are confused about which conversions to use, e.g. when to use
Windows-1252 instead of ISO 8859-1, or when to use Big5-HKSCS instead of
Big5, since data is often mislabeled.
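
As a rough illustration of the Windows-1252 versus ISO 8859-1 case, the
Python sketch below uses one common heuristic: bytes 0x80-0x9F are C1 control
codes in ISO 8859-1 but mostly printable characters in Windows-1252, so their
presence suggests the label is wrong. The function name is hypothetical and
the heuristic is an assumption, not a rule any particular tool is known to
follow; it relies only on Python's built-in cp1252 and latin-1 codecs.

```python
def decode_labeled_latin1(data):
    """Decode bytes labeled ISO 8859-1, guessing Windows-1252 where it fits.

    Returns a (text, guessed_encoding) pair.
    """
    # Bytes 0x80-0x9F are control codes in ISO 8859-1 but curly quotes,
    # dashes, the euro sign and similar characters in Windows-1252.
    has_c1_range = any(0x80 <= b <= 0x9F for b in data)
    if has_c1_range:
        try:
            return data.decode("cp1252"), "windows-1252"
        except UnicodeDecodeError:
            # A few bytes in that range are unassigned in Python's cp1252
            # codec; fall through and trust the original label instead.
            pass
    return data.decode("latin-1"), "iso-8859-1"


if __name__ == "__main__":
    text, guessed = decode_labeled_latin1(b"\x93price\x94: \x80 5")
    print(guessed, text)
```

A check like this cannot distinguish the two encodings when no byte falls in
0x80-0x9F, but in that case both decode to the same text anyway.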

I think the tools view or roadmap may be more important than the character
encoding details.

But yes, it is a topic definitely needing expansion.
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World


Received on Wednesday, 14 September 2005 15:42:32 UTC
