Re: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

I was going to make more or less the same comment: conversion from
legacy encodings to Unicode is a difficult but necessary subject. It is
large enough that it should be a separate FAQ or FAQs, and it should
cover many encodings, not just bidi.

Any minute now, Richard is going to pipe up suggesting Joni submit a
FAQ for Hebrew and Frank one for double-byte encoding conversions, so
I'll preempt him and suggest that as well. ;-)

Although we could use a treatise on these issues, I wonder if it would
be better to identify libraries or tools that do the job right and give
users appropriate choices. I muck around with iconv, ICU, Perl, etc.,
and it is very hard to know which tools do the entire job correctly,
which do only the minimum, and which are several versions behind.
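
For concreteness, here is the minimal job in Python (which I'll use for
sketches below; iconv and ICU do the same table-driven transcoding at
their core). This is the part every tool gets right; everything beyond
it is where they diverge:

    # Core of any legacy-to-Unicode conversion: table-driven transcoding.
    legacy_bytes = "日本語".encode("shift_jis")  # pretend this came from a legacy file
    text = legacy_bytes.decode("shift_jis")      # legacy bytes -> Unicode string
    utf8_bytes = text.encode("utf-8")            # Unicode out as UTF-8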

For example, a converter written for Unicode 2.0 would not take
advantage of the characters added in Unicode 4.x; it is correct in one
sense and incorrect in others. Also, a pure encoding converter would
not take into account the needs of the Web, and perhaps issues of
conversion to bidi markup.
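
A quick way to see the version problem (a Python sketch; any converter
has an equivalent): check which Unicode Character Database the tool was
built against, and try a character that did not exist in 2.0 -- the
Yijing hexagram symbols, for instance, were only added in Unicode 4.0:

    import unicodedata

    print(unicodedata.unidata_version)  # UCD version this build uses, e.g. "4.1.0"

    # U+4DC0 was added in Unicode 4.0; tables from the 2.0 era would
    # treat it as unassigned.
    try:
        print(unicodedata.name("\u4dc0"))  # HEXAGRAM FOR THE CREATIVE HEAVEN
    except ValueError:
        print("unassigned in this version of the tables")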

And which tools offer a choice when it comes to converting backslash
to yen, won, etc., where the byte is used as a currency sign?
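
To make that concrete: in Shift_JIS the byte 0x5C is nominally the yen
sign, but most converters (Python's codec included) map it to U+005C
BACKSLASH. Whether you want U+005C or U+00A5 depends on the data, and
the post-pass below is just my own illustration of the choice, not a
standard option:

    data = b"\x5c1000"          # 0x5C followed by the digits "1000"

    text = data.decode("shift_jis")
    print(text)                 # \1000 -- default mapping is backslash

    # Alternative policy: treat 0x5C as currency when the text is known
    # to be Japanese. (In the Korean national variant, KS X 1003, the
    # same byte is the won sign.)
    if text.startswith("\\"):
        text = "\u00a5" + text[1:]
    print(text)                 # ¥1000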

Many users are confused about which conversions to use: e.g., when to
use Windows-1252 instead of ISO 8859-1, or Big5-HKSCS instead of Big5,
since data is often mislabeled.
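
One pragmatic answer -- my own sketch, not a standard anywhere -- is to
upgrade the commonly-lied-about labels to their de facto supersets
before decoding, which is what browsers already do in practice for the
Latin-1 case:

    # Hypothetical helper: map suspect labels to their practical supersets.
    SUPERSETS = {
        "iso-8859-1": "cp1252",   # 0x80-0x9F are characters, not C1 controls
        "big5": "big5hkscs",      # includes the Hong Kong extensions
    }

    def decode_mislabeled(data: bytes, declared: str) -> str:
        return data.decode(SUPERSETS.get(declared.lower(), declared))

    # Smart quotes labeled as ISO 8859-1 but really Windows-1252:
    print(decode_mislabeled(b"\x93quoted\x94", "ISO-8859-1"))  # "quoted"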

I think a tools overview or roadmap may be more important than the
character encoding details.

But yes, it is a topic definitely needing expansion.
-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------

Received on Wednesday, 24 August 2005 11:58:26 UTC