
RE: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

From: Jony Rosenne <rosennej@qsm.co.il>
Date: Wed, 24 Aug 2005 15:02:11 +0200
To: <www-international@w3.org>
Message-ID: <000d01c5a8ac$0cdbf470$0100000a@QSM7>

Where the text is long enough, a separate document linked from the main
document is in order.

For Hebrew, the situation is a little simpler: in the general case, it is not
possible to convert visual-order text to logical order automatically.
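To illustrate why, here is a minimal Python sketch. It assumes a single line stored in visual order in ISO 8859-8 with a right-to-left base direction; the sample bytes and the function `naive_visual_to_logical` are hypothetical. Reversing the line recovers the Hebrew letters, but the number was already stored left to right, so it comes out backwards.

```python
# Hypothetical example: the line "shalom 123" as it appears on screen, stored
# in visual order (left to right) in ISO 8859-8. The digit run keeps LTR order.
VISUAL_BYTES = b"123 \xed\xe5\xec\xf9"

# The logical (reading-order) form we would like to recover.
EXPECTED_LOGICAL = "\u05e9\u05dc\u05d5\u05dd 123"


def naive_visual_to_logical(visual: str) -> str:
    """Reverse the whole line; correct only when the line has no LTR runs."""
    return visual[::-1]


visual_text = VISUAL_BYTES.decode("iso-8859-8")
guess = naive_visual_to_logical(visual_text)

print(repr(visual_text))
print(repr(guess))
print("correct" if guess == EXPECTED_LOGICAL else "wrong: digits came out reversed")
```

Reversing only the Hebrew runs would fix this particular line, but a converter would still need the original base direction and line breaks, and the bytes record neither.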


> -----Original Message-----
> From: Tex Texin [mailto:tex@xencraft.com] 
> Sent: Wednesday, August 24, 2005 1:58 PM
> To: Frank Yung-Fong Tang
> Cc: Jony Rosenne; www-international@w3.org
> Subject: Re: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding
> I was going to make more or less the same comment, which is that conversion
> from legacy encodings to Unicode is a difficult but necessary subject.
> It is large, so it should be a separate FAQ or FAQs, and should cover many
> encodings, not just bidi.
> Any minute now, Richard is going to pipe up suggesting Jony submit a FAQ for
> Hebrew and Frank one for double-byte encoding conversions, so I'll preempt
> him and suggest that as well. ;-)
> Although we could use a treatise on these issues, I wonder if it would be
> better to identify libraries or tools that do the job right and give users
> appropriate choices. I muck around with iconv, ICU, Perl, etc., and it is
> very hard to know which tools will do the entire job correctly, which do
> the minimum, and which are several versions behind.
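One way to test whether a given converter does the whole job for a particular file is a round trip: decode strictly, re-encode, and compare the bytes. A minimal sketch using Python's built-in codecs; `convert_and_verify` is a hypothetical helper. A clean round trip only shows that the conversion is reversible, not that every character landed on the best Unicode code point.

```python
def convert_and_verify(data: bytes, encoding: str) -> str:
    """Decode legacy bytes to Unicode and confirm the conversion is reversible.

    Raises UnicodeDecodeError if some byte has no mapping in this codec,
    and ValueError if re-encoding does not reproduce the original bytes.
    """
    text = data.decode(encoding)  # strict mode: no silent U+FFFD replacement
    if text.encode(encoding) != data:
        raise ValueError(f"{encoding}: round trip changed the data")
    return text


# Hebrew in Windows-1255, converted and then written out as UTF-8.
hebrew = convert_and_verify("\u05e9\u05dc\u05d5\u05dd".encode("cp1255"), "cp1255")
print(hebrew.encode("utf-8"))

try:
    convert_and_verify(b"\x81", "cp1252")  # 0x81 is unassigned in Python's cp1252
except UnicodeDecodeError as err:
    print("unmappable byte at offset", err.start)
```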
> For example, a converter written for Unicode 2.0 would not take advantage of
> the characters added in Unicode 4.x.
> It is correct in some sense and incorrect in other ways. Also, a pure
> encoding converter would not take into account the needs of the Web, and
> perhaps issues of conversion to bidi markup.
> And which tools offer a choice when it comes to converting backslash to yen,
> won, etc. when used as currency?
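A byte-to-code-point mapping table converts 0x5C the same way everywhere, yet in Japanese and Korean legacy text that byte may be a path separator or a currency sign. A minimal sketch of an explicit, caller-selected post-processing step; the `lang` codes, the `CURRENCY_SIGN` table, and the rule "a backslash directly before a digit is currency" are all assumptions for illustration, not a standard heuristic.

```python
import re

# Hypothetical per-language choice of currency sign for byte 0x5C.
CURRENCY_SIGN = {
    "ja": "\u00a5",  # YEN SIGN
    "ko": "\u20a9",  # WON SIGN
}

# Assumed heuristic: a backslash immediately followed by a digit is currency.
_BACKSLASH_BEFORE_DIGIT = re.compile(r"\\(?=\d)")


def backslash_to_currency(text: str, lang: str) -> str:
    """Replace currency-like backslashes, leaving other backslashes alone."""
    return _BACKSLASH_BEFORE_DIGIT.sub(CURRENCY_SIGN[lang], text)


sample = "Price: \\1500, saved in C:\\data"
print(backslash_to_currency(sample, "ja"))
print(backslash_to_currency(sample, "ko"))
```

A path such as `C:\2005` would be misconverted by this rule, which is one reason a tool should offer the choice rather than guess.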
> Many users are confused about which conversions to use, e.g. when to use
> Windows-1252 instead of ISO 8859-1, or Big5-HKSCS instead of Big5, since
> data is often mislabeled.
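For mislabeled data, one common strategy is to decode with a superset of the declared encoding; browsers following the WHATWG Encoding Standard, for instance, treat the label iso-8859-1 as windows-1252. A minimal Python sketch; the `SUPERSETS` table and `decode_mislabeled` helper are hypothetical. Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so the sketch falls back to the declared label when the superset fails.

```python
# Hypothetical map from a declared label to a superset to try first.
SUPERSETS = {
    "iso-8859-1": "cp1252",  # 0x80-0x9F: C1 controls vs. curly quotes, euro, etc.
    "big5": "big5hkscs",     # Big5 plus the Hong Kong Supplementary Character Set
}


def decode_mislabeled(data: bytes, label: str) -> tuple[str, str]:
    """Return (text, codec actually used) for data with a possibly wrong label."""
    label = label.lower()
    superset = SUPERSETS.get(label)
    if superset is not None:
        try:
            return data.decode(superset), superset
        except UnicodeDecodeError:
            pass  # e.g. 0x81 is undefined in Python's cp1252
    return data.decode(label), label


print(decode_mislabeled(b"\x93Hi\x94 \x80 5", "ISO-8859-1"))
print(decode_mislabeled(b"\x81", "ISO-8859-1"))
print(decode_mislabeled(b"\xa4\xa4", "Big5"))
```

Returning the codec actually used lets the caller log how often the declared label was wrong.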
> I think the tools view or roadmap may be more important than the character
> encoding details.
> But yes, it is a topic definitely needing expansion.
> -- 
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> Xen Master                          http://www.i18nGuy.com
> XenCraft                            http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
Received on Wednesday, 24 August 2005 12:03:28 UTC
