
Re: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

From: Mark Davis <mark.davis@icu-project.org>
Date: Wed, 24 Aug 2005 08:05:26 -0700
Message-ID: <430C8CB6.70205@icu-project.org>
To: Frank Yung-Fong Tang <franktang@gmail.com>
CC: Jony Rosenne <rosennej@qsm.co.il>, smontagu@smontagu.org, www-international@w3.org, Markus Scherer <markus.scherer@us.ibm.com>

I think what Jony is referring to is that there are multiple ways to go 
from visual to logical. Each possibility can be consistent, in that

    toVisual(toLogical(X)) = X

but they may not each be what is expected, and some combinations may 
require insertion of LRM or RLM, and/or knowledge of the bidi 
environment (http://www.unicode.org/reports/tr9/#Higher-Level_Protocols) 
used in getting toVisual(). Some simple examples (lowercase stands for 
left-to-right letters, uppercase for right-to-left letters):

Visual: abBA
could result from:
Logical: abAB
or
Logical: ABab

Visual: BAab
could result from:
Logical: <RLM>abAB
or
Logical: <LRM>ABab
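
To make the examples above concrete, here is a minimal sketch of a toy
toVisual() in Python. It is not the full UAX #9 algorithm and not what any
particular library does: it assumes lowercase means a strong left-to-right
letter, uppercase means a strong right-to-left letter, the paragraph
direction comes from the first strong character (so a leading LRM or RLM
sets it), and neutrals, digits and embeddings are not handled. The name
to_visual is made up for illustration.

```python
LRM, RLM = "\u200e", "\u200f"  # LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK


def to_visual(logical: str) -> str:
    """Toy reordering: lowercase = strong LTR, uppercase = strong RTL.

    Assumption: only letters plus LRM/RLM are supported; anything else
    (spaces, digits, punctuation) raises ValueError.
    """
    def strong(ch):
        if ch == LRM or ch.islower():
            return "L"
        if ch == RLM or ch.isupper():
            return "R"
        raise ValueError(f"unsupported character: {ch!r}")

    # Paragraph level from the first strong character (0 = LTR, 1 = RTL).
    base = 0
    if logical:
        base = 1 if strong(logical[0]) == "R" else 0

    # Assign an embedding level to each visible character.
    # The marks are invisible, so they are dropped from the output.
    chars, levels = [], []
    for ch in logical:
        kind = strong(ch)
        if ch in (LRM, RLM):
            continue
        chars.append(ch)
        if kind == "R":
            levels.append(1)
        else:
            levels.append(0 if base == 0 else 2)

    # From the highest level down to level 1, reverse every maximal run
    # of characters whose level is at least the current level.
    order = list(range(len(chars)))
    for lvl in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= lvl:
                j = i
                while j < len(order) and levels[order[j]] >= lvl:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return "".join(chars[k] for k in order)


if __name__ == "__main__":
    for text in ("abAB", "ABab", RLM + "abAB", LRM + "ABab"):
        print(repr(text), "->", to_visual(text))
```

Under these assumptions, "abAB" and "ABab" both come out as "abBA", and
<RLM>abAB and <LRM>ABab both come out as "BAab". So a visual string alone
does not determine a single logical string, which is why going from visual
to logical needs extra information.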

Mark

Frank Yung-Fong Tang wrote:

>
>
> 2005/8/24, Jony Rosenne <rosennej@qsm.co.il>:
>
>
>     Where the text is long enough, a separate document linked to from
>     the main document is in order.
>
>
> agree.
>
>     For Hebrew, the situation is a little simpler: In the general case
>     it is not
>     possible to convert visual to logical automatically. 
>
>
> Hum??? How can it be... 
> Simon: did we do the visual Hebrew to logical Hebrew conversion in 
> Gecko before we pipe the ISO-8859-8 info to the Mac ATSUI? It surely 
> is a hard process, but if that is not possible, how can we deal with 
> visual form in an environment which only supports logical input (like 
> ATSUI or WorldScript II on MacOS)?
>
>     Jony
>
>     > -----Original Message-----
>     > From: Tex Texin [mailto:tex@xencraft.com]
>     > Sent: Wednesday, August 24, 2005 1:58 PM
>     > To: Frank Yung-Fong Tang
>     > Cc: Jony Rosenne; www-international@w3.org
>     > Subject: Re: New article for REVIEW: Upgrading from
>     > language-specific legacy encoding to Unicode encoding
>     >
>     >
>     > I was going to make more or less the same comment, which is that
>     > conversion from legacy encodings to Unicode is a difficult but
>     > necessary subject. It is large, so it should be a separate FAQ or
>     > FAQs, and should cover many encodings, not just bidi.
>     >
>     > Any minute now, Richard is going to pipe up suggesting Jony
>     > submit a FAQ for Hebrew and Frank one for double-byte encoding
>     > conversions, so I'll preempt him and suggest that as well. ;-)
>     >
>     > Although we could use a treatise on these issues, I wonder if
>     > it would be
>     > better to identify libraries or tools that do the job right
>     > and give users
>     > appropriate choices. I muck around with iconv, ICU, perl,
>     > etc. and it is
>     > very hard to know which tools will do the entire job
>     > correctly, and which do
>     > the minimum, or are several versions behind.
>     >
>     > For example, a convertor written for Unicode 2.0 would not
>     > take advantage of
>     > the characters in Unicode 4.x.
>     > It is correct in some sense and incorrect in other ways. Also, a
>     pure
>     > encoding convertor would not take into account the needs of
>     > the Web, and
>     > perhaps issues of conversion to the bidi markup.
>     >
>     > And which tools offer a choice when it comes to converting
>     > backslash to yen,
>     > wan, etc. when used as currency?
>     >
>     > Many users are confused about which conversions to use, e.g. when
>     > to use Windows-1252 instead of ISO 8859-1, or when to use
>     > Big5-HKSCS instead of Big5, since data is often mislabeled.
>     >
>     > I think the tools view or roadmap may be more important than
>     > the character
>     > encoding details.
>     >
>     > But yes, it is a topic definitely needing expansion.
>     > --
>     > -------------------------------------------------------------
>     > Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>     > Xen Master                          http://www.i18nGuy.com
>     >
>     > XenCraft                          http://www.XenCraft.com
>     > Making e-Business Work Around the World
>     > -------------------------------------------------------------
>     >
>     >
>     >
>
>
>
>
>
> -- 
> Frank Yung-Fong Tang   譚永鋒
> Šýšţém Årçĥîţéçţ
>
> Day: 703-265-6347                         http://people.netscape.com/ftang
> Skype: FrankYungFongTang           Yahoo IM: FrankYungFongTan
> AIM ID: ytang0648                         MSN IM: FrankYungFongTang@hotmail.com
>                          
Received on Wednesday, 24 August 2005 15:05:55 GMT
