Re: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

From: Frank Yung-Fong Tang <franktang@gmail.com>
Date: Wed, 24 Aug 2005 11:38:36 -0400
Message-ID: <2e4dfd6905082408386a2fa657@mail.gmail.com>
To: Mark Davis <mark.davis@icu-project.org>
Cc: Jony Rosenne <rosennej@qsm.co.il>, smontagu@smontagu.org, www-international@w3.org, Markus Scherer <markus.scherer@us.ibm.com>
Makes sense. Thanks for your clarification.

2005/8/24, Mark Davis <mark.davis@icu-project.org>:
> 
> I think what Jony is referring to is that there are multiple ways to go
> from visual to logical. Each possibility can be consistent, in that
> 
> toVisual(toLogical(X)) = X
> 
> however, they may not each be expected, and some combinations may
> require insertion of LRM or RLM, and/or knowledge of the bidi
> environment (http://www.unicode.org/reports/tr9/#Higher-Level_Protocols)
> used in getting toVisual(). Some simple examples:
> 
> Visual: abBA
> could result from:
> Logical: abAB
> or
> Logical: ABab
> 
> Visual: BAab
> could result from:
> Logical: <RLM>abAB
> or
> Logical: <LRM>ABab
> 
> Mark
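
The four examples above can be checked with a small sketch. This is a toy model, not the full Unicode Bidirectional Algorithm (UAX #9): it assumes uppercase letters stand in for right-to-left letters and everything else is left-to-right, that the paragraph direction comes from the first character (including a leading LRM or RLM), and that there are no digits, neutrals, or embeddings. The function name to_visual and the helper is_rtl are made up for this illustration.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def is_rtl(ch):
    # Assumption: uppercase letters (and RLM) play the role of RTL characters.
    return ch == RLM or ch.isupper()


def to_visual(logical):
    """Reorder a logical-order string into visual order (toy model only)."""
    if not logical:
        return ""
    # Every character is "strong" in this model, so the first one
    # decides the paragraph level: 0 for LTR, 1 for RTL.
    base = 1 if is_rtl(logical[0]) else 0
    # RTL characters get level 1; LTR characters get level 0 in an LTR
    # paragraph and level 2 when nested inside an RTL paragraph.
    levels = [1 if is_rtl(ch) else (0 if base == 0 else 2) for ch in logical]
    chars = list(logical)
    # From the highest level down to level 1, reverse each contiguous
    # run of characters at that level or higher.
    for level in range(max(levels), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                levels[i:j] = levels[i:j][::-1]
                i = j
            else:
                i += 1
    # The marks are invisible, so drop them from the displayed string.
    return "".join(c for c in chars if c not in (LRM, RLM))


for logical in ("abAB", "ABab", RLM + "abAB", LRM + "ABab"):
    print(repr(logical), "->", to_visual(logical))
```

Under these assumptions, two different logical strings map to each visual string, which is why a visual-to-logical converter has to pick one candidate (and possibly insert a mark) rather than recover a unique answer.
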
> 
> Frank Yung-Fong Tang wrote:
> 
> >
> >
> > 2005/8/24, Jony Rosenne <rosennej@qsm.co.il>:
> >
> >
> > Where the text is long enough, a separate document linked to from
> > the main document is in order.
> >
> >
> > agree.
> >
> > For Hebrew, the situation is a little simpler: In the general case
> > it is not possible to convert visual to logical automatically.
> >
> >
> > Hmm? How can that be?
> > Simon: did we do the visual Hebrew to logical Hebrew conversion in
> > Gecko before we piped the ISO-8859-8 info to the Mac ATSUI? It surely
> > is a hard process, but if it is not possible, how can we deal with
> > visual form in an environment that only supports logical input (like
> > ATSUI or WorldScript II on MacOS)?
> >
> > Jony
> >
> > > -----Original Message-----
> > > From: Tex Texin [mailto:tex@xencraft.com]
> > > Sent: Wednesday, August 24, 2005 1:58 PM
> > > To: Frank Yung-Fong Tang
> > > Cc: Jony Rosenne; www-international@w3.org
> > > Subject: Re: New article for REVIEW: Upgrading from
> > > language-specific legacy encoding to Unicode encoding
> > >
> > >
> > > I was going to make more or less the same comment, which is that
> > > conversion from legacy encodings to Unicode is a difficult but
> > > necessary subject. It is large, so it should be a separate FAQ or
> > > FAQs, and should cover many encodings, not just bidi.
> > >
> > > Any minute now, Richard is going to pipe up suggesting Jony submit a
> > > FAQ for Hebrew and Frank one for double-byte encoding conversions, so
> > > I'll preempt him and suggest that as well. ;-)
> > >
> > > Although we could use a treatise on these issues, I wonder if it
> > > would be better to identify libraries or tools that do the job right
> > > and give users appropriate choices. I muck around with iconv, ICU,
> > > Perl, etc., and it is very hard to know which tools will do the
> > > entire job correctly, and which do the minimum or are several
> > > versions behind.
> > >
> > > For example, a converter written for Unicode 2.0 would not take
> > > advantage of the characters in Unicode 4.x. It is correct in some
> > > sense and incorrect in other ways. Also, a pure encoding converter
> > > would not take into account the needs of the Web, and perhaps issues
> > > of conversion to the bidi markup.
> > >
> > > And which tools offer a choice when it comes to converting backslash
> > > to yen, won, etc. when used as currency?
> > >
> > > Many users are confused about which conversions to use, e.g. when to
> > > use Windows-1252 instead of ISO 8859-1, or when to use Big5-HKSCS
> > > instead of Big5, since data is often mislabeled.
> > >
> > > I think the tools view or roadmap may be more important than the
> > > character encoding details.
> > >
> > > But yes, it is a topic definitely needing expansion.
> > > --
> > > -------------------------------------------------------------
> > > Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
> > > Xen Master http://www.i18nGuy.com
> > >
> > > XenCraft http://www.XenCraft.com
> > > Making e-Business Work Around the World
> > > -------------------------------------------------------------
> > >
> > >
> > >
> >
> >
> >
> >
> >
> > --
> > Frank Yung-Fong Tang 譚永鋒
> > Šýšţém Årçĥîţéçţ
> >
> > Day: 703-265-6347 http://people.netscape.com/ftang
> > Skype: FrankYungFongTang Yahoo IM: FrankYungFongTan
> > AIM ID: ytang0648 MSN IM: FrankYungFongTang@hotmail.com
> >
> 
> 
> 


-- 
Frank Yung-Fong Tang 譚永鋒
Šýšţém Årçĥîţéçţ

Day: 703-265-6347 http://people.netscape.com/ftang
Skype: FrankYungFongTang Yahoo IM: FrankYungFongTan
AIM ID: ytang0648 MSN IM: FrankYungFongTang@hotmail.com
Received on Wednesday, 24 August 2005 15:39:24 GMT
