Re: [review feedback] Visual vs. logical ordering of text from Richard Ishida on 2013-04-12 (www-international@w3.org from April to June 2013)

From: Richard Ishida <ishida@w3.org>
Date: Fri, 12 Apr 2013 16:31:24 +0100
To: Tomer Mahlin <TOMERM@il.ibm.com>
CC: www-international@w3.org
Message-ID: <516828CC.4030709@w3.org>
Tomer, many thanks for taking the time to write such an informative 
review.  This information is out of scope for the defined purpose of the 
article as it stands ("What is the difference between visual and logical 
ordering of text, and which should I use?"), but I added a small section 
as follows, pointing to your email.

[
Working with legacy systems

In modern systems whose back-end storage (such as a mainframe or iSeries 
computer) holds legacy data in visual order (created at some point using 
green screens), it is necessary to support bidirectional flow of data 
between the back end (visual ordering) and the web front end (logical 
ordering).

Various factors may be involved in this process, besides the order of 
the characters themselves. This level of detail is beyond the scope of 
the question that frames this article, but you can find useful 
additional information in a helpful email from Tomer Mahlin, of the IBM 
Bidi Development Lab.
]

Cheers,
RI


On 05/03/2013 05:00, Tomer Mahlin wrote:
> These are consolidated comments from IBM Bidi Globalization Center of
> Competency on the document stored at:
> http://www.w3.org/International/tutorials/new-bidi-xhtml/qa-visual-vs-logical
>
>
> General observations
>
>    In modern systems in which back-end storage, including legacy data
> (created at some point using green screens), resides on a visual
> system (such as a mainframe or iSeries), it is necessary to support
> bidirectional flow of data between the back end (visual ordering) and
> the web front end (logical ordering).
> Two things might happen when data is passed between those back and front
> ends:
>      a. Code page conversion
>      b. Bidi layout transformation
>
> The first one is required since bidi data is represented on different
> systems with different code pages (i.e. EBCDIC on visual back end
> systems and ASCII / Unicode on logical front end systems)
> The second is required since visual and logical systems relate the
> stored order of bidi data to its displayed order in different ways.
>
> The following data integrity issues should be taken into account from
> the code page conversion perspective:
> - Code page conversion for Arabic between Unicode and EBCDIC can
> compromise data integrity if not handled carefully, because some
> ligatures (such as Lam Alef) are stored as one character in EBCDIC
> and as two characters in Unicode.
> - In addition, shaped Arabic EBCDIC data that has been converted to
> the isolated Unicode form may lose integrity when converted back to
> the EBCDIC code page if not handled properly. The same can happen
> for Arabic-Indic digits, which are stored in this format in the
> EBCDIC code page.
> These issues are unique to Arabic.
>
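
The two conversion hazards above can be illustrated with a small Python
sketch. It does not use a real EBCDIC code page; as an assumption made
only for illustration, Unicode's Arabic presentation forms stand in for
the shaped characters stored on the back end, and NFKC normalization
stands in for the conversion to unshaped Unicode letters. The helper
name to_base_letters is hypothetical.

```python
import unicodedata

# Stand-in for a shaped back-end cell (assumption: a Unicode presentation
# form is used here in place of the single EBCDIC Lam Alef character).
LAM_ALEF_LIGATURE = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM


def to_base_letters(text: str) -> str:
    """Replace Arabic presentation forms with their base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


# One ligature character expands to two base letters, so field lengths
# differ between the two representations.
expanded = to_base_letters(LAM_ALEF_LIGATURE)
print(len(LAM_ALEF_LIGATURE), len(expanded))

# The isolated, final, initial and medial forms of LAM all collapse to the
# same base letter, so the original shape cannot be recovered from the
# converted text alone.
lam_forms = ["\uFEDD", "\uFEDE", "\uFEDF", "\uFEE0"]
bases = {to_base_letters(form) for form in lam_forms}
print(len(lam_forms), len(bases))
```

A round trip therefore needs extra handling on the way back (re-shaping
and re-composing the ligature), which is where the integrity problems
described above can arise.
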
> The following data integrity issue should be taken into account from
> the bidi layout transformation perspective:
> - Since UBA-based conversion between visual and logical ordering is in
> general irreversible, preserving data consistency requires that visual
> data not be converted to the logical ordering scheme. This matters
> when such data is edited in the web front end. To handle visual data
> properly in such cases, the UBA on logical platforms should be
> disabled or overridden. A technique for achieving this through UCC
> (such as LRO) is described in the section "overriding the algorithm" in
> http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline.
> Modern Dojo-based toolkits come with controls that use this
> technique to provide a native experience for working with and editing
> visual data.
> This data integrity issue is common to both Arabic and Hebrew.
>
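
A minimal Python sketch of the override technique mentioned above,
assuming the visual string is wrapped in LRO ... PDF before it is handed
to a front end that runs the UBA, and unwrapped again before it is
written back unchanged. The function names and the sample string are
hypothetical and are not taken from any toolkit.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_visual(text: str) -> str:
    """Force left-to-right display so visually ordered text is not reordered."""
    return f"{LRO}{text}{PDF}"


def unwrap_visual(text: str) -> str:
    """Strip the override pair again before storing the data on the back end."""
    if text.startswith(LRO) and text.endswith(PDF):
        return text[1:-1]
    return text


# Hypothetical back-end value: Hebrew letters already stored in visual order.
stored = "\u05DD\u05D5\u05DC\u05E9"
wrapped = wrap_visual(stored)

# The stored character order is untouched; only two control characters
# are added around it for display.
print(unicodedata.bidirectional(wrapped[0]), unicodedata.bidirectional(wrapped[-1]))
print(unwrap_visual(wrapped) == stored)
```

The point of the design is that the data itself is never transformed, so
there is no irreversible visual-to-logical step to undo.
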
>
> Section Quick Answer
>  > "... Visual ordering of text was a common way of representing Hebrew
> in HTML on old user agents that didn't support the Unicode bidirectional
> algorithm. Very little persists today. Characters making up the text
> were stored in the source code in the same order you would see them
> displayed on screen when looking from left to right.
> (Visual ordering isn't really seen much for Arabic. Since the Arabic
> letters are all joined up there was a stronger motivation on the part of
> Arabic implementers to enable the logical ordering approach.)..."
> Visual ordering of text is a common way of representing Arabic / Hebrew
> on systems which don't support the UBA such as mainframe or iSeries.
> Those systems are still widely used today. On such systems, characters
> making up the text are stored in the source code in the same order you
> would see them displayed on screen when looking from left to right.
>
>  >"... You should always create HTML (and any other type of markup)
> using logical ordering, and never use visual. ..."
> Whenever possible, you should strive to create HTML (and any other type
> of markup) using logical ordering.
>
> Section Visual ordering and its shortcomings
>  >"...To make visual ordering work, in addition to writing the text
> backwards, "
> Not necessarily. While this is true for "green screens", the autopush
> feature in green screen emulators allows you to type bidi text in the
> natural order.
>
> Section Visual ordering and character encodings
> Here is a list that correlates the most popular character encodings
> used on visual platforms (such as iSeries) with the corresponding bidi
> layout characteristics, such as the ordering scheme (which can be
> visual or logical).
>
> CCSID: 420 (string type: 4, code page: 420, description: EBCDIC,
> original CCSID for Arabic data)
> CCSID: 425 (string type: 5, code page: 425, description: EBCDIC with
> POSIX chars, like [] {} etc.)
> CCSID: 424 (string type: 4, code page: 424, description: EBCDIC,
> original CCSID for Hebrew data)
>
> If you agree to incorporate the list of CCSID details I can provide
> additional ones :-)))
>
> String Type identifies properties of Bidi layout which should be taken
> into account during bidi layout transformation
> string type 4 (Text Type = visual, numeric-shaping = pass-through,
> Orientation= LTR, Text Shaping = shaped, Symmetric Swapping = off)
> string type 5 (Text Type = implicit, numeric-shaping = Arabic,
> Orientation= LTR, Text Shaping = unshaped, Symmetric Swapping = on)
>
> If you agree to incorporate this list of string types I can provide
> additional data on string types 6-12 used on legacy systems :-)))
>
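
The CCSID and string type lists above lend themselves to a small lookup
structure. The Python sketch below holds only the values given in this
message (CCSIDs 420, 424, 425 and string types 4 and 5); the class,
field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Bidi layout properties attached to a string type."""
    text_type: str
    numeric_shaping: str
    orientation: str
    text_shaping: str
    symmetric_swapping: bool


# Values copied from the string type list above.
STRING_TYPES = {
    4: StringType("visual", "pass-through", "LTR", "shaped", False),
    5: StringType("implicit", "Arabic", "LTR", "unshaped", True),
}

# CCSID to string type, as listed above.
CCSID_STRING_TYPE = {420: 4, 425: 5, 424: 4}


def layout_for_ccsid(ccsid: int) -> StringType:
    """Look up the layout properties that apply to data in a given CCSID."""
    return STRING_TYPES[CCSID_STRING_TYPE[ccsid]]


def is_visual(ccsid: int) -> bool:
    """True when data in this CCSID is stored in visual order."""
    return layout_for_ccsid(ccsid).text_type == "visual"


for ccsid in sorted(CCSID_STRING_TYPE):
    print(ccsid, layout_for_ccsid(ccsid))
```

A layout transformation routine could consult such a table to decide
which steps (reordering, shaping, swapping) apply to a given field.
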
> Additional information on bidi layout properties is as follows:
> Orientation: In bidirectional languages, some characters, such as
> English letters, are considered to have a strong left-to-right
> orientation. Other characters, such as the Arabic characters, are
> considered strong right-to-left characters. And other characters, such
> as punctuation marks, spaces, and so on, do not have a strong direction
> associated with them. Orientation can also be contextual. In this
> situation, the global orientation is set according to the direction of
> the first significant (strong) character.
> Numeric Shaping: In Arabic, it is common to use Hindi numerals (the
> Arabic-Indic digits) instead of Arabic numerals. "1", "2", etc. are the
> Arabic version of the numerals.
> Text Shaping: Specifies the shaping: that is, choosing (or composing)
> the correct shape of the input or output text.
> Note: This value is important, in particular for languages where the
> shapes of the characters, when presented, correspond to code points that
> may be different from the code points of the characters stored for
> processing. In languages such as Arabic or Farsi, the character can have
> up to four different shapes (see Shapes of the Arabic Characters). In
> these languages the character is most frequently (but not always) stored
> and processed using a code point related to a basic shape. Often the
> basic shape chosen is the isolated shape.
> An Arabic script character often has an initial form, a middle form,
> a final form, and an isolated form.
> Symmetrical Swapping: The Swapping descriptor specifies whether
> symmetric swapping is applied to the text. A list of symmetric swapping
> characters is given in the ISO/IEC 10646 standard. For example, the
> string "(1)" without symmetric swapping might become ")1(".
>
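
Three of the layout properties described above (contextual orientation,
numeric shaping and symmetric swapping) can be sketched in Python using
only the standard unicodedata module. This is a simplified illustration
under stated assumptions, not a bidi layout engine: the function names
are hypothetical, and the swap table covers only a few bracket pairs
rather than the full list in ISO/IEC 10646.

```python
import unicodedata


def contextual_orientation(text: str, default: str = "LTR") -> str:
    """Return the direction of the first strong character in the text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return default  # no strong character found


# Numeric shaping: map the digits 0-9 to the Arabic-Indic digits
# U+0660..U+0669.
TO_ARABIC_INDIC = str.maketrans(
    "0123456789",
    "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",
)


def shape_digits(text: str) -> str:
    return text.translate(TO_ARABIC_INDIC)


# Symmetric swapping: exchange each character with its mirrored partner
# (partial table, for illustration only).
SWAP = str.maketrans("()[]{}<>", ")(][}{><")


def swap_symmetric(text: str) -> str:
    return text.translate(SWAP)


print(contextual_orientation("(1) \u05D0\u05D1 abc"))
print(shape_digits("2013"))
print(swap_symmetric("(1)"))
```

Whether each step is applied, and in which direction, depends on the
string types of the source and target systems.
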
>
> Best Regards,
>
> *Tomer Mahlin*
> GCoC Bidi architect
> Bidi Development Lab
> Phone: +972-2-6491784 | Mobile: +972-54-3368122
> E-mail: tomerm@il.ibm.com
> IBM R&D Labs
> Malcha Technology Park
> Jerusalem 96951
>   Israel
>
>
>


-- 
Richard Ishida, W3C
http://rishida.net/
Received on Friday, 12 April 2013 15:31:54 UTC