Re: "unicode-bidi" confusion

Braden N. McDaniel writes:
 > I'm having trouble understanding the example in the spec. Can someone
 > explain to me why
 >     HEBREW12 HEBREW13
 > is rendered
 >     13WERBEH 12WERBEH
 > given that this is in an ENGLISH PAR, and there is no markup around these
 > words to reverse the direction?

No mark-up is needed, because the direction is intrinsic to the
letters. I'm not an expert on this either, but let me try to explain
how I understand it.

The intention here is that the letters of "HEBREW" stand for letters
from a right-to-left script. (We decided not to put "real"
right-to-left letters in the spec, because they would probably not
look right in most people's browsers.)

Assume then that the word consists of letters that have a "strong
right-to-left" directionality. The Unicode standard assigns a
directionality to each letter; you can find it in the database of
characters. Apart from "strong-left" and "strong-right" there are 9
more categories. The digits are in the "weak-european-number"
category, the space is "neutral".
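
As an illustration, the directionality of a character can be looked up
with Python's standard unicodedata module. This is only a sketch: the
helper name bidi_class is made up for this example, and the class
names it returns ("L", "R", "EN", "WS") are those used by the current
Unicode database, where the space is labelled "WS", one of the neutral
types.

```python
import unicodedata


def bidi_class(ch):
    """Return the Unicode bidirectional class of one character."""
    return unicodedata.bidirectional(ch)


# U+05D0 HEBREW LETTER ALEF stands in for the "H", "E", ... of the example.
for name, ch in [("Latin a", "a"), ("Hebrew alef", "\u05d0"),
                 ("digit 1", "1"), ("space", " ")]:
    print(name, bidi_class(ch))
```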

When a renderer encounters a pair of letters with strong right-to-left 
directionality, such as the "H" and "E" in the example, it will put
the second one to the left of the first one.

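The digits, having weak directionality, will partially take on the
direction of their context: a number as a whole is placed in the
right-to-left flow, but its digits keep their left-to-right order
among themselves. Thus the "1" will be to the left of the "W", but
the "2" will be to the right of the "1".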

Spaces take on the directionality of their context. The weak numbers
are ignored for this, so the space between the two words lies between
right-to-left letters and will be a right-to-left space.
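
The three rules above can be tried out with a small Python sketch. It
is a toy, not the Unicode algorithm: it assumes, as the spec example
does, that uppercase letters stand for right-to-left letters and
lowercase for left-to-right ones, that the paragraph is left-to-right,
and it handles only letters, digits and neutrals. All function names
are made up for this example.

```python
def bidi_class(ch):
    """Toy classes: uppercase = strong RTL, lowercase = strong LTR."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "EN"
    return "N"  # spaces and everything else count as neutral here


def nearest_strong(classes, start, step):
    """First strong class from `start` in direction `step`, skipping
    digits and neutrals; falls back to the LTR paragraph direction."""
    i = start
    while 0 <= i < len(classes):
        if classes[i] in ("L", "R"):
            return classes[i]
        i += step
    return "L"


def visual_order(text):
    classes = [bidi_class(ch) for ch in text]
    levels = []
    for i, cls in enumerate(classes):
        before = nearest_strong(classes, i - 1, -1)
        if cls == "L":
            levels.append(0)
        elif cls == "R":
            levels.append(1)
        elif cls == "EN":
            # Digits after RTL letters join the RTL flow but keep their
            # own left-to-right order, hence the extra level.
            levels.append(2 if before == "R" else 0)
        else:
            # A neutral is RTL only if the letters on both sides are RTL.
            after = nearest_strong(classes, i + 1, 1)
            levels.append(1 if before == after == "R" else 0)

    # Reverse every run at or above each level, highest level first.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


print(visual_order("HEBREW12 HEBREW13"))  # 13WERBEH 12WERBEH
```

The digits are reversed once on their own and once more together with
the surrounding letters, which is why they end up in their original
order while the letters around them do not.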

The exact algorithm can be found in the Unicode specification, section

  Bert Bos                                ( W 3 C )
  W3C/INRIA
  2004 Rt des Lucioles / BP 93
  06902 Sophia Antipolis Cedex, France
  +33 (0)4 92 38 76 92

Received on Monday, 31 May 1999 06:00:24 UTC