
bidirectional text

From: Tim Moore <fctmoore@hkusua.hku.hk>
Date: Mon, 29 Jan 2001 16:31:22 +0800
Message-ID: <3A752A53.618BAD47@hkusua.hku.hk>
To: www-i18n-comments@w3.org
CC: fctmoore@hkusua.hku.hk
Dear Colleagues,

   I read with interest the Working Draft on Character Model.  I have
one misgiving to do with "bidirectional text".  Clearly,
bidirectionality occurs and must be appropriately handled (most
obviously when scripts with different directionality are mixed).
   My misgiving is more specific (and rather pedantic).  It concerns the
commonly made claim (repeated in Appendix A example A.6) that, in Arabic
script, letters are written right-to-left, while digits are written
left-to-right.  This is a dubious claim, since there is not a clear
basis to determine the "directionality" of digits in a decimal number.
   For instance, consider "62".  Conventionally, in European languages,
this is of course spoken from left to right (and the most significant
digit comes first).  But even in European languages, correspondences are
not always so straightforward, so that "72" in French becomes "sixty
twelve" [soixante douze] when spoken, and in English, "70" can be read
"three score and ten".  It is true that in Arabic the most significant
digit is also placed to the left, but this is a mere convention, so that
it does not follow that the number is "written from left-to-right".
Indeed an Arabic speaker saying the number will in fact utter the
equivalent of "two and sixty" (ithnein wa sittin).
   I take it that the directionality of text can be adequately defined,
but it seems to me that this does not apply straightforwardly to
numerical notations (consider what might be viewed as mixed
directionality, from a logical point of view, in a Roman numeral such as
MCMXLIV, where M maps to one thousand, CM maps to nine hundred [one
hundred short of a thousand], XL maps to forty [ten short of fifty] and
IV maps to 4 [one short of five].  Compare also the order of addresses:
in some conventions you start with the most particular, and end with the
most general, as in English; in Chinese, by contrast, you start with the
most general, and end with the name of the addressee.  Consider also the
case of Polish notation and reverse Polish notation for logical or
arithmetic operations, where the first starts, on the left, with the
least significant symbol and the second starts with the most
significant.  A final case: what is the "logical order" of the numbers
in a digitally represented date?  Is it day-month-year [European, the
first being the most specific], year-month-day [Chinese, the first
being the most general], or month-day-year [USA, mixed]?)
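The Roman numeral case can be made concrete with a short sketch.  It is
only an illustration, assuming the usual subtractive convention; the
names VALUES and roman_to_int are hypothetical and come from no
specification.  A symbol is subtracted when a larger one follows it, so
the value of each symbol depends on what stands to its right, not only
on its own position.

```python
# Hypothetical illustration: symbol values under the usual convention.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Evaluate a Roman numeral written with subtractive pairs (CM, XL, IV)."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = VALUES[symbol]
        # Look one symbol to the right; 0 when we are at the last symbol.
        following = VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        if value < following:
            total -= value  # e.g. the C in CM counts as minus one hundred
        else:
            total += value
    return total


print(roman_to_int("MCMXLIV"))  # M + CM + XL + IV = 1000 + 900 + 40 + 4
```

Reading strictly left to right is still possible here, but only by
looking ahead one symbol, which is the sense in which the notation might
be called mixed.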
   It appears to me that in this specific case the notion of the "logical
location" of a digit is not well-defined, and could be justified only by
stipulating that any decimal notation must place the most significant
digit to the left of the string of digits etc., and that the most
significant digit must be treated as logically "first".  Nothing wrong
with that, I guess, provided that it is explicitly recognized as a mere
stipulation, and is not imposed by any general considerations of logic
or language and may or may not provide a nice fit with particular
languages (as written or spoken).
   (Maybe the real key is the Arabic input methods for numerals which
have been selected for computer keyboards.  The few I have seen do
indeed employ bidirectionality and require inputting, say, 231, in the
order 2 then 3 then 1, which may have seemed or been more convenient for
native speakers, but could not be done on a typewriter.)
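The input order described above can be sketched as follows.  This is a
hedged illustration, not a description of any particular keyboard
driver: the helper value_msd_first is hypothetical, and the assumption
is simply that characters are stored in the order keyed.  The sketch
uses Python's standard unicodedata module, which reports the
bidirectional class that the Unicode Character Database assigns to each
character.

```python
import unicodedata

# Assumed keying order for 231: ARABIC-INDIC DIGIT TWO, THREE, ONE.
keyed = ["\u0662", "\u0663", "\u0661"]
stored = "".join(keyed)  # assumption: storage order equals keying order


def value_msd_first(digits: str) -> int:
    """Hypothetical helper: treat the first stored digit as most significant."""
    value = 0
    for ch in digits:
        value = value * 10 + unicodedata.digit(ch)
    return value


# Bidirectional classes from the Unicode Character Database.
letter_class = unicodedata.bidirectional("\u0627")  # ARABIC LETTER ALEF
digit_classes = [unicodedata.bidirectional(ch) for ch in stored]

print(value_msd_first(stored))
print(letter_class, digit_classes)
```

The letter and the digits receive different classes, which is how the
stipulation is recorded in the character data: the most significant
digit is stored first, and the digit run is displayed with that digit
on the left.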

Best wishes for the Year of the Snake,

Tim Moore
Emeritus Professor of Philosophy
The University of Hong Kong
Received on Monday, 29 January 2001 03:27:44 GMT
