Re: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf

From: Mark Davis <mark.davis@jtcsv.com>
Date: Tue, 16 Sep 2003 15:14:59 -0700
Message-ID: <00a501c37c9f$f7d2fdb0$7900a8c0@DAVIS1>
To: <bidi@prognathous.mail-central.com>, <www-international@w3.org>

The bidi algorithm was designed in full knowledge that it would not be able to
handle all ordering cases, because there is often not enough information in the
text to provide for the right ordering, or there are inconsistencies between
different usage patterns, or the rules to do so would be too complex. For that
reason, it supplies various mechanisms to override the normal ordering results.
Corresponding mechanisms have been developed for HTML and internally in word
processing modules.

Such overrides should be added to the text when it is being composed or edited.
(Adding them just before rendering is not recommended, since the text would then
appear different on systems that don't apply this special override.)
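
As a rough illustration of adding an override at editing time, here is a
minimal sketch in Python. It assumes that the override wanted is a
RIGHT-TO-LEFT MARK (U+200F) placed after a Hyphen-Minus that sits between a
Hebrew letter and a digit; the function name and the exact pattern are
hypothetical, not something taken from the Unicode standard or from any editor.

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Assumption: "Hebrew letter" means U+05D0..U+05EA and "number" means an
# ASCII digit. Real text may need a wider definition of both.
_HEBREW_HYPHEN_DIGIT = re.compile(r"([\u05D0-\u05EA])-(?=[0-9])")


def add_rlm_after_hyphen(text: str) -> str:
    """Insert an RLM after Hyphen-Minus in HebrewLetter+HyphenMinus+Digit."""
    return _HEBREW_HYPHEN_DIGIT.sub(lambda m: m.group(1) + "-" + RLM, text)


if __name__ == "__main__":
    sample = "\u05D4\u05DE\u05D0\u05D4 \u05D4-20"
    fixed = add_rlm_after_hyphen(sample)
    # Show code points rather than glyphs so the result does not depend on
    # how the terminal itself orders bidirectional text.
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Running the function a second time leaves the text unchanged, because the
hyphen is then followed by the mark rather than directly by a digit.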

If the Maqaf is a necessary character for Hebrew, then you may wish to lobby
those organizations supplying Hebrew keyboards to get it added.

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message ----- 
From: <bidi@prognathous.mail-central.com>
To: "Mark Davis" <mark.davis@jtcsv.com>; <www-international@w3.org>
Sent: Tuesday, September 16, 2003 10:05
Subject: Re: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf


On Tue, 16 Sep 2003 07:56:35 -0700, "Mark Davis" <mark.davis@jtcsv.com>
said:
> 1. The Unicode BIDI algorithm does specify the rendering order of the
>    sequence HebrewLetter+HyphenMinus+Number. If that rendering order is
>    not what is desired, then it also provides a way to override it.

The way the UBA specifies it, the rendering order of such sequences is
practically *always* not as desired.

As for overriding it, I can see how that may be a valid solution while
typing new texts, but how can it be done when rendering existing texts?
Would you recommend planting control characters in existing texts
before rendering them?

> 2. Unicode BIDI algorithm is a rendering algorithm. It has nothing to
>    do with keyboards.

It is due to a keyboard limitation that most existing Hebrew texts are
not rendered by the UBA as intended by their authors. It's all quite
simple:

1. The Maqaf is not available in the Hebrew keyboard layout. Due to this
   limitation, users type Hyphen-Minus instead.
2. As a result, virtually all Hebrew texts include sequences of
   HebrewLetter+HyphenMinus+Number instead of HebrewLetter+Maqaf+Number.
3. These sequences are not rendered properly by applications that
   implement the UBA.

Here's a typical example:

The English phrase "The 20th century" is translated into Hebrew as:
  "המאה ה־20"

Due to the lack of Maqaf, most people have to use a Hyphen-Minus:
  "המאה ה-‏20"

In applications that implement the UBA, you will find the following:
  "-20‎‏המאה ה"

Not only is this order wrong (and an eyesore), but it also breaks the
meaning of the sequence. Instead of a positive number, you get a
negative one.
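
The difference between the two characters can be inspected with a short
Python sketch that prints the bidirectional class of each character in both
spellings. It relies only on the standard `unicodedata` module; the helper
name `bidi_classes` is made up for this illustration. The class reported for
Hyphen-Minus depends on the Unicode data bundled with the interpreter, so it
may not match what applications used in 2003.

```python
import unicodedata


def bidi_classes(text):
    """Return (code point, bidirectional class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    with_maqaf = "\u05D4\u05BE20"   # He + Maqaf + "20"
    with_hyphen = "\u05D4-20"       # He + Hyphen-Minus + "20"

    # Maqaf is a strong right-to-left character (class R), like the letters.
    print(bidi_classes(with_maqaf))

    # Hyphen-Minus has a weak, number-related class instead, so the
    # algorithm may resolve it together with the digits that follow it.
    print(bidi_classes(with_hyphen))
```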

I hope this is clearer.

Prog.

> __________________________________
> http://www.macchiato.com
> ►  “Eppur si muove” ◄
>
> ----- Original Message -----
> From: <bidi@prognathous.mail-central.com>
> To: <www-international@w3.org>
> Sent: Monday, September 15, 2003 12:45
> Subject: RE: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf
>
>
> >
> > I'd like to wrap this up.
> >
> > My understanding is that the Unicode BiDi Algorithm does not provide
> > a solution for rendering of *existing* Hebrew texts that include
> > sequences of HebrewLetter+HyphenMinus+Number, nor does it provide a
> > solution for entry of such sequences with current systems that do not
> > map the Hebrew Punctuation Maqaf to the keyboard.
> >
> > Any objections to the above conclusion?
> >
> > Prog.
> >
> > On Sun, 24 Aug 2003 19:18:04 +0200, bidi@prognathous.mail-central.com
> > said:
> > >
> > > On Sun, 24 Aug 2003 16:27:31 +0200, "Jony Rosenne"
> > > <rosennej@qsm.co.il> said:
> > > > For Hebrew, the Maqaf should be used.
> > >
> > > I fully agree that the Maqaf should be used. In fact, I actually
> > > created a customized Hebrew keymap that replaces the non-numpad Hyphen-
> > > Minus with the Maqaf, and this is what I use when writing Hebrew,
> > > but... there are massive amounts of *existing* texts that use Hyphen-
> > > Minus instead (virtually all of them). What will be their fate?
> > > "are they doomed forever to render wrongly under applications that
> > > use the Unicode BiDi algorithm?"
> > >
> > > > Handling the change and the conversion has not been seriously
> > > > tackled in any major environment.
> > >
> > > I'm working on it, but there are currently several obstacles that
> > > complicate this campaign:
> > > 1. Badly rendered Maqaf glyphs in most common fonts (it's usually
> > >    too high). http://exego.net/forums/showMessage.asp?i=9320&qs=
> > > 2. The Maqaf and some other punctuation marks are not included in
> > >    the Israeli Keyboard Layout Standard (SI-1452). This may
> > >    hopefully change, but it takes time to convince everyone on TC-
> > >    2109 that adding these marks would be a worthwhile move.
> > > 3. It may not be easy to educate users to accept and use the
> > >    correct Hebrew punctuation marks, instead of foreign ones.
> > > 4. Data integrity issues have to be taken into consideration (e.g.
> > >    searching Hebrew texts for Maqaf/Minus, Geresh/Apostrophe, and
> > >    Gershaim/Quotes)
> > >
> > > All of these points are important and once solved, would mean that
> > > the Maqaf could be a viable solution, but the fate of existing
> > > texts is just as important (and is the main subject of this
> > > thread).
> > >
> > > Any suggestions?
> > >
> > > Prog.
> > >
> > >
> > > >
> > > > Jony
> > > >
> > > > > -----Original Message-----
> > > > > From: www-international-request@w3.org
> > > > > [mailto:www-international-request@w3.org] On Behalf Of
> > > > > bidi@prognathous.mail-central.com
> > > > > Sent: Wednesday, August 20, 2003 12:23 AM
> > > > > To: www-international@w3.org
> > > > > Subject: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf
> > > > >
> > > > >
> > > > >
> > > > > For the sake of the argument, let's assume that Hebrew
> > > > > Punctuation Maqaf is now part of the official keyboard layout;
> > > > > that it is implemented well (both in fonts and keymap) in all
> > > > > major operating systems; and that users of Hebrew accept the
> > > > > new addition and start to use it from then on. What will be the
> > > > > fate of all Hebrew texts that used Hyphen-Minus instead? Are
> > > > > they doomed forever to render wrongly under applications that
> > > > > use the Unicode BiDi algorithm? By "wrongly", I strictly mean
> > > > > contrary to the way the original authors intended them to render.
> > > > >
> > > > > Further discussion about this problem can be found here:
> > > > > > http://bugzilla.mozilla.org/show_bug.cgi?id=73251#c32
> > > > >
> > > > > Prog.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>
Received on Tuesday, 16 September 2003 18:15:02 GMT