
Re: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf

From: <bidi@prognathous.mail-central.com>
Date: Wed, 17 Sep 2003 09:58:30 +0200
To: "Mark Davis" <mark.davis@jtcsv.com>, www-international@w3.org
Message-Id: <20030917075830.3E8D070873@smtp.us2.messagingengine.com>

On Tue, 16 Sep 2003 15:14:59 -0700, "Mark Davis" <mark.davis@jtcsv.com>
said:
> The bidi algorithm was designed in full knowledge that it would not be
> able to handle all ordering cases,

Right now the algorithm doesn't provide an acceptable solution for Hebrew
users, as it breaks the rendering of most existing texts.
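To make the breakage concrete: when a Hebrew prefix letter such as ב is joined to a number with a hyphen-minus ("ב-20"), what is displayed depends on whether the hyphen is resolved as part of the number. The minimal Python sketch below implements only a small subset of the UBA: one right-to-left paragraph, no embeddings, no Latin letters, rules W4, W5, I1 and L2, with W6/N1/N2 collapsed into "every neutral becomes R", which holds only for this subset. The bidi class of the hyphen-minus is a parameter, ET (European Terminator) or ES (European Separator), so the two can be compared. The sketch does not claim which class any given UBA version assigns, and the function names are hypothetical.

```python
# Minimal sketch of a small subset of the Unicode Bidirectional Algorithm:
# a single right-to-left paragraph (base level 1), no explicit embeddings or
# overrides, and only Hebrew letters, ASCII digits, '-', spaces and U+200F.
# The bidi class of '-' is a parameter ("ET" or "ES") so the two can be
# compared; this is an illustration, not a conforming UBA implementation.

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R


def bidi_class(ch: str, hyphen_class: str) -> str:
    if "\u05d0" <= ch <= "\u05ea" or ch == RLM:  # Hebrew letters alef..tav
        return "R"
    if "0" <= ch <= "9":
        return "EN"  # European Number
    if ch == "-":
        return hyphen_class  # "ET" (terminator) or "ES" (separator)
    if ch == " ":
        return "WS"
    raise ValueError(f"{ch!r} is outside this sketch's character subset")


def resolve_levels(text: str, hyphen_class: str) -> list[int]:
    classes = [bidi_class(ch, hyphen_class) for ch in text]
    n = len(classes)
    # W4: a single ES between two ENs becomes EN (as in "10-20").
    for i in range(1, n - 1):
        if classes[i] == "ES" and classes[i - 1] == classes[i + 1] == "EN":
            classes[i] = "EN"
    # W5: a sequence of ETs adjacent to an EN becomes EN (as in "-20").
    i = 0
    while i < n:
        if classes[i] != "ET":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "ET":
            j += 1
        if (i > 0 and classes[i - 1] == "EN") or (j < n and classes[j] == "EN"):
            classes[i:j] = ["EN"] * (j - i)
        i = j
    # W6 turns leftover ES/ET into neutrals (ON). With no L characters in an
    # RTL paragraph, N1/N2 then resolve every neutral (and WS) to R.
    # I1: R stays at level 1, EN is raised to level 2.
    return [2 if c == "EN" else 1 for c in classes]


def visual_order(text: str, hyphen_class: str) -> str:
    levels = resolve_levels(text, hyphen_class)
    chars = list(text)
    # L2: from the highest level down to 1, reverse every maximal run of
    # characters whose level is at least the current one.
    for level in range(max(levels, default=1), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)  # left-to-right display order
```

Here `visual_order("\u05d1-20", "ET")` returns `"-20\u05d1"` in left-to-right display order: the hyphen ends up on the far (left) side of "20", where it reads as a minus sign. With `"ES"` the result is `"20-\u05d1"`, which keeps the hyphen between the letter and the number.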

> because there is often not enough information in the text to provide
> for the right ordering,

Real-life implementations show that there is more than enough information
to define a strict set of rules for handling
HebrewLetter+HyphenMinus+Number sequences without producing any false
positives.

To the best of my knowledge, there are no cases in the Hebrew language
where a negative number ("-20") directly follows a Hebrew letter without
another HyphenMinus/Maqaf in between. Since there's no ambiguity here,
it should be entirely possible to revise the algorithm so that it handles
such sequences.
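The rule described above can be expressed as a simple pattern test. This is a minimal Python sketch, assuming the heuristic exactly as stated here: a HyphenMinus immediately preceded by a Hebrew letter and immediately followed by a digit is a connector, not a minus sign. The name `connector_hyphens` is hypothetical, and the sketch is not taken from any shipping implementation; it also ignores niqqud and other combining marks between the letter and the hyphen.

```python
import re

# Hypothetical pattern for the heuristic: a HYPHEN-MINUS immediately preceded
# by a Hebrew letter (U+05D0..U+05EA) and immediately followed by an ASCII
# digit acts as a connector (the role a Maqaf would play), not a minus sign.
# A hyphen preceded by another hyphen or by a Maqaf (letter + "--20" or
# letter + U+05BE + "-20") does not match, so it stays a possible minus sign.
CONNECTOR_HYPHEN = re.compile(r"(?<=[\u05d0-\u05ea])-(?=[0-9])")


def connector_hyphens(text: str) -> list[int]:
    """Return the indices of hyphen-minus characters treated as connectors."""
    return [m.start() for m in CONNECTOR_HYPHEN.finditer(text)]
```

A renderer following such a rule would keep the matched hyphens out of the number (so they resolve with the preceding letter), while every other hyphen-minus keeps its usual treatment.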

> or there are inconsistencies between different usage patterns,

Which usage patterns exactly? I can't think of one that this revision
would break.

> or the rules to do so would be too complex.

These rules have already been defined and implemented by Microsoft and by
other vendors, e.g. in Mellel for OS X.

Is it better to keep the UBA a little less complex, but inadequate for
the proper rendering of most existing texts?

> For that reason, it supplies various mechanisms to override the normal
> ordering results.

Which don't help one bit when rendering existing texts.

> Corresponding mechanisms have been developed for HTML and internally in
> word processing modules.

HTML requires knowledge that most users don't have. Moreover, it doesn't
help when dealing with plain text. As for word processing modules, what
set of rules should they follow? Why not add a single, standard set of
rules that deal with such cases to the UBA?

> Such overrides should be added to the text when being composed
> or edited.

As I said, this suggestion doesn't help with the rendering of
existing texts.
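For illustration, adding an override while text is composed or edited could look like the minimal Python sketch below, which inserts U+200F RIGHT-TO-LEFT MARK between a connector hyphen and the following digits. The choice of RLM and the helper name are assumptions; nothing in this thread specifies which mark an editor should insert. It also only affects text that passes through such an editor, which is exactly why it does nothing for existing texts.

```python
import re

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, a zero-width character of bidi class R

# Hypothetical pattern: Hebrew letter, then hyphen-minus, then an ASCII digit.
CONNECTOR_HYPHEN = re.compile(r"(?<=[\u05d0-\u05ea])-(?=[0-9])")


def add_rlm_after_connectors(text: str) -> str:
    """Insert an RLM between a connector hyphen and the digits that follow it.

    The RLM is a strong R character, so the hyphen is no longer adjacent to
    the number: UBA rules W4/W5 cannot absorb it into the number, and it
    resolves to R together with the preceding Hebrew letter.
    """
    return CONNECTOR_HYPHEN.sub("-" + RLM, text)
```

Applying the function twice is harmless: after the first pass the hyphen is followed by the RLM rather than a digit, so the pattern no longer matches.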

> (Added just before rendering is not recommended, since the text would
> appear different than on systems that don't have this special override.

So what solution does the UBA have to offer for dealing with
HyphenMinus+Number sequences in existing texts?

> If the Maqaf is a necessary character for Hebrew, then you may wish to
> lobby those organizations supplying Hebrew keyboards to get it added.

"I'm working on it, but there are currently several obstacles that
complicate this campaign:
1. Badly rendered Maqaf glyphs in most common fonts (the glyph is usually
   positioned too high). http://exego.net/forums/showMessage.asp?i=9320&qs=
2. The Maqaf and some other punctuation marks are not included in the
   Israeli Keyboard Layout Standard (SI-1452). This may hopefully change,
   but it takes time to convince everyone on TC-2109 that adding these
   marks would be a worthwhile move.
3. It may not be easy to educate users to accept and use the correct
   Hebrew punctuation marks, instead of foreign ones.
4. Data integrity issues have to be taken into consideration (e.g.
   searching Hebrew texts for Maqaf/Minus, Geresh/Apostrophe, and
   Gershaim/Quotes)

All of these points are important, and once they are solved the Maqaf
could become a viable solution, but the fate of existing texts is just
as important (and is the main subject of this thread)."
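Point 4 above amounts to search folding: a search should find a word whether it was typed with the Hebrew mark or with its ASCII stand-in. Below is a minimal Python sketch, assuming a simple one-to-one mapping (Maqaf to hyphen-minus, Geresh to apostrophe, Gershayim to quotation mark); the table and function names are hypothetical, and a real search engine might use collation or its own equivalence tables instead.

```python
# Hypothetical search-folding table: each Hebrew punctuation mark is mapped to
# the ASCII character commonly typed in its place, so a search matches either.
FOLD = str.maketrans({
    "\u05be": "-",   # HEBREW PUNCTUATION MAQAF      -> HYPHEN-MINUS
    "\u05f3": "'",   # HEBREW PUNCTUATION GERESH     -> APOSTROPHE
    "\u05f4": '"',   # HEBREW PUNCTUATION GERSHAYIM  -> QUOTATION MARK
})


def fold(text: str) -> str:
    """Replace each Hebrew mark with its ASCII stand-in."""
    return text.translate(FOLD)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats each Hebrew mark and its ASCII stand-in as equal."""
    return fold(needle) in fold(haystack)
```

Folding both sides means old texts typed with ASCII marks and new texts typed with the proper Hebrew marks stay searchable with a single query.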

Prog.
Received on Wednesday, 17 September 2003 03:58:36 GMT
