
RE: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf

From: <bidi@prognathous.mail-central.com>
Date: Thu, 18 Sep 2003 01:02:12 +0200
To: "Jony Rosenne" <rosennej@qsm.co.il>, "'Mark Davis'" <mark.davis@jtcsv.com>, www-international@w3.org
Cc: "Shmuel Yair" <yshmuel@microsoft.com>
Message-Id: <20030917230212.761FE9326F@smtp.us.messagingengine.com>

On Wed, 17 Sep 2003 20:44:21 +0200, "Jony Rosenne"
<rosennej@qsm.co.il> said:
> These existing texts are the result of a bug in Microsoft software.
> Microsoft had asked the UTC to change the classification of
> Hyphen-Minus according to their implementation, and the request was not
> accepted.

Microsoft's implementation is preferable to the Unicode algorithm for
the following reasons:
1. It keeps the sequence as a whole: "20-" rather than "20 -". Some
   users resort to the latter form to circumvent the UBA's mishandling of
   such sequences, but the extra space is against the rules of the Hebrew
   language. These sequences are supposed to include a Maqaf, not a dash.
2. It can be used satisfactorily with all Hebrew keyboard layouts, even
   ones that don't map the Maqaf, e.g. both the Israeli standard keyboard
   layout (SI-1452) and the one used in Windows.
3. It works with the same logic that people use when writing, i.e. Hebrew
   letter first, then Hyphen-Minus/Maqaf, and finally the number. Note
   that with applications that implement the UBA, some people incorrectly
   type the Hyphen-Minus after the number to get the correct display order.
   No wonder they all consider *this* behavior to be a bug.
4. It can be used with character sets that do not include the Maqaf (such
   as ISO-8859-8).
5. It is easy to use and straightforward. No need to type arcane and
   hidden control characters.

Bottom line: The way Microsoft handles these sequences is not a bug, it's
a feature. It works extremely well, and has no drawbacks.
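
To make points 1 and 4 concrete, here is a minimal Python sketch that
looks up the bidirectional class of each character in such a sequence and
checks whether a character can be encoded in ISO-8859-8. The helper names
(bidi_classes, encodable) are mine, chosen for illustration only. The
class reported for Hyphen-Minus depends on the Unicode version bundled
with the Python build, so no output is shown for it.

```python
import unicodedata

HYPHEN_MINUS = "\u002d"
MAQAF = "\u05be"   # HEBREW PUNCTUATION MAQAF
BET = "\u05d1"     # HEBREW LETTER BET, used here as a sample prefix letter


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def encodable(ch, charset):
    """Return True if ch can be represented in the given character set."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Logical (typing) order: Hebrew letter, separator, number.
with_hyphen = BET + HYPHEN_MINUS + "20"
with_maqaf = BET + MAQAF + "20"

print(bidi_classes(with_hyphen))
print(bidi_classes(with_maqaf))
print(encodable(MAQAF, "iso-8859-8"), encodable(HYPHEN_MINUS, "iso-8859-8"))
```

The Maqaf is a strong right-to-left character, while Hyphen-Minus is a
weak one whose placement is decided by its neighbours. That difference is
what the whole argument is about.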

> > To the best of my knowledge, there are no cases in the Hebrew
> > language where a negative number is preceded by Hebrew letter without
> > another HyphenMinus/Maqaf in between ("-20"). Since there's no
> > ambiguity here, it should be very much possible to revise the
> > algorithm so that it deals with such sequences.
>
> So there should be no problem for a text processor to get it right and
> produce the correct Unicode data stream.

Are you suggesting that the rendering application will do some
pre-processing and insert control characters? Or perhaps that it will
replace Hyphen-Minus with Maqaf marks? If so, wouldn't you consider such
pre-processing to be part of the actual BiDi algorithm?
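
For the sake of discussion, the second kind of pre-processing could look
roughly like the sketch below. The function name and the rule it applies
(a Hebrew letter, then Hyphen-Minus, then an ASCII digit) are my own
assumptions for illustration. They are not taken from the UBA or from any
existing product.

```python
import re

MAQAF = "\u05be"  # HEBREW PUNCTUATION MAQAF

# A Hyphen-Minus that directly follows a Hebrew letter (alef..tav)
# and directly precedes an ASCII digit.
_HEBREW_HYPHEN_NUMBER = re.compile(r"(?<=[\u05d0-\u05ea])-(?=[0-9])")


def hyphen_minus_to_maqaf(text):
    """Replace Hyphen-Minus with Maqaf between a Hebrew letter and a digit."""
    return _HEBREW_HYPHEN_NUMBER.sub(MAQAF, text)
```

Anything that is not of the form letter, Hyphen-Minus, digit (a plain
"-20", a range like "20-30", Latin text) is left untouched. Even so, the
stored text is no longer what the author typed.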

Moreover, I'm not sure that modifying the original texts without the
authors' consent is an acceptable solution.

> > > or there are inconsistencies between different usage patterns,
> >
> > Which usage patterns exactly? I can't think of one that this revision
> > will break.
>
> I did not see any proposed revision, I only saw a description of the
> problem.

You mentioned the proposed revision in the first paragraph.

Prog.
Received on Wednesday, 17 September 2003 19:03:37 GMT
