- From: Addison Phillips [wM] <aphillips@webmethods.com>
- Date: Wed, 17 Sep 2003 23:26:02 -0700
- To: <bidi@prognathous.mail-central.com>, "Jony Rosenne" <rosennej@qsm.co.il>, "'Mark Davis'" <mark.davis@jtcsv.com>, <www-international@w3.org>
- Cc: "Shmuel Yair" <yshmuel@microsoft.com>
I think you are conflating two different things, input and output, as if they were a single algorithm. Unicode Bidi is concerned only with output--how to correctly display any given string of Unicode characters. It says absolutely nothing about how input is to be handled and has nothing whatever to do with keyboard mappings or input handling. What it does give us is a common, interchangeable basis on which to interpret any given character sequence for display directionality. As such, it does imply what the contents of a given string could or should be in order to achieve a given display effect.

There are various characters, such as the bidi controls, that can be used to achieve these results. The fact that there exists a sequence of characters that, when using the Unicode Bidi algorithm, is both logical and displays correctly implies that the Unicode Bidi algorithm is not what needs changing! It may be the case that the character sequence isn't optimal, but I think that is a quibble at best. Users generally do not care what the character sequence in memory is, only about the results on the display (the graphemes).

On the input side, I infer that you expect a one-to-one key-to-character mapping. This isn't an accurate model, even for English keyboards (consider typesetter's quotes, alt sequences, and so forth), nor for most Western European keyboards, let alone for Hebrew. For example, I commonly switch to the French keyboard to type common Western European diacriticals. On that keyboard, Shift+{ (on my US QWERTY keypad) produces the "dead key" for umlaut (dieresis), which I then follow with the letter to be modified (let's say 'u' for now). This doesn't produce a String containing U+0308 U+0075 (the key sequence, which would, of course, be very wrong). Neither does it produce U+0075 U+0308 (which would be correct Unicode). It produces U+00FC.
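A minimal sketch of the dead-key example above, assuming Python's standard `unicodedata` module. The variable and function names are illustrative only, and the code says nothing about how any particular keyboard driver is implemented. It shows only that the precomposed U+00FC is the NFC form of U+0075 U+0308, while the raw key order U+0308 U+0075 does not compose to it.

```python
import unicodedata

# Order in which the keys are pressed: dead-key diaeresis, then 'u'.
key_order = "\u0308\u0075"
# Base letter followed by the combining diaeresis.
decomposed = "\u0075\u0308"
# Single precomposed code point, LATIN SMALL LETTER U WITH DIAERESIS.
precomposed = "\u00FC"


def to_nfc(text: str) -> str:
    """Return the Normalization Form C (composed) version of text."""
    return unicodedata.normalize("NFC", text)


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for label, value in [
        ("key order", key_order),
        ("decomposed", decomposed),
        ("precomposed", precomposed),
    ]:
        print(f"{label:12} {code_points(value):16} NFC: {code_points(to_nfc(value))}")
```

The keyboard layer therefore has to interpret the key sequence before producing a character sequence. It cannot simply emit one character per key in the order typed.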
In other words: Microsoft could produce a Unicode string containing a (Unicode Bidi) correct sequence, given that they perform this interpretation internally. I think that was Jony's point.

It isn't good that users must "work around" existing implementations. But the people to complain to are those that produce the implementations, in my opinion. What good are standards if people ignore them?

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
432 Lakeside Drive, Sunnyvale, CA, USA
+1 408.962.5487 (office)
+1 408.210.3569 (mobile)
mailto:aphillips@webmethods.com

Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International/ws

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of
> bidi@prognathous.mail-central.com
> Sent: Wednesday, September 17, 2003 4:02 PM
> To: Jony Rosenne; 'Mark Davis'; www-international@w3.org
> Cc: Shmuel Yair
> Subject: RE: The fate of Hebrew texts with Hyphen-Minus instead of Maqaf
>
>
>
> On Wed, 17 Sep 2003 20:44:21 +0200, "Jony Rosenne"
> <rosennej@qsm.co.il> said:
> > These existing texts are the result of a bug in Microsoft software.
> > Microsoft had asked the UTC to change the classification of
> > Hyphen-Minus according to their implementation, and the request was not
> > accepted.
>
> Microsoft's implementation is preferred over the Unicode algorithm for
> the following reasons:
> 1. It keeps the sequence as a whole. "20‎-ה", rather than "20 -ה"‎. The
> latter form is used by some users to circumvent the UBA mishandling of
> such sequences. The extra space is against the rules of the Hebrew
> language. These sequences are supposed to include a Maqaf, not a dash.
> 2. It can be used satisfactorily with all Hebrew keyboard layouts, even
> ones that don't map the Maqaf, e.g. both the Israeli standard keyboard
> layout (SI-1452) and the one used in Windows.
> 3. It works with the same logic that people use when writing, i.e. Hebrew
> letter first, then Minus-Hyphen/Maqaf, and finally the number. Note
> that with applications that implement the UBA, some people incorrectly
> type the Minus-Hyphen after the number to keep the correct order.
> No wonder they all consider *this* behavior to be a bug.
> 4. It can be used with character sets that do not include the Maqaf (such
> as ISO-8859-8).
> 5. It is easy to use and straightforward. No need to type arcane and
> hidden control characters.
>
> Bottom line: The way Microsoft handles these sequences is not a bug, it's
> a feature. It works extremely well, and has no drawbacks.
>
> > > To the best of my knowledge, there are no cases in the Hebrew
> > > language where a negative number is preceded by Hebrew letter without
> > > another HyphenMinus/Maqaf in between ("-20‎ה"). Since there's no
> > > ambiguity here, it should be very much possible to revise the
> > > algorithm so that it deals with such sequences.
> >
> > So there should be no problem for a text processor to get it right and
> > produce the correct Unicode data stream.
>
> Are you suggesting that the rendering application will do some
> pre-processing and insert control characters? or perhaps that it will
> replace Hyphen-Minus with Maqaf marks? if so, wouldn't you consider such
> pre-processing as part of actual BiDi algorithm?
>
> Moreover, I'm not sure that modifying the original texts without the
> authors' consent is an acceptable solution.
>
> > > > or there are inconsistencies between different usage patterns,
> > >
> > > Which usage patterns exactly? I can't think of one that this revision
> > > will break.
> >
> > I did not see any proposed revision, I only saw a description of the
> > problem.
>
> You mentioned the proposed revision in the first paragraph.
>
> Prog.
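The disagreement in the quoted message turns on how the characters involved are classified for bidirectional display. As a hedged illustration, the sketch below looks up the Bidi_Class property of the characters under discussion using Python's standard `unicodedata` module. It does not implement the Unicode Bidi algorithm, the sample table and function name are illustrative only, and the values come from whatever Unicode Character Database version ships with the interpreter.

```python
import unicodedata

# Characters discussed in the thread, keyed by their Unicode names.
SAMPLES = {
    "HYPHEN-MINUS": "\u002D",
    "HEBREW PUNCTUATION MAQAF": "\u05BE",
    "HEBREW LETTER HE": "\u05D4",
    "DIGIT TWO": "\u0032",
    "LEFT-TO-RIGHT MARK": "\u200E",
    "RIGHT-TO-LEFT MARK": "\u200F",
}


def bidi_class(ch: str) -> str:
    """Return the Bidi_Class abbreviation (e.g. 'R', 'EN', 'ES') for one character."""
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {name:26} {bidi_class(ch)}")
```

Hyphen-Minus is reported as ES (European Number Separator), whereas Maqaf is reported as R (strong right-to-left), like the Hebrew letters themselves. The marks U+200E and U+200F are among the bidi controls a text processor could insert to obtain the intended display order.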
Received on Thursday, 18 September 2003 02:27:31 UTC