Re: per-paragraph auto-direction, a.k.a. dir=uba from Aharon (Vladimir) Lanin on 2010-09-27 (public-i18n-bidi@w3.org from July to September 2010)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Mon, 27 Sep 2010 12:20:55 +0200
To: Amit Aronovitch <aronovitch@gmail.com>
Cc: CE Whitehead <cewcathar@hotmail.com>, public-i18n-bidi@w3.org
Message-ID: <AANLkTinjjiafOz79w4Z+YPJE1mZzmy0xRwG4cQ51PV7M@mail.gmail.com>

> (a) Just wish to mention that the UAX#9 default algorithm
> does ignore neutrals ("P2. In each paragraph, find the
> first character of type L, AL, or R.") - we might want to
> stick to that default if consensus cannot be reached (i.e.
> ignore all weaks and neutrals).

I strongly object. The behavior we want for EN numbers is clear: we want
LTR. Phone numbers and negative numbers that use EN digits get messed up in
RTL.

What we've been discussing is what to do about AN numbers.

> (b) I probably misunderstand something: returning rtl for
> an element containing only AN numbers is not the same
> as ignoring AN (which would give the inherited direction,
> possibly ltr).

I was responding to a suggestion made by CE Whitehead:

>> If [the first-strong estimation algorithm] does not
>> encounter any [L, AL, or R characters], it returns ltr if
>> it encounters any weak ltr characters (EN or AN).

> I think it should return ltr except for AN, for which I would prefer -- I
> guess -- that it return rtl unless there is an inherited dir of ltr

If you work it out, you'll see that in that suggestion, the presence of AN
characters simply does not make any difference. If the inherited dir is ltr,
it stays ltr despite an AN, and if the inherited dir is rtl, it will stay
rtl even if there are no AN characters.
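
A rough Python sketch of that argument follows. It is only an illustration under stated assumptions: the function names (first_strong, suggested, ignoring_an) are made up for this example, the inherited direction is assumed to always be either "ltr" or "rtl", and checking EN before AN in the no-strong case is an assumption about how the suggestion would treat mixed content. Bidi classes come from Python's standard unicodedata module.

```python
import unicodedata


def first_strong(text):
    """Return 'ltr' or 'rtl' for the first strong (L, AL, R) character, else None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


def has_class(text, bidi_class):
    """True if any character in text has the given bidi class."""
    return any(unicodedata.bidirectional(ch) == bidi_class for ch in text)


def suggested(text, inherited):
    """The suggestion as read here: with no strong character, EN gives ltr,
    and AN gives rtl unless the inherited direction is ltr."""
    strong = first_strong(text)
    if strong:
        return strong
    if has_class(text, "EN"):  # assumption: EN is checked before AN
        return "ltr"
    if has_class(text, "AN"):
        return "ltr" if inherited == "ltr" else "rtl"
    return inherited


def ignoring_an(text, inherited):
    """Same rule, but AN characters are simply ignored."""
    strong = first_strong(text)
    if strong:
        return strong
    if has_class(text, "EN"):
        return "ltr"
    return inherited


# U+0661..U+0663 are Arabic-Indic digits (bidi class AN).
an_only = "-\u0661\u0662\u0663"
for inherited in ("ltr", "rtl"):
    print(inherited, suggested(an_only, inherited), ignoring_an(an_only, inherited))
```

For an element containing only AN digits, both functions return the inherited direction, which is the sense in which the AN clause makes no difference.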

This actually made sense to me, since whatever we do for AN characters does
not seem to help much, except perhaps for negative numbers.

Aharon

On Mon, Sep 27, 2010 at 11:52 AM, Amit Aronovitch <aronovitch@gmail.com> wrote:

> (a) Just wish to mention that the UAX#9 default algorithm does ignore
> neutrals ("P2. In each paragraph, find the first character of type L, AL,
> or R.") - we might want to stick to that default if consensus cannot be
> reached (i.e. ignore all weaks and neutrals).
>
> (b) I probably misunderstand something:
>  returning rtl for an element containing only AN numbers is not the same as
> ignoring AN (which would give the inherited direction, possibly ltr).
>
> (c)  It took a while till I figured out the significance of *negative* AN
> numbers, so I repeat here in case other people missed that too:
>       if the element contains "-123" (where figures here represent
> Arabic-Indic numerals), and the direction is RTL, it is displayed as "123-".
> However in LTR paragraph direction it is displayed as "-123". This might be
> the reason why the exception regarding weak ltr characters was added (I had
> a hard time figuring out why it was added before).
>
>     Amit
>
>
> On Mon, Sep 27, 2010 at 11:01 AM, Aharon (Vladimir) Lanin <
> aharon@google.com> wrote:
>
>> That is actually the same as just ignoring AN characters in first-strong.
>> I am fine with doing that unless there is a strong response from an Arabic
>> or Farsi speaker saying that negative AN numbers definitely do need to be
>> displayed in LTR, since (as we have seen) AN phone numbers come out the same
>> in LTR and RTL. Make it soon, since I need to get this finalized today.
>>
>> Aharon
>>
>>
>> On Sun, Sep 26, 2010 at 10:55 PM, CE Whitehead <cewcathar@hotmail.com> wrote:
>>
>>>
>>>
>>> From: Aharon (Vladimir) Lanin <aharon@google.com>
>>> Date: Sun, 26 Sep 2010 02:01:23 +0200
>>>
>>> >      - The first-strong algorithm returns the direction of the first
>>> >      strong (L, AL, or R) character it encounters. If it does not
>>> >      encounter any, it returns ltr if it encounters any weak ltr
>>> >      characters (EN or AN).
>>> I disagree slightly here; I think it should return ltr except for AN, for
>>> which I would prefer -- I guess -- that it return rtl unless there is an
>>> inherited dir of ltr.
>>> Best,
>>> --C. E. Whitehead

Received on Monday, 27 September 2010 10:21:48 UTC