
Re: per-paragraph auto-direction, a.k.a. dir=uba

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Wed, 15 Sep 2010 10:40:19 +0200
Message-ID: <AANLkTinyO2gkoqxQSy7dyPify7m1GOJAOJscf2MFW3Uv@mail.gmail.com>
To: Matitiahu Allouche <matial@il.ibm.com>
Cc: "Phillips, Addison" <addison@lab126.com>, Adil Allawi <adil@diwan.com>, Behdad Esfahbod <behdad@behdad.org>, Ehsan Akhgari <ehsan@mozilla.com>, fantasai <fantasai.lists@inkedblade.net>, public-i18n-bidi@w3.org, public-i18n-bidi-request@w3.org, Shachar Shemesh <shachar@shemesh.biz>
> What do you mean, "dir does not" (inherit)?
> I interpret it as: an inner element without an
> explicit dir attribute does not inherit its
> direction from outer elements.
> So where does it get it from?
> I am sure it is not the right interpretation, but
> I am afraid the text itself is not very clear.

I am not trying to say anything new. I do not want to introduce any changes
to the dir inheritance mechanism (if one can call it that) from the way it
works in HTML 4.01, or from the way it is described in the first public
draft of the proposal, which is supposed to be compatible with HTML 4.01. As
far as I understand, the mechanism is that dir's effects are actually
limited to specifying default values for the CSS properties unicode-bidi and
direction:

   - Explicit dir=ltr defaults unicode-bidi to embed (or to bidi-override on
   a <bdo>) and direction to ltr. This, in turn, emulates an LRE (or LRO) and
   PDF.
   - Explicit dir=rtl defaults unicode-bidi to embed (or to bidi-override on
   a <bdo>) and direction to rtl. This, in turn, emulates an RLE (or RLO) and
   PDF.
   - Any other explicit value for dir (which, strictly speaking, is
   invalid), or no dir attribute at all, defaults unicode-bidi to normal and
   inherits direction from the parent. If the display is block, the element's
   content is put into a new UBA paragraph, with the paragraph level set
   according to the direction property.

Thus, dir does not inherit. You can test this by checking that

<div dir=ltr>HELLO <span>-</span> GOODBYE</div>

does not give the same results as

<div dir=ltr>HELLO <span dir=ltr>-</span> GOODBYE</div>

- which would be the case if dir truly inherited.
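
To illustrate the mapping listed above, here is a minimal Python sketch. The function name css_defaults_for_dir and its parameters are hypothetical, invented for this illustration; it models only the default CSS values described in the list, not a real browser's style resolution.

```python
from typing import Optional, Tuple


def css_defaults_for_dir(
    tag: str, dir_attr: Optional[str], parent_direction: str
) -> Tuple[str, str]:
    """Return the (unicode-bidi, direction) defaults implied by a dir attribute.

    Hypothetical helper: it only mirrors the three cases listed above.
    """
    value = dir_attr.lower() if dir_attr is not None else None
    if value in ("ltr", "rtl"):
        # An explicit ltr/rtl emulates LRE/RLE + PDF (LRO/RLO + PDF on <bdo>).
        unicode_bidi = "bidi-override" if tag.lower() == "bdo" else "embed"
        return unicode_bidi, value
    # Missing or invalid dir: no embedding, and the CSS direction property
    # (not the dir attribute itself) is inherited from the parent.
    return "normal", parent_direction


# The two <span> elements from the test case above, inside <div dir=ltr>:
print(css_defaults_for_dir("span", None, "ltr"))   # span without dir
print(css_defaults_for_dir("span", "ltr", "ltr"))  # span with dir=ltr
```

Both spans end up with direction ltr, but only the second one gets unicode-bidi embed, which is why the two snippets can render differently.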

> I may have missed it, but the description of dir=auto with
> autodirmethod=uba nowhere specifies that the direction is
> recomputed for each new paragraph (or element of such
> and such types).

I thought that this was implicit in stating that "the UBA is invoked on the
textarea content specifying only a default paragraph level (in icu4j
terminology, either LEVEL_DEFAULT_LTR or LEVEL_DEFAULT_RTL)". The UBA will
then assign each paragraph its own paragraph level. This could be stated
explicitly, of course.
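
As a rough illustration of "each paragraph gets its own paragraph level", here is a simplified Python sketch. It is not icu4j and not a full UBA implementation: the function name paragraph_levels is hypothetical, paragraphs are split on "\n" only (the UBA uses bidi class B separators), and explicit embedding characters are ignored.

```python
import unicodedata
from typing import List


def paragraph_levels(text: str, default_level: int) -> List[int]:
    """Assign each paragraph a base level from its first strong character.

    default_level (0 for LTR, 1 for RTL) is used only for paragraphs that
    contain no strong character, mimicking a "default paragraph level".
    """
    levels = []
    for paragraph in text.split("\n"):
        level = default_level
        for ch in paragraph:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                level = 0
                break
            if bidi_class in ("R", "AL"):
                level = 1
                break
        levels.append(level)
    return levels


# Three paragraphs: Latin, Hebrew, and digits only (no strong characters).
sample = "hello\n\u05e9\u05dc\u05d5\u05dd\n123"
print(paragraph_levels(sample, default_level=1))
```

The direction is recomputed for every paragraph; only the all-neutral paragraph falls back to the default.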

> I think that [first-strong in the absence of strong characters
> but in the presence of] AN should rather return rtl. (In UBA
> jargon, EN digits in a neutral context receive bidi embedding
> level 0 while AN digits receive level 2). Native Arabic users
> should confirm.

First, I would like to state why I want first-strong to return ltr in the
absence of L, R, and AL, but in the presence of EN. It is so that a dir=auto
element containing what I call a "formatted number" comes out LTR in an RTL
context. Here are the EN "formatted number" cases that concern me:


   - phone numbers, e.g. "+1 617 987 6543", "(617) 987-6543", etc., which in
   RTL would come out visually as "6543 987 617 1+" and "987-6543 (617)",
   respectively.
   - negative numbers, e.g. "-12", which in RTL comes out as "12-".


I believe that the EN phone number cases rendered in RTL would be
unacceptable in all RTL contexts - Hebrew, Arabic, Farsi, etc.

As for the negative numbers, rendering them in RTL is unacceptable in
Hebrew. In Arabic, as far as I understand, the minus is acceptable on either
side when EN digits are used, with the left perhaps being more commonly
used. Thus, making negative EN numbers LTR should do no harm there. I do not
know what the situation is in Farsi and Urdu - anyone?
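
For concreteness, here is a minimal Python sketch of the first-strong rule as stated in the proposal (first L, R, or AL character wins; otherwise ltr if any EN or AN digit was seen; otherwise the inherited direction). The function name first_strong is hypothetical, and the sketch leaves out the proposal's exclusions (descendants with their own dir, text between LRE/RLE/LRO/RLO and the matching PDF, any character limit). Treating AN like EN is exactly the point still under discussion, so take that branch as one possible reading.

```python
import unicodedata


def first_strong(text: str, inherited: str) -> str:
    """Estimate direction ("ltr" or "rtl") using the first-strong rule."""
    saw_number = False
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class in ("EN", "AN"):
            # Weak digits do not decide immediately; a later strong
            # character still takes precedence.
            saw_number = True
    # No strong character: digits make it ltr, otherwise inherit.
    return "ltr" if saw_number else inherited


# Formatted numbers inside an RTL context:
for sample in ("+1 617 987 6543", "(617) 987-6543", "-12", "(...)"):
    print(repr(sample), first_strong(sample, inherited="rtl"))
```

With this rule the phone numbers and the negative number resolve to ltr even in an RTL context, while a string with neither strong characters nor digits keeps the inherited direction.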

Now let's look at the same cases with AN. And I must first admit that I have
only a vague idea of the way AN formatted numbers are supposed to look.

According to the UBA, the phone number examples above in AN digits would
look as follows in LTR and RTL (I am assuming your browser is actually
following the UBA here. Chrome and Firefox seem to be ok.):

   - LTR: "+١ ٦١٧ ٩٨٧ ٦٥٤٣" and "(٦١٧) ٩٨٧-٦٥٤٣", respectively.
   - RTL: "١ ٦١٧ ٩٨٧ ٦٥٤٣+" and "٦١٧) ٩٨٧-٦٥٤٣)", respectively.


I am pretty sure that the LTR variants are unacceptable. But - I think - so
are the RTL ones. Arabic speakers - is this true?

If so, I conclude that when using AN digits, phone numbers must not contain
spaces, dashes, or parentheses. (Periods, however, are ok.) Is this a
correct assessment? With this restriction, AN phone numbers seem to look
basically the same in LTR and RTL. In fact, if a leading plus is used, they
seem to look better in LTR:


   - LTR: "+١.٦١٧.٩٨٧.٦٥٤٣".
   - RTL: "١.٦١٧.٩٨٧.٦٥٤٣+".


Does anyone know if the plus logically appears before or after a phone
number spelled in AN digits? Or is something other than the + used for the
international prefix?

If the plus is used with AN phone numbers, and put before the number, then
LTR seems to be preferable.

Now, we get to AN negative numbers. From what I understand, when using AN
digits, the minus is supposed to appear on the right. The question is
whether logically, the minus is put before or after an AN number. Judging by
CLDR and ICU, it goes *after* the number (
http://demo.icu-project.org/icu-bin/locexp?_=ar_MA&d_=en&currency=GBP&_r=JO).
Thus, for the minus to appear on the right, the number has to be rendered
LTR.

This was my reasoning for saying that AN formatted numbers should also be
displayed LTR. But, given that there is a lot of guesswork on my part above,
I could certainly be wrong. Arabic and Farsi speakers, your input is badly
needed.

BTW, if we do say that a string containing no strong characters, but
containing AN should be considered RTL, we have to figure out what to do
with a string containing no strong characters, but containing both AN and
EN.

Aharon

On Tue, Sep 14, 2010 at 11:23 PM, Matitiahu Allouche <matial@il.ibm.com> wrote:

> A few comments on Aharon's proposal.
>
> 1) Aharon wrote: "autodirmethod inherits; dir does not. The default would
> probably be first-strong."
> What do you mean, "dir does not" (inherit)?  I interpret it as: an inner
> element without an explicit dir attribute does not inherit its direction
> from outer elements. So where does it get it from?
> I am sure it is not the right interpretation, but I am afraid the text
> itself is not very clear.
>
> 2) Aharon wrote: "The first-strong algorithm returns the direction of the
> first strong (L, AL, or R) character it encounters. If it does not encounter
> any, it returns ltr if it encounters any weak ltr characters (EN or AN)"
> I think that AN should rather return rtl (in UBA jargon, EN digits in a
> neutral context receive bidi embedding level 0 while AN digits receive level
> 2). Native Arabic users should confirm.
>
> 3) I may have missed it, but the description of dir=auto with
> autodirmethod=uba nowhere specifies that the direction is recomputed for
> each new paragraph (or element of such and such types).
>
>
>
> Shalom (Regards),  Mati
>           Bidi Architect
>           Globalization Center Of Competency - Bidirectional Scripts
>           IBM Israel
>           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52
> 2554160
>
>
>
>
> From:        "Aharon (Vladimir) Lanin" <aharon@google.com>
> To:        Matitiahu Allouche/Israel/IBM@IBMIL
> Cc:        fantasai <fantasai.lists@inkedblade.net>, "Phillips, Addison" <
> addison@lab126.com>, Adil Allawi <adil@diwan.com>, Behdad Esfahbod <
> behdad@behdad.org>, Ehsan Akhgari <ehsan@mozilla.com>,
> public-i18n-bidi@w3.org, public-i18n-bidi-request@w3.org, Shachar Shemesh
> <shachar@shemesh.biz>
> Date:        14/09/2010 17:39
> Subject:        Re: per-paragraph auto-direction, a.k.a. dir=uba
> ------------------------------
>
>
>
> I second Mati's proposal. As Ehsan has pointed out <textarea readonly> can
> be used to display plain text (as opposed to edit it), even though somewhat
> awkwardly. And if we work out a bigger proposal, we can tweak the bugs filed
> on HTML5 and CSS3. Right now it is imperative to produce a new draft of the
> proposal and file the bugs (October 1 deadline for HTML5..., apparently).
>
> I do want to put dir=uba under the dir=auto umbrella, via
> autodirmethod=first-strong|any-rtl|uba. autodirmethod inherits; dir does
> not. The default would probably be first-strong.
>
> Is this agreeable to everyone? Please respond.
>
> Here is a quick spec for dir=auto:
>
>    - Using dir=auto with autodirmethod=first-strong|any-rtl would:
>       - Make the default value for the ubi attribute ubi (i.e. on), as
>       described elsewhere.
>       - Set the CSS direction to ltr or rtl according to the indicated
>       algorithm.
>       - Invoke the indicated algorithm on the in-order traversal of the
>       descendant text nodes, with the following exceptions:
>          - Text nodes under a descendant element with an explicit dir
>          attribute (including dir=auto).
>          - The part of the text after the first X characters (where the
>          text in nodes excluded above are not part of the count). *Do we
>          need this? If so, what's a good X value? 100?*
>          - Parts of the text between an LRE, RLE, LRO, RLO, and its
>          matching PDF.
>       - The first-strong algorithm returns the direction of the first
>       strong (L, AL, or R) character it encounters. If it does not encounter any,
>       it returns ltr if it encounters any weak ltr characters (EN or AN). If it
>       does not encounter any of those either, it returns the inherited direction.
>       - The any-rtl algorithm returns rtl if it encounters any strong RTL
>       character, or ltr otherwise.
>    - Using dir=auto with autodirmethod=uba would (by default) set
>    unicode-bidi to "uba" and direction according to first-strong. (Note that
>    this includes leaving direction at the inherited value if the content is
>    neutral.)
>    - For elements other than <textarea>, unicode-bidi:uba is treated as
>    unicode-bidi:isolate.
>    - On <textarea>, unicode-bidi:uba means that:
>       - The UBA on the textarea content is invoked specifying only a
>       default paragraph level (in icu4j terminology, either LEVEL_DEFAULT_LTR or
>       LEVEL_DEFAULT_RTL), based on the element's own direction value as
>       calculated above. (This makes the all-neutral paragraphs use the same
>       direction as the first paragraph that is not all-neutral.)
>       - Each UBA paragraph’s lines’ alignment is determined by the
>       paragraph’s resolved base level when the element's text-align is start or
>       end.
>
>
> Aharon
>
>
> On Tue, Sep 14, 2010 at 2:21 PM, Matitiahu Allouche <matial@il.ibm.com>
> wrote:
> I don't feel competent enough to find the magic solution for all the
> questions that dir=uba seems to raise.  This discussion has been going on
> for a while, and there is real danger that the whole item be shelved if a
> consensus is not found soon.
>
> However, based on the discussion on the list, I think that the following
> points are more or less agreeable to all:
> 1) dir=uba is mostly needed for <pre> and <textarea> elements.
> 2) All or most of the problems are related to using dir=uba with <pre>.
> 3) There are alternatives to using dir=uba with <pre> for multiple
> paragraphs, like separating the text in distinct paragraphs.
> 4) There is no problem related to using dir=uba with <textarea>.
> 5) There is no other way than dir=uba to achieve paragraph-based direction
> for <textarea>.
>
> Given the above, I am suggesting to at least allow dir=uba for <textarea>,
> even if its use for other types of elements is postponed or abandoned
> altogether.
>
>
> Shalom (Regards),  Mati
>           Bidi Architect
>           Globalization Center Of Competency - Bidirectional Scripts
>           IBM Israel
>           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52
> 2554160
>
>
Received on Wednesday, 15 September 2010 08:41:14 GMT
