Re: [html-bidi] Feedback on Additional Requirements for Bidi in HTML from Aharon (Vladimir) Lanin on 2010-03-24 (public-i18n-bidi@w3.org from January to March 2010)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Wed, 24 Mar 2010 11:47:04 +0200
To: Ehsan Akhgari <ehsan@mozilla.com>
Cc: public-i18n-bidi@w3.org
Message-ID: <6b45e1b51003240247u61ae15d0pea3ba16260b618e3@mail.gmail.com>
> We discussed how we can support section 2.2 with David Baron,
> Jonathan Kew and fantasai during the work week.  Fantasai had a nice idea
> of a heuristic algorithm considering the first N words in a text node
> (let's say N=63) and trying to find if there is an RTL word among
> them.  This is very similar to the second estimation algorithm
> proposed in that document, but I believe that it's going to be much
> more accurate than the other two for real-world usages.  Perhaps this
> algorithm could be mentioned in this draft?

This is basically the any-rtl algorithm mentioned in the FPWD ("Does the
string contain any RTL characters?"). Real-world scenarios it fails on are:
- Informal LTR text with a little bit of RTL, e.g. "they had grapes and
cheese (YUM!)"
- Scholarly LTR text quoting words and phrases from RTL sources, e.g. "the
word usually translated as neighbor (FELLOW) is unusual."

Scenarios where it is more reliable than word-count are few and far between.
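
To make the difference concrete, here is a rough Python sketch of the two
estimators. It is only an illustration: the function names and the 0.4
threshold are my own assumptions, not text from the FPWD, and a real Hebrew
word stands in for the upper-case "YUM" above.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # strong right-to-left bidi classes
LTR_CLASSES = {"L"}         # strong left-to-right bidi class


def word_direction(word):
    """Direction of a word, taken from its first strong character, or None."""
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi in LTR_CLASSES:
            return "ltr"
    return None


def any_rtl(text):
    """'rtl' if the text contains any strong RTL character, else 'ltr'."""
    has_rtl = any(unicodedata.bidirectional(c) in RTL_CLASSES for c in text)
    return "rtl" if has_rtl else "ltr"


def word_count(text, threshold=0.4):
    """'rtl' if the share of RTL words among directional words reaches threshold.

    The threshold value is an assumption made for this sketch.
    """
    dirs = [d for d in map(word_direction, text.split()) if d]
    if not dirs:
        return "ltr"
    return "rtl" if dirs.count("rtl") / len(dirs) >= threshold else "ltr"


# Informal LTR sentence with one Hebrew word in parentheses.
sample = "they had grapes and cheese (\u05d8\u05e2\u05d9\u05dd!)"
print(any_rtl(sample))      # rtl
print(word_count(sample))   # ltr
```

A single RTL word is enough to flip any-rtl, while word-count still sees five
LTR words out of six and keeps the sentence LTR.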

I would prefer to continue discussing this in fantasai's thread.

> Also, I'm not a huge fan of specifying different algorithms as values
> for the dir attribute.  I think relying on web authors to figure out
> what algorithm to use can be very fragile, and it would be safe to
> assume that if they understand the issue well enough to determine
> which algorithm to use, they can probably come up with their own
> implementation anyway.  I think in practice having a single attribute
> value of dir=auto is much more useful, especially given the fact that
> a large portion of web developers have very little understanding of
> the issues existing with supporting bidi text.

I agree with you. The problem, as I have stated in a different thread, is
that I see no easy way to reach a consensus on which algorithm should be the
one offered.

One possibility that I have already suggested in a reply to fantasai's
message is to unify first-strong and word-count under the guise of giving
the page author control over how much of the content is scanned (the
"auto[0-9]*" scheme, see in that thread). However, response has not been
overly enthusiastic.

Another possibility I have suggested in the thread started by Tab Atkins
is to have just dir="auto", but to also introduce another attribute,
autodirtype (or perhaps autodirmethod). Its values would be either
"first-strong" or "word-count". It would have a default, and the value one
sets on an element would be inherited by its descendants. Its advantages are
that most users would not be faced with making an uninformed choice (they
would just use dir="auto" and get whatever the default is), and that
advanced users who don't like the default could set it just once on <body>
(without applying dir="auto" to <body> itself).
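
As a rough sketch of the inheritance I have in mind (Python, with a toy
Element class that is purely hypothetical, and "first-strong" assumed as the
default only for the sake of the example, since no default has been agreed):

```python
DEFAULT_METHOD = "first-strong"  # assumed default; not settled in the thread
VALID_METHODS = ("first-strong", "word-count")


class Element:
    """Toy stand-in for a DOM element: a tag, attributes, and a parent link."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = attrs or {}
        self.parent = parent


def effective_autodirtype(el):
    """Walk up the ancestors until one sets autodirtype; else use the default."""
    node = el
    while node is not None:
        value = node.attrs.get("autodirtype")
        if value in VALID_METHODS:
            return value
        node = node.parent
    return DEFAULT_METHOD


# The author sets the method once on <body>; <body> itself is not dir="auto".
body = Element("body", {"autodirtype": "word-count"})
div = Element("div", {"dir": "auto"}, parent=body)
print(effective_autodirtype(div))  # word-count
```

An element with dir="auto" and no autodirtype anywhere among its ancestors
would simply get the default.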

> The proposal in Section 2.3 is probably useful too.  Although I
> think the spec should also specify what happens if there is an actual
> element with name="[input/textarea-name]_dir".  It may be as simple as
> the latest such element overrides the values submitted for previous
> elements, but it's still something which should be declared in the
> spec, so that we don't end up with different browsers choosing to
> implement it differently.

You are right, it should be specified. BTW, what happens if you have two
inputs with the same name in the same form? It would be best if the two
cases were treated the same way.
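
For reference, as far as I know two same-named controls today simply produce
two name=value pairs in the submitted data, in document order, and the server
decides what to do with them. A small Python sketch of what the server side
would see (the "comment" field and its values are made up for illustration):

```python
from urllib.parse import urlencode, parse_qs

fields = [
    ("comment", "shalom"),
    ("comment_dir", "rtl"),  # hypothetical: added by the UA under the 2.3 proposal
    ("comment_dir", "ltr"),  # hypothetical: the author's own same-named element
]

encoded = urlencode(fields)
print(encoded)            # comment=shalom&comment_dir=rtl&comment_dir=ltr
print(parse_qs(encoded))  # {'comment': ['shalom'], 'comment_dir': ['rtl', 'ltr']}
```

Whether the spec picks "last one wins" or "submit both", the point is that the
choice should be written down.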

> Section 3.1 seems useful to have IMO, though I'm not sure the
> original choice of <br> being treated as whitespace was a wise one.  I
> tend towards actually changing that default behavior, but maybe I
> don't know enough about the UBA to judge this.

This should be discussed in reply to Martin Duerst's thread. He is not a big
fan of the way people usually use <br> ("Fixing all those pages that think
<br> is a paragraph separator would be best!"), and is not even 100%
convinced he wants to make bdi="yes" <br>'s default, so I really don't think
he would go for treating <br> as a UBA paragraph separator. This issue has a
lot of history.

> Section 3.4 is also a real-world problem, but I think the solution
> proposed is really bad.  It's in fact as bad as the current practical
> workaround which web authors would need to do (wrapping paragraphs
> with bidi control chars); in fact it only changes when that workaround
> is necessary (from when the displayed text is RTL to when the
> displayed text is in the reverse direction of the document.)

I believe that the vast majority of dialog texts are in the language (and
thus the direction) of the document, so I think that this is still a
substantial improvement.

Nevertheless, you do have a point that things are still not hunky-dory.
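
To spell out what remains of the workaround under the proposal, here is a
rough Python sketch. The function name is hypothetical, and the caller is
assumed to already know the direction of the text and of the document:

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_for_dialog(text, text_dir, doc_dir):
    """Wrap each paragraph in embedding marks only when the text's direction
    differs from the document's direction."""
    if text_dir == doc_dir:
        return text  # dialog already gets the document's direction
    mark = RLE if text_dir == "rtl" else LRE
    return "\n".join(mark + line + PDF for line in text.split("\n"))
```

The wrapping is still needed, but only for the opposite-direction case.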

> Also,
> what happens if the alert is being triggered from a LTR document which
> is being included in an RTL document?

According to the current proposal, its text will be assumed to be LTR, and I
think that's fine.

>  Such iframed documents might
> not always have a clear mapping to a visible element as far as the end
> user is concerned.

I don't understand. What relevance does any element have here?

> I think a much better solution would be to change the default behavior
> to something similar to the dir=auto proposal (with a heuristic
> similar to fantasai's suggestion),

Leaving the specific estimation algorithm out of it, I think your suggestion
does have merit. The counter-argument, of course, is that estimation is not
foolproof, and most dialogs should be in the language of the page. I am not
sure which way I like better. Let's see what others have to say.

> and provide a way in the DOM API to
> override it (although the latter falls outside of the scope of this
> document.)

Do you mean adding an optional dir parameter to JavaScript's alert(), confirm(),
and prompt(), and recommending a similar change in all other script languages /
APIs? I agree it would be best, but as you said, it is outside the scope of
this proposal.

> Section 3.6 presents a bad solution IMO.  Like Section 3.4, I think
> the default behavior should be similar to dir=auto with an optional
> method for overriding it (like a titledir attribute, which would
> default to "auto").

Re using the text's estimated direction, as opposed to the element
direction, this would be a non-backwards-compatible change. I say this
because currently there *is* browser consensus on how to treat tooltip text.
(This is not the case for dialog text, which is why I am more open to the
same suggestion in 3.4.) My preference would be not to break backwards
compatibility.

Re titledir, I am not overly eager to add an attribute where it is not
absolutely necessary, especially for such an out-of-the-way corner as this.
And as the handling of the counterexample below shows, it is not absolutely
necessary. Also, would titledir apply to the alt value? Or do we need an
altdir too? It's messy.

> In fact I read this section several times, and it
> seems paradoxical to me, because the proposed solution seems to fail
> in the example given in the first paragraph.

Although, as the document states, the counterexample is not a common one, I
included it in order not to hide the problem, and to illustrate why I
believe the spec should say something explicit on the subject.

Under the proposal, the counterexample would be handled by putting the
tooltip on an extra element wrapping the original one, i.e.:

<span title="THE ADDRESS"><span dir="ltr">10 Downing Street</span></span>

Yes, I know that this is not perfect. But in a tradeoff with lack of
backward compatibility, I think it is the right thing to do.

> The only thing that I would change about Section 3.8 is actually
> recommending UAs to expose alternate ways of setting the direction
> besides the keyboard shortcuts.  In practice, only a minority of users
> know about the keyboard shortcuts, in my experience.

Basically, I agree. The only method besides keyboard shortcuts of which I am
aware, though, is via a right-click menu in Safari. It is not
extraordinarily discoverable either, but it is better than keyboard
shortcuts.

I guess it will be up to the HTML folks to decide whether the spec can make
such a recommendation (as is the case for the recommendation the proposal
already suggests - to support the method commonly used on a given OS, if
any.)

Aharon


On Sat, Mar 13, 2010 at 2:55 AM, Ehsan Akhgari <ehsan@mozilla.com> wrote:

> Hi everyone,
>
> Please first allow me to introduce myself.  I've been contributing to
> the Mozilla project for 3.5 years, and I'm an employee of Mozilla
> Corporation right now, working on Gecko.  I've also worked on
> right-to-left UIs and localization issues at Mozilla, among other
> things.
>
> I've studied the Additional Requirements for Bidi in HTML draft, and I
> would like to provide some feedback on it.  Hopefully it would be
> useful.  I have categorized my feedback on a section by section basis.
>
>
> * I think section 2.1 gives a sane solution to a very common problem
> in the real world.  I like the idea of not specifying the isolated bidi
> attribute as a character a lot; I always thought that using the five
> bidi control chars in documents which have some kind of a markup is a
> mistake for the most part.  Not to mention that very few people
> actually understand that there are such characters.
>
> * We discussed how we can support section 2.2 with David Baron,
> Jonathan Kew and fantasai during the work week.  Fantasai had a nice idea
> of a heuristic algorithm considering the first N words in a text node
> (let's say N=63) and trying to find if there is an RTL word among
> them.  This is very similar to the second estimation algorithm
> proposed in that document, but I believe that it's going to be much
> more accurate than the other two for real-world usages.  Perhaps this
> algorithm could be mentioned in this draft?
>
> Also, I'm not a huge fan of specifying different algorithms as values
> for the dir attribute.  I think relying on web authors to figure out
> what algorithm to use can be very fragile, and it would be safe to
> assume that if they understand the issue well enough to determine
> which algorithm to use, they can probably come up with their own
> implementation anyway.  I think in practice having a single attribute
> value of dir=auto is much more useful, especially given the fact that
> a large portion of web developers have very little understanding of
> the issues existing with supporting bidi text.
>
> * The proposal in Section 2.3 is probably useful too.  Although I
> think the spec should also specify what happens if there is an actual
> element with name="[input/textarea-name]_dir".  It may be as simple as
> the latest such element overrides the values submitted for previous
> elements, but it's still something which should be declared in the
> spec, so that we don't end up with different browsers choosing to
> implement it differently.
>
> * Section 2.4 is really useful in real life, and also really easy to
> implement.  My current thinking is something like below in html.css,
> provided that we have an implementation of :ltr and :rtl in CSS, which
> according to fantasai have also been discussed recently.
>
> *:rtl > img[hflip=yes] {
>  -moz-transform: scaleX(-1);
> }
>
> /* ditto for other permutations of :rtl/:ltr and hflip values. */
>
> * Section 3.1 seems useful to have IMO, though I'm not sure the
> original choice of <br> being treated as whitespace was a wise one.  I
> tend towards actually changing that default behavior, but maybe I
> don't know enough about the UBA to judge this.
>
> * Same for section 3.2.  The desired behavior doesn't seem to be
> specified in HTML4, but I think Gecko's choice of what to do is a poor
> one, like I described above.
>
> * Similarly for section 3.3, I think that the default bdi=yes.  But
> like I said above, I'd need a better understanding of the UBA in order
> to judge here.
>
> * Section 3.4 is also a real-world problem, but I think the solution
> proposed is really bad.  It's in fact as bad as the current practical
> workaround which web authors would need to do (wrapping paragraphs
> with bidi control chars); in fact it only changes when that workaround
> is necessary (from when the displayed text is RTL to when the
> displayed text is in the reverse direction of the document.)  Also,
> what happens if the alert is being triggered from a LTR document which
> is being included in an RTL document?  Such iframed documents might
> not always have a clear mapping to a visible element as far as the end
> user is concerned.
>
> I think a much better solution would be to change the default behavior
> to something similar to the dir=auto proposal (with a heuristic
> similar to fantasai's suggestion), and provide a way in the DOM API to
> override it (although the latter falls outside of the scope of this
> document.)
>
> * Section 3.5 is also a common problem, with a good solution IMO.
>
> * Section 3.6 presents a bad solution IMO.  Like Section 3.4, I think
> the default behavior should be similar to dir=auto with an optional
> method for overriding it (like a titledir attribute, which would
> default to "auto").  In fact I read this section several times, and it
> seems paradoxical to me, because the proposed solution seems to fail
> in the example given in the first paragraph.
>
> For alt text, though, I think it's safe to take the element's
> direction, because the element is not displaying any text itself.
>
> * Section 3.7 seems good to me.
>
> * The only thing that I would change about Section 3.8 is actually
> recommending UAs to expose alternate ways of setting the direction
> besides the keyboard shortcuts.  In practice, only a minority of users
> know about the keyboard shortcuts, in my experience.
>
> * Section 3.9 seems good to me.
>
> * The solution proposed for Section 3.10 seems really strange to me.
> I don't think I've ever seen software which produces this result, and
> I don't remember seeing anything like this in books and other printed
> materials.  What does TeX do here, Jonathan?
>
> * I used to think that the solution in Section 3.11 is the wrong one,
> but I've been convinced for quite a while that it is in fact the right
> solution, since the scrollbar isn't a part of the page display.
> However, I'm still not sure what the right thing to do is for
> elements other than the body element with scrollbars (Section 3.12).
> Either of the two possible solutions seems wrong to me in some cases,
> although I'm very slightly biased towards the proposed solution in
> Section 3.12 here.  What do others think about this issue?
>
> * There's a typo in the beginning of Section 3.12!  The background
> should link to Section 3.11.
>
>
> I'm interested to know what others think here!
>
> Best,
> --
> Ehsan
> <http://ehsanakhgari.org/>
Received on Wednesday, 24 March 2010 09:47:57 UTC