[html-bidi] Feedback on Additional Requirements for Bidi in HTML from Ehsan Akhgari on 2010-03-13 (public-i18n-bidi@w3.org from January to March 2010)

From: Ehsan Akhgari <ehsan@mozilla.com>
Date: Fri, 12 Mar 2010 19:55:21 -0500
To: public-i18n-bidi@w3.org
Message-ID: <dbd3aaef1003121655q78050af0x82b27aa5fa8c4892@mail.gmail.com>
Hi everyone,

Please first allow me to introduce myself.  I've been contributing to
the Mozilla project for 3.5 years, and I'm currently employed by
Mozilla Corporation, working on Gecko.  I've also worked on
right-to-left UIs and localization issues at Mozilla, among other
things.

I've studied the Additional Requirements for Bidi in HTML draft, and I
would like to provide some feedback on it.  I hope it will be useful.
I have organized my feedback section by section.


* I think Section 2.1 gives a sane solution to a very common
real-world problem.  I very much like the idea of not specifying the
isolated bidi attribute as a character; I have always thought that
using the five bidi control characters in documents that have some
kind of markup is, for the most part, a mistake.  Not to mention that
very few people even know that such characters exist.

* During the work week, David Baron, Jonathan Kew, fantasai and I
discussed how we could support Section 2.2.  One nice idea, from
fantasai, was a heuristic that considers the first N words in a text
node (let's say N=63) and checks whether any of them is an RTL word.
This is very similar to the second estimation algorithm proposed in
the draft, but I believe it would be much more accurate than the other
two for real-world content.  Perhaps this algorithm could be mentioned
in the draft?
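Roughly, the heuristic could look like the following.  This is a
Python sketch for illustration only, not Gecko code, and it rests on
assumptions of my own: words are whitespace-separated, a word counts
as RTL if it contains any character whose Unicode bidi class is R or
AL, and the names WORD_LIMIT, is_rtl_word and guess_direction are
hypothetical.

```python
import unicodedata

# N from the suggestion above; the exact value is a tunable assumption.
WORD_LIMIT = 63

# Bidi classes of strong right-to-left characters (Hebrew, Arabic, etc.).
RTL_CLASSES = ("R", "AL")


def is_rtl_word(word: str) -> bool:
    """Return True if the word contains at least one strong RTL character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in word)


def guess_direction(text: str, word_limit: int = WORD_LIMIT) -> str:
    """Guess "rtl" if any of the first word_limit words is RTL, else "ltr"."""
    words = text.split()[:word_limit]
    return "rtl" if any(is_rtl_word(w) for w in words) else "ltr"


if __name__ == "__main__":
    print(guess_direction("Hello, world"))  # ltr
    print(guess_direction("Hello, שלום"))   # rtl
```

A single RTL word inside the window is enough to flip the guess, which
suits short snippets such as user names or comments, and only looking
at the first N words keeps the cost bounded for very long text nodes.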

Also, I'm not a huge fan of specifying different algorithms as values
for the dir attribute.  I think relying on web authors to figure out
which algorithm to use would be very fragile, and it is safe to assume
that authors who understand the issue well enough to choose an
algorithm could probably come up with their own implementation anyway.
In practice, I think a single attribute value of dir=auto is much more
useful, especially since a large portion of web developers have very
little understanding of the issues involved in supporting bidi text.

* The proposal in Section 2.3 is probably useful too.  However, I
think the spec should also define what happens if there is an actual
element with name="[input/textarea-name]_dir".  It may be as simple as
saying that the last such element overrides the values submitted for
earlier ones, but it still needs to be stated in the spec so that
different browsers don't end up implementing it differently.
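To illustrate why this needs specifying, here is a hypothetical Python
sketch of the "last one wins" policy.  It assumes, purely for
illustration, that the UA appends the generated <name>_dir pair right
after the control's own pair, in document order; neither that ordering
nor the helper names build_form_data and resolve_last_wins come from
the draft.

```python
from typing import Dict, List, Optional, Tuple

# (name, value, direction); direction is None for controls without a _dir pair.
Control = Tuple[str, str, Optional[str]]


def build_form_data(controls: List[Control]) -> List[Tuple[str, str]]:
    """Serialize controls in document order, appending a generated <name>_dir pair."""
    pairs: List[Tuple[str, str]] = []
    for name, value, direction in controls:
        pairs.append((name, value))
        if direction is not None:
            pairs.append((name + "_dir", direction))
    return pairs


def resolve_last_wins(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse duplicate names so that the latest pair overrides earlier ones."""
    resolved: Dict[str, str] = {}
    for name, value in pairs:
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    controls = [
        ("comment", "שלום", "rtl"),    # textarea with a generated comment_dir
        ("comment_dir", "ltr", None),  # an author's own element with the same name
    ]
    print(resolve_last_wins(build_form_data(controls)))
    # {'comment': 'שלום', 'comment_dir': 'ltr'}
```

Under this policy the author's element silently replaces the direction
the UA computed; with the opposite ordering the UA's value would win
instead, which is exactly the ambiguity the spec should settle.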

* Section 2.4 is really useful in real life, and also really easy to
implement.  My current thinking is something like the following in
html.css, provided that we implement the :ltr and :rtl pseudo-classes
in CSS, which, according to fantasai, have also been discussed
recently:

*:rtl > img[hflip=yes] {
 -moz-transform: scaleX(-1);
}

/* ditto for other permutations of :rtl/:ltr and hflip values. */

* Section 3.1 seems useful to have IMO, though I'm not sure the
original choice of treating <br> as whitespace was a wise one.  I lean
towards actually changing that default behavior, but maybe I don't
know enough about the UBA (Unicode Bidirectional Algorithm) to judge
this.

* The same goes for Section 3.2.  The desired behavior doesn't seem to
be specified in HTML4, but, as with Section 3.1, I think Gecko's
current choice is a poor one.

* Similarly, for Section 3.3 I think the default should be bdi=yes.
But as I said above, I'd need a better understanding of the UBA in
order to judge here.

* Section 3.4 also addresses a real-world problem, but I think the
proposed solution is really bad.  It is just as bad as the workaround
web authors currently have to use (wrapping paragraphs in bidi control
characters); all it changes is when that workaround is necessary (from
when the displayed text is RTL to when the displayed text runs
opposite to the document's direction).  Also, what happens if the
alert is triggered from an LTR document that is embedded in an RTL
document?  Such iframed documents might not always map clearly to a
visible element as far as the end user is concerned.

I think a much better solution would be to change the default behavior
to something similar to the dir=auto proposal (with a heuristic
similar to fantasai's suggestion), and provide a way in the DOM API to
override it (although the latter falls outside the scope of this
document).

* Section 3.5 also addresses a common problem, and proposes a good solution IMO.

* Section 3.6 presents a bad solution IMO.  As with Section 3.4, I
think the default behavior should be similar to dir=auto, with an
optional way to override it (like a titledir attribute, which would
default to "auto").  In fact, I read this section several times, and
it seems self-contradictory to me, because the proposed solution seems
to fail on the very example given in its first paragraph.

For alt text, though, I think it's safe to use the element's
direction, because the element does not display any text of its own.
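A rough Python sketch of that default, assuming a hypothetical
titledir attribute whose missing or "auto" value falls back to the
first-N-words heuristic discussed under Section 2.2 (re-implemented
here so the snippet stands alone), while alt text simply follows the
element's own direction.  All names are illustrative, not from the
draft.

```python
import unicodedata
from typing import Optional

WORD_LIMIT = 63  # assumed N, same as in the Section 2.2 heuristic


def guess_direction(text: str, word_limit: int = WORD_LIMIT) -> str:
    """First-N-words heuristic: "rtl" if any early word has a strong RTL character."""
    for word in text.split()[:word_limit]:
        if any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in word):
            return "rtl"
    return "ltr"


def title_direction(title: str, titledir: Optional[str] = None) -> str:
    """An explicit (hypothetical) titledir wins; missing or "auto" means guess."""
    if titledir in ("ltr", "rtl"):
        return titledir
    return guess_direction(title)


def alt_direction(element_dir: str) -> str:
    """Alt text takes the element's own direction, as suggested above."""
    return element_dir


if __name__ == "__main__":
    print(title_direction("Hello"))        # ltr
    print(title_direction("שלום"))         # rtl
    print(title_direction("שלום", "ltr"))  # ltr
    print(alt_direction("rtl"))            # rtl
```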

* Section 3.7 seems good to me.

* The only change I would make to Section 3.8 is to recommend that UAs
expose alternative ways of setting the direction besides the keyboard
shortcuts.  In my experience, only a minority of users know about the
keyboard shortcuts.

* Section 3.9 seems good to me.

* The solution proposed for Section 3.10 seems really strange to me.
I don't think I've ever seen software that produces this result, and
I don't remember seeing anything like it in books or other printed
material.  What does TeX do here, Jonathan?

* I used to think that the solution in Section 3.11 was the wrong one,
but for quite a while now I've been convinced that it is in fact the
right one, since the scrollbar isn't part of the displayed page.
However, I'm still not sure what the right thing to do is for
scrollable elements other than the body element (Section 3.12).  Each
of the two possible solutions seems wrong to me in some cases,
although I lean very slightly towards the solution proposed in
Section 3.12.  What do others think about this issue?

* There's a typo at the beginning of Section 3.12!  The background
should link to Section 3.11.


I'm interested to know what others think here!

Best,
--
Ehsan
<http://ehsanakhgari.org/>
Received on Monday, 15 March 2010 08:26:39 UTC