[Bug 10808] i18n comment 2 : new dir attribute value: auto, and a new attribute: autodirmethod from bugzilla@jessica.w3.org on 2010-10-19 (public-i18n-bidi@w3.org from October to December 2010)

From: <bugzilla@jessica.w3.org>
Date: Tue, 19 Oct 2010 14:34:45 +0000
To: public-i18n-bidi@w3.org
Message-Id: <E1P8DH3-0000xB-E8@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10808

--- Comment #16 from Aharon Lanin <aharon.lists.lanin@gmail.com> 2010-10-19 14:34:43 UTC ---
(In reply to comment #12)
> 1) Why would you ever want to not estimate the direction for each paragraph
> separately?

1. Estimating the direction of each UBA paragraph separately has a price.
2. The use cases are limited to <textarea> and <pre>.

Let's take a specific example:

<div dir=auto>
  some ltr text.
  <div>
    SOME RTL TEXT.
  </div>
  SOME MORE RTL TEXT.
</div>

There are three UBA paragraphs here: the text before the internal div, the text
inside it, and the text after it. You want the first displayed as LTR and the
others as RTL, and are puzzled why dir=auto is defined to give them all the
same direction (for autodirmethod values other than plaintext).

First, note that if the first and third UBA paragraphs contained mark-up that
used the new CSS capabilities to depend on direction (e.g. text-align:start,
margin-end, :rtl in the selector, etc.), you would want it to depend on the UBA
paragraph's direction. However, the first and third UBA paragraphs are not
separate elements. They therefore must have the same CSS direction value. Thus,
per-UBA-paragraph direction forces an unenviable choice: either divorce
direction-dependent CSS from the CSS direction and tie it to the inaccessible
UBA paragraph direction, or let that CSS work inappropriately. This choice is
the price that I do not want to pay.

Now, the use cases. It is indeed possible to have multi-paragraph plain text
that can only be rendered well by assigning each of its UBA paragraphs its own
direction (as explicitly suggested by the UBA). However, such plain text is
limited to <textarea> and <pre> elements. <textarea> does not allow mark-up at
all, so the problem described above does not apply to it; <pre> is allowed to
contain some mark-up, but being pre-formatted, it is not expected to contain
the layout-modifying mark-up of the sort that bothers us. This is the use case
for autodirmethod=plaintext, which does per-paragraph estimation as you want,
but is not expected to handle direction-dependent CSS within it well.
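
To make the per-paragraph behaviour concrete, here is a minimal sketch in
Python. It assumes that each UBA paragraph is resolved by the UBA's own
first-strong rule (P2 and P3) and that paragraphs are delimited by characters
of bidi class B, such as a line feed. The function names are illustrative, not
part of any proposal, and the sketch ignores refinements such as treating CRLF
as a single separator.

```python
import unicodedata


def split_uba_paragraphs(text):
    """Split on characters of bidi class B (paragraph separators such as LF)."""
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        paragraphs.append("".join(current))
    return paragraphs


def paragraph_direction(paragraph):
    """First strong character decides; LTR when there is none (UBA P2, P3)."""
    for ch in paragraph:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return "ltr"


# Real Hebrew stands in for the upper-case "RTL" text of the example.
text = (
    "some ltr text.\n"
    "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd.\n"
    "\u05e2\u05d5\u05d3 \u05d8\u05e7\u05e1\u05d8."
)
directions = [paragraph_direction(p) for p in split_uba_paragraphs(text)]
print(directions)  # ['ltr', 'rtl', 'rtl']
```

Each line of a <textarea> or <pre> gets its own direction here, which is
exactly what cannot be expressed as a single CSS direction on the element.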

On the other hand, I do not see a use case for having the dir=auto in the
example above automatically apply independently to the internal div. If the
author wants auto-estimation on the internal div, they can put dir=auto on it.
For example, if you are embedding a piece of complicated HTML that you did
not author in your page, and you do not know the direction in which this piece
of HTML is supposed to be displayed, put a <div dir=auto> around that piece of
HTML. If inside it there are smaller pieces that have a different direction, it
was the job of the HTML's original author to indicate this within the HTML,
e.g. with dir=auto elements around those smaller pieces.


> 2) Does it really make sense to expose the first-strong vs. any-rtl distinction
> to authors?  Why not just pick whichever one seems better for the platform?

Both methods exist not to make things easier for the platform, but because
different approaches give better results for different kinds of content.
First-strong has a serious flaw: RTL text very often contains LTR words and
phrases (e.g. acronyms and brand names) and even fairly often starts with them,
e.g. "html IS A WONDERFUL PLATFORM". I therefore tend to prefer any-rtl for
most cases. However, in an input box, first-strong does have the advantage of
being easier for the user to surmise and control. Thus, I would say, if you
have content you are obtaining via an input box, use first-strong (both on the
input box and the elements that are then used to display those values). But if
you are displaying text of unknown origin, any-rtl is a better bet.
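
The difference is easy to see in a small Python sketch. The two functions below
are my own simplified renderings of first-strong and any-rtl, built on the
standard unicodedata module; the names, the LTR fallback, and the absence of
any scan limit are assumptions made for illustration only.

```python
import unicodedata

RTL_CLASSES = ("R", "AL")  # strong right-to-left bidi classes


def first_strong(text, default="ltr"):
    """Direction of the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in RTL_CLASSES:
            return "rtl"
    return default


def any_rtl(text, default="ltr"):
    """RTL as soon as any strong RTL character is found."""
    saw_ltr = False
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            saw_ltr = True
    return "ltr" if saw_ltr else default


# An RTL sentence that starts with an LTR acronym; Hebrew letters stand in
# for the upper-case words of "html IS A WONDERFUL PLATFORM".
sample = "html \u05d4\u05d9\u05d0 \u05e4\u05dc\u05d8\u05e4\u05d5\u05e8\u05de\u05d4"

print(first_strong(sample))  # ltr
print(any_rtl(sample))       # rtl
```

First-strong is misled by the leading acronym, while any-rtl picks the
direction a reader of the sentence would expect.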

> In
> particular, paragraphs are of unbounded length, and the browser might not have
> access to the full paragraph before it starts rendering (since it might have
> only received part of the page).
> 
> any-rtl would force browsers to scan the whole paragraph before rendering,
> which is bad. Or force them to flip directionality as the page is loading/as
> the user types, which is worse.

Which is why we are limiting any-rtl to scanning the first 100 characters of
the element's content. Flips are still possible, but unlikely. BTW, flips are
also still possible but unlikely for first-strong, since the element could
start with an arbitrary amount of neutral content.
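
A rough Python sketch of both flip scenarios follows. It assumes the estimate
is recomputed each time more content arrives, counts the 100-character window
in code points, and falls back to LTR when nothing decisive has been seen yet.
All names are illustrative.

```python
import unicodedata

SCAN_LIMIT = 100  # window size mentioned above, counted here in code points
RTL = ("R", "AL")


def any_rtl_limited(text, limit=SCAN_LIMIT):
    """RTL if a strong RTL character occurs within the first `limit` characters."""
    for ch in text[:limit]:
        if unicodedata.bidirectional(ch) in RTL:
            return "rtl"
    return "ltr"


def first_strong(text):
    """Direction of the first strong character; LTR fallback is an assumption."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in RTL:
            return "rtl"
    return "ltr"


def directions_while_loading(chunks, estimator):
    """Re-estimate after each chunk arrives, as a re-rendering browser might."""
    received, seen = "", []
    for chunk in chunks:
        received += chunk
        seen.append(estimator(received))
    return seen


ALEF = "\u05d0"  # Hebrew letter alef, bidi class R

# any-rtl: an RTL character arriving inside the window flips the result.
early = directions_while_loading(["a" * 60, ALEF, "b" * 200], any_rtl_limited)
# any-rtl: an RTL character beyond the window is never seen, so no flip.
late = directions_while_loading(["a" * 150, ALEF], any_rtl_limited)
# first-strong: a long neutral prefix delays the decision, then it flips.
neutral = directions_while_loading(["(" * 500, ALEF], first_strong)

print(early)    # ['ltr', 'rtl', 'rtl']
print(late)     # ['ltr', 'ltr']
print(neutral)  # ['ltr', 'rtl']
```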

> So first-strong is preferable.  Ideally we'd
> look beyond the first character, e.g., checking if the first 100 characters are
> at least 30% RTL, but that doesn't work well when the user is typing the
> content on the fly, since then direction will switch as he types.

Better estimation algorithms can and will be invented. The reason we are
currently only dealing with first-strong, any-rtl, and plaintext is that they
are well-known, tried, and easily defined and implemented. If and when a much
better algorithm is invented and proven, we want to be able to support it. That
does not mean that existing content, created with and working under an older
estimation method, should risk being broken by having the new algorithm applied
to it unasked. This is exactly why we have autodirmethod. We can extend the
repertory of its values without making them the default for existing content.
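
As a sketch of that extensibility argument, consider a table that maps
autodirmethod values to estimation functions. Everything here is hypothetical:
the table, the estimate() helper, and especially the "rtl-ratio" value, which
merely borrows the 30% idea quoted above to play the part of a later, better
algorithm. Plaintext is left out because it works per paragraph.

```python
import unicodedata

RTL = ("R", "AL")


def first_strong(text):
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in RTL:
            return "rtl"
    return "ltr"


def any_rtl(text):
    has_rtl = any(unicodedata.bidirectional(ch) in RTL for ch in text[:100])
    return "rtl" if has_rtl else "ltr"


# Maps autodirmethod values to estimators (illustrative registry).
ESTIMATORS = {"first-strong": first_strong, "any-rtl": any_rtl}


def estimate(text, method):
    """Use exactly the method the content asked for, never a newer one."""
    if method not in ESTIMATORS:
        raise ValueError(f"unsupported autodirmethod: {method!r}")
    return ESTIMATORS[method](text)


def rtl_ratio(text, window=100, threshold=0.3):
    """Hypothetical newer method: RTL if >= 30% of the window is strong RTL."""
    sample = text[:window]
    if not sample:
        return "ltr"
    rtl = sum(unicodedata.bidirectional(ch) in RTL for ch in sample)
    return "rtl" if rtl / len(sample) >= threshold else "ltr"


text = "hello world " + "\u05d0"  # mostly LTR with a single Hebrew letter

before = estimate(text, "any-rtl")
ESTIMATORS["rtl-ratio"] = rtl_ratio      # a new value is added later
after = estimate(text, "any-rtl")        # existing content is unaffected
opted_in = estimate(text, "rtl-ratio")   # only content that asks gets it

print(before, after, opted_in)  # rtl rtl ltr
```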

> I think that when this behavior is defined, we should evaluate where to
> activate it by default.  IMO, it would be a big win if this were enabled by
> default on all textareas and inputs, at least.  I wonder if it would really
> break anything much if it were the default on all elements.  Probably, but
> maybe worth trying . . .

I tend to agree, but not everyone does. A discussion worth having, although it
would have been better if it had already taken place in public-i18n-bidi before
the bugs were filed on HTML5.

Received on Tuesday, 19 October 2010 14:34:47 UTC