[Bug 10808] text with unknown direction gets corrupted when inserted in content with opposite direction

From: <bugzilla@jessica.w3.org>
Date: Thu, 04 Nov 2010 00:54:58 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1PDo6U-0000YK-0B@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10808

--- Comment #32 from Aharon Lanin <aharon.lists.lanin@gmail.com> 2010-11-04 00:54:56 UTC ---
(In reply to comment #29)
> (In reply to comment #28)
> > Is there a spec anywhere I can reference to easily define how to determine
> > whether an element's logical direction is ltr or rtl?
> 
> It should be the unicode bidi algorithm, but I'm not sure where precisely
> that's defined.  Fantasai, Aharon?

It is defined in <http://www.unicode.org/reports/tr9/#The_Paragraph_Level>, but
we have proposed it only as the default estimation algorithm, not the only one
to be made available.

As discussed at least twice - in
<http://www.w3.org/International/docs/html-bidi-requirements/#auto-direction>
and the comments here - the direction estimation algorithm defined by the UBA
is not always optimal. It goes by the first character with strong direction.
RTL text quite often needs to start with an LTR word or phrase, e.g. "java IS A
PROGRAMMING LANGUAGE ORIGINALLY DEVELOPED BY ...", in which case the UBA's
estimation algorithm incorrectly judges it to be LTR. Mark Davis, the
co-founder of Unicode, and the inventor of the UBA, has stated on more than one
occasion that the estimation algorithm given by the UBA was not meant to be the
last word in estimation algorithms, but only a stopgap.
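To make the failure mode concrete, here is a minimal sketch of first-strong
estimation in Python. It is a simplification of the UBA paragraph-level rules,
not a full implementation: it ignores embedding and isolate handling. The
function name and the `default` parameter are illustrative assumptions, not
part of any proposal.

```python
import unicodedata

def first_strong_direction(text, default="ltr"):
    """Estimate direction from the first character with a strong bidi class.

    Simplified sketch: 'L' means strong LTR, 'R' and 'AL' mean strong RTL.
    Characters with weak or neutral classes (digits, spaces, punctuation)
    are skipped. `default` is a hypothetical fallback for strings that
    contain no strong character at all.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default

# A Hebrew sentence that opens with the Latin word "java" is judged LTR,
# which is the misjudgment described above.
print(first_strong_direction("java \u05d4\u05d9\u05d0 \u05e9\u05e4\u05ea \u05ea\u05db\u05e0\u05d5\u05ea"))
```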

IMO, at least one other algorithm gives better results in most - but not all! -
use cases. In it, the presence of *any* RTL character in the first X characters
of the string qualifies the string as RTL.
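A rough sketch of the any-RTL idea in Python follows. The function name is an
illustrative assumption, and `window` stands in for the "X" above, whose value
is not fixed here; the caller has to supply it. Returning "ltr" when no RTL
character is found is also an assumption of this sketch.

```python
import unicodedata

def any_rtl_direction(text, window):
    """Return 'rtl' if any strong RTL character occurs in text[:window].

    `window` is the hypothetical stand-in for "the first X characters".
    Bidi classes 'R' and 'AL' are treated as strong RTL. Falling back to
    'ltr' when none is found is an assumption of this sketch.
    """
    for ch in text[:window]:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            return "rtl"
    return "ltr"

sample = "java \u05d4\u05d9\u05d0 \u05e9\u05e4\u05ea \u05ea\u05db\u05e0\u05d5\u05ea"
# With a window that reaches past the leading Latin word, the string is RTL.
print(any_rtl_direction(sample, 20))
# With a window that covers only "java", no RTL character is seen.
print(any_rtl_direction(sample, 4))
```

The second call shows why this does not win in every use case either: the
result depends on how far into the string the algorithm looks.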

Please refer to
<http://www.w3.org/International/docs/html-bidi-requirements/#auto-direction>
for details on the two algorithms.

Since there is no one algorithm that gives the best results in all significant
use cases, we have proposed giving the author the ability to choose between a
couple - without requiring the author to actually make that choice. Hence the
proposal to support an autodirmethod=first-strong|any-rtl|plaintext attribute.
(As I said above, though, let's ignore plaintext for now.)  Once again, see
<http://www.w3.org/International/docs/html-bidi-requirements/#auto-direction> -
or the original description of this bug report. If you think that the ability
to choose is a separate issue and should be filed as a separate bug, please let
me know.

Received on Thursday, 4 November 2010 00:55:01 GMT
