Re: per-paragraph auto-direction, a.k.a. dir=uba from fantasai on 2010-09-14 (public-i18n-bidi@w3.org from July to September 2010)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Tue, 14 Sep 2010 13:15:59 -0700
To: Ehsan Akhgari <ehsan@mozilla.com>
CC: "Aharon (Vladimir) Lanin" <aharon@google.com>, Matitiahu Allouche <matial@il.ibm.com>, "Phillips, Addison" <addison@lab126.com>, Adil Allawi <adil@diwan.com>, Behdad Esfahbod <behdad@behdad.org>, public-i18n-bidi@w3.org, public-i18n-bidi-request@w3.org, Shachar Shemesh <shachar@shemesh.biz>
Message-ID: <4C8FD7FF.3020002@inkedblade.net>

On 09/14/2010 12:29 PM, Ehsan Akhgari wrote:
> On Tue, Sep 14, 2010 at 2:56 PM, fantasai <fantasai.lists@inkedblade.net> wrote:
>>> # The part of the text after the first X characters (where the text in
>>> nodes excluded above are not part of the count). Do we need this? If
>>> so, what's a good X value? 100?
>>
>> And I think that for any-rtl having an X value is both better for
>> performance and more likely to give good results. If the first X
>> characters are LTR, where X is longer than most LTR phrases commonly
>> imported into RTL text, chances are any RTL characters after that
>> are not indicating the paragraph's main direction.
>
> I agree, but it's not clear to me how we can pick the best value for
> X, other than just guessing...

Guessing works for me. Here are some options:

31 = 2^5 - 1, but probably too short for many names.
63 = 2^6 - 1
100 = 10^2
255 = 2^8 - 1, so the counter fits in one byte. This should probably be
       the upper bound, not so much because of the byte limit, but because
       I strongly suspect going higher will decrease the quality of results.

I'm going to advocate 63, since I can't think of any common LTR strings
(other than long URLs) that would reach that limit. 100 seems okay, too,
but I can't think of any common cases where a limit higher than that
would be helpful. Perhaps other people have other input...
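
For concreteness, here is a rough sketch of how I read the any-rtl check
with a cutoff of X characters. It is only an illustration: the function
name and the default of 63 are assumptions, it uses Python's unicodedata
module to classify characters, and it counts plain code points instead of
skipping the excluded nodes discussed earlier in the thread.

```python
import unicodedata

# Bidi classes of strong right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
STRONG_RTL = ("R", "AL")


def any_rtl_direction(text, limit=63):
    """Guess a paragraph's direction from its first `limit` characters.

    Hypothetical helper: returns "rtl" if any strong RTL character
    appears within the first `limit` code points, otherwise "ltr".
    Characters past the limit are never inspected.
    """
    for ch in text[:limit]:
        if unicodedata.bidirectional(ch) in STRONG_RTL:
            return "rtl"
    return "ltr"
```

With this sketch, a short LTR phrase followed by Hebrew still resolves to
"rtl", while a paragraph whose first 63 characters are all LTR resolves to
"ltr" no matter what follows.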

~fantasai

Received on Tuesday, 14 September 2010 20:16:44 UTC