Re: per-paragraph auto-direction, a.k.a. dir=uba

On 09/14/2010 12:29 PM, Ehsan Akhgari wrote:
> On Tue, Sep 14, 2010 at 2:56 PM, fantasai <fantasai.lists@inkedblade.net> wrote:
>>> # The part of the text after the first X characters (where the text in
>>> nodes excluded above are not part of the count). Do we need this? If
>>> so, what's a good X value? 100?
>>
>> And I think that for any-rtl having an X value is both better for
>> performance and more likely to give good results. If the first X
>> characters are LTR, where X is longer than most LTR phrases commonly
>> imported into RTL text, chances are any RTL characters after that
>> are not indicating the paragraph's main direction.
>
> I agree, but it's not clear to me how we can pick the best value for
> X, other than just guessing...

Guessing works for me. Here are some options:

31 - (2^5 - 1), but probably too short for many names.
63 - (2^6 - 1)
100 - 10^2
255 - Counter fits in one byte. This should probably be the upper bound,
       not so much because of the byte limit, but because I strongly suspect
       going higher will decrease the quality of results.

I'm going to advocate 63, since I can't think of any common strings
(other than long URLs) that would hit that limit. 100 seems okay, too.
I can't really think of any common cases where a higher limit would
be helpful. Perhaps other people have other input...
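
For concreteness, here is a rough sketch of the any-rtl check with a
cutoff of X characters. It is only an illustration, not a proposed
implementation: the function name guess_direction and the limit
parameter are made up for this example, it counts every code point
toward X (not only strong characters), and it assumes that text from
excluded nodes has already been stripped out.

```python
import unicodedata

# Bidi classes treated as strong RTL (Hebrew-style "R", Arabic-style "AL").
RTL_CLASSES = {"R", "AL"}


def guess_direction(text, limit=63):
    """Guess a paragraph direction using an any-rtl rule with a cutoff.

    Hypothetical helper: returns "rtl" if any strong RTL character
    appears within the first `limit` code points, otherwise "ltr".
    RTL characters past the cutoff are ignored.
    """
    for ch in text[:limit]:
        if unicodedata.bidirectional(ch) in RTL_CLASSES:
            return "rtl"
    return "ltr"


if __name__ == "__main__":
    print(guess_direction("Hello \u05e9\u05dc\u05d5\u05dd"))
    print(guess_direction("a" * 63 + "\u05e9"))
```

With limit=63, an RTL character at position 64 or later no longer
flips the paragraph, which is the behavior argued for above.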

~fantasai

Received on Tuesday, 14 September 2010 20:16:44 UTC