- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Tue, 14 Sep 2010 13:15:59 -0700
- To: Ehsan Akhgari <ehsan@mozilla.com>
- CC: "Aharon (Vladimir) Lanin" <aharon@google.com>, Matitiahu Allouche <matial@il.ibm.com>, "Phillips, Addison" <addison@lab126.com>, Adil Allawi <adil@diwan.com>, Behdad Esfahbod <behdad@behdad.org>, public-i18n-bidi@w3.org, public-i18n-bidi-request@w3.org, Shachar Shemesh <shachar@shemesh.biz>
On 09/14/2010 12:29 PM, Ehsan Akhgari wrote:
> On Tue, Sep 14, 2010 at 2:56 PM, fantasai <fantasai.lists@inkedblade.net> wrote:
>>> # The part of the text after the first X characters (where the text in
>>> nodes excluded above are not part of the count). Do we need this? If
>>> so, what's a good X value? 100?
>>
>> And I think that for any-rtl having an X value is both better for
>> performance and more likely to give good results. If the first X
>> characters are LTR, where X is longer than most LTR phrases commonly
>> imported into RTL text, chances are any RTL characters after that
>> are not indicating the paragraph's main direction.
>
> I agree, but it's not clear to me how we can pick the best value for
> X, other than just guessing...

Guessing works for me. Here are some options:

  31:  2^5 - 1, but probably too short for many names.
  63:  2^6 - 1
  100: 10^2
  255: Counter fits in one byte. This should probably be the upper
       bound, not so much because of the byte limit, but because
       I strongly suspect going higher will decrease the quality
       of results.

I'm going to advocate 63, since I can't think of any common strings
(other than long URLs) that would hit that limit. 100 seems okay, too.
I can't really think of any common cases where a higher limit would
be helpful.

Perhaps other people have other input...

~fantasai
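
For illustration only, the capped any-rtl check discussed above might be sketched roughly as follows. This is a minimal sketch, not a proposed implementation: the names `MAX_SCAN` and `any_rtl_direction` are hypothetical, the cap of 63 is just the value advocated above, the input is assumed to already have the text of excluded nodes removed, and "characters" is taken to mean Unicode code points.

```python
import unicodedata

# Hypothetical cap on how many characters to inspect. 63 is the value
# advocated in this message, not an agreed-upon constant.
MAX_SCAN = 63

# Bidi classes of strong right-to-left characters (Hebrew "R", Arabic "AL").
STRONG_RTL = {"R", "AL"}


def any_rtl_direction(text, max_scan=MAX_SCAN):
    """Return "rtl" if a strong RTL character appears within the first
    max_scan code points of text, otherwise "ltr".

    Assumes text has already had excluded nodes stripped out, so every
    code point here counts toward the limit.
    """
    for ch in text[:max_scan]:
        if unicodedata.bidirectional(ch) in STRONG_RTL:
            return "rtl"
    # No strong RTL character within the cap: any RTL text further on is
    # assumed not to indicate the paragraph's main direction.
    return "ltr"
```

With this sketch, a Hebrew letter preceded by 62 Latin letters still yields "rtl", while the same letter preceded by 63 or more Latin letters yields "ltr" because it falls outside the scanned prefix.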
Received on Tuesday, 14 September 2010 20:16:44 UTC