On 09/14/2010 12:29 PM, Ehsan Akhgari wrote:
> On Tue, Sep 14, 2010 at 2:56 PM, fantasai<fantasai.lists@inkedblade.net> wrote:
>>> # The part of the text after the first X characters (where the text in
>>> nodes excluded above are not part of the count). Do we need this? If
>>> so, what's a good X value? 100?
>>
>> And I think that for any-rtl having an X value is both better for
>> performance and more likely to give good results. If the first X
>> characters are LTR, where X is longer than most LTR phrases commonly
>> imported into RTL text, chances are any RTL characters after that
>> are not indicating the paragraph's main direction.
>
> I agree, but it's not clear to me how we can pick the best value for
> X, other than just guessing...

Guessing works for me. Here are some options:

   31 - (2^5 - 1), but probably too short for many names.
   63 - (2^6 - 1)
  100 - 10^2
  255 - Counter fits in one byte. This should probably be the upper
        bound, not so much because of the byte limit, but because I
        strongly suspect going higher will decrease the quality of
        results.

I'm going to advocate 63, since I can't think of any common strings
(other than long URLs) that would hit that limit. 100 seems okay, too.
I can't really think of any common cases where a higher limit would
be helpful.

Perhaps other people have other input...

~fantasai

Received on Tuesday, 14 September 2010 20:16:44 GMT
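
For illustration only, here is a minimal sketch of the capped "any-rtl" check discussed in this message. The function name `any_rtl_direction`, the default `limit=63`, and the "ltr" fallback are assumptions made for the example; the thread does not define an API. The sketch also assumes that every character counts toward X and that text from excluded nodes has already been removed.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
STRONG_RTL = {"R", "AL"}


def any_rtl_direction(text, limit=63):
    """Guess a paragraph direction from its first `limit` characters.

    Hypothetical helper: returns "rtl" if any strong RTL character
    appears within the first `limit` characters, otherwise "ltr".
    RTL characters after the cutoff are ignored, on the theory that
    they no longer indicate the paragraph's main direction.
    """
    for ch in text[:limit]:
        if unicodedata.bidirectional(ch) in STRONG_RTL:
            return "rtl"
    # Assumption: fall back to LTR when no strong RTL character is found.
    return "ltr"
```

With the cap at 63, a Hebrew letter at position 63 (the 63rd character) still flips the result to RTL, while one at position 64 does not; raising `limit` to 100 would catch it.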