Fwd: Re: [IME API]Some Insights from Chinese Input Methods from Xiaoqian Cindy Wu on 2013-12-26 (public-html-ig-zh@w3.org from December 2013)

From: Xiaoqian Cindy Wu <xiaoqian@w3.org>
Date: Thu, 26 Dec 2013 14:33:22 +0800
To: "public-html-ig-zh@w3.org >> W3C HTML5 中文興趣小組" <public-html-ig-zh@w3.org>
Message-ID: <52BBCDB2.2010803@w3.org>
-------- Original Message --------
Subject:  Re: [IME API]Some Insights from Chinese Input Methods
Resent-Date:  Fri, 20 Dec 2013 03:55:39 +0000
Resent-From:  public-webapps@w3.org
Date:  Fri, 20 Dec 2013 12:54:30 +0900
From:  Takayoshi Kochi (河内 隆仁) <kochi@google.com>
To:  SongTao(桌面事业部) <SongTao@sogou-inc.com>
Cc:  public-webapps@w3.org <public-webapps@w3.org>



Hi,

On Thu, Dec 19, 2013 at 7:23 PM, Takayoshi Kochi (河内 隆仁) 
<kochi@google.com> wrote:


    FYI, this is (still) being discussed at
    https://www.w3.org/Bugs/Public/show_bug.cgi?id=22018
    And your suggestion is almost the same as the reason Microsoft people
    proposed this.

    One of the issues blocking this is a privacy issue: it may leak a
    personalized dictionary
    (user-defined dictionary or history).


Regarding the suggestion use case for getCompositionAlternatives(), I'm 
still not really convinced that
it would be useful. As Microsoft implemented it in IE11 
<http://msdn.microsoft.com/en-us/library/ie/dn433251%28v=vs.85%29.aspx>, 
I'd be interested to know whether it improved the quality of
Bing suggestions :)

One reason is that it is questionable whether giving alternate 
candidates
will yield better suggestions in general.  A simple case like yours 
(sogou -> 搜狗) may look like it works
very well, though.

As the area for suggestions is usually very limited, a suggestion 
provider should give only *really* probable
candidates to users.  So if a composition has candidates, usually at 
best only the top one or two are *good*
for hinting suggestions, when they are sorted by relevance.

I wish there were an API that exposed a relevance/confidence/whatever 
score for each candidate
of the composition, which an IME should have internally; then it could 
be useful for this use case.
E.g. you can imagine a case where there are 4 candidates, and their 
scores are all even (25, 25, 25, 25) or
the top one has a very high score (96, 2, 1, 1).  If a suggestion 
provider picks the top candidate,
the former is less likely to yield a good suggestion than the latter.  
If a suggestion provider can get such information,
it can even drop all suggestions in the all-even case, as the input may 
not be sufficient for generating
suggestions.
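
To make the score idea concrete, here is a minimal sketch in Python. It 
assumes a score API that does not exist today; the function name 
pick_confident_candidate and the min_share threshold are hypothetical, 
and the candidate texts are placeholders.

```python
from typing import Optional, Sequence, Tuple

# (candidate text, score). The score is the hypothetical value an IME
# would expose; no such API exists today.
Candidate = Tuple[str, float]


def pick_confident_candidate(
    candidates: Sequence[Candidate],
    min_share: float = 0.6,  # assumed threshold, chosen only for illustration
) -> Optional[str]:
    """Return the top candidate only if it holds a dominant share of the total score."""
    total = sum(score for _, score in candidates)
    if total <= 0:
        return None  # no candidates or no usable scores: give no hint
    text, score = max(candidates, key=lambda c: c[1])
    return text if score / total >= min_share else None


even = [("A", 25), ("B", 25), ("C", 25), ("D", 25)]
skewed = [("A", 96), ("B", 2), ("C", 1), ("D", 1)]

print(pick_confident_candidate(even))    # None: drop suggestions
print(pick_confident_candidate(skewed))  # A
```

With the all-even scores the provider gets nothing to work with and can 
skip suggestions; with the skewed scores the top candidate is a 
reasonable seed.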

One solution for this that is possible today is to have an IME on the 
server side:
browsers send 'raw' text (before conversion) to the server, and the 
server can then interact with its local
IME and retrieve such score information.  Then 
getCompositionAlternatives() is unnecessary.
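
A rough sketch of that server-side flow, in Python. Everything here is 
an assumption for illustration: TOY_IME stands in for a real 
server-local IME, the function names are hypothetical, and the numeric 
scores are invented (only the sogou -> 搜狗 pairing comes from this 
thread).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a server-local IME: raw (pre-conversion)
# input mapped to scored candidates. The scores are invented.
TOY_IME: Dict[str, List[Tuple[str, float]]] = {
    "sog": [("送给", 0.5), ("搜狗", 0.3)],
    "sogou": [("搜狗", 0.9)],
}


def convert_on_server(raw: str) -> List[Tuple[str, float]]:
    """Return scored candidates for the raw text, best first."""
    candidates = TOY_IME.get(raw.lower(), [])
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def suggestion_seed(raw: str, min_score: float = 0.8) -> Optional[str]:
    """Return the converted text to base suggestions on, or None if unsure."""
    ranked = convert_on_server(raw)
    if ranked and ranked[0][1] >= min_score:
        return ranked[0][0]
    return None  # not confident enough: show no suggestions


# The browser only ever sends the raw text, e.g. "sogou".
print(suggestion_seed("sogou"))  # 搜狗
print(suggestion_seed("sog"))    # None
```

The point of the design is that the scores never leave the server, and 
the browser does not need to expose IME candidates at all.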


The other reason is that it is questionable how much such 
suggestions could save users'
typing (or time).  For the reason above, unless a suggestion provider 
can give really probable
suggestions, users won't get any benefit, or worse, users just get 
distracting visual noise.

To save typing, IMEs should provide candidates as early as possible, 
by predicting
the user's typing.  For example, if you type 's' and the IME could 
provide '搜狗' as a candidate, then
the suggestion provider might be able to show suggestions that 搜狗 
provides - it would be awesome,
if it were the user's intention.  But 's' or 'so' is usually too short 
to give a specific candidate.  I
tested with plain MS Pinyin on Windows 8: after typing 'sog', "搜狗" was 
2nd after "送给".
If the user typed "搜狗" in the past and the IME can suggest it with 
only 's' - it is nice, but
it raises a privacy concern if such history is exposed to the web.
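
The privacy trade-off can be sketched as follows, in Python. This is a 
toy model, not how any real IME works: SYSTEM_DICT, USER_HISTORY and 
predict are hypothetical names, and the data only mirrors the 'sog' 
example above.

```python
from typing import Dict, List, Tuple

# Hypothetical data: a shared system dictionary keyed by typed prefix,
# and a per-user history of (reading, committed text) pairs.
SYSTEM_DICT: Dict[str, List[str]] = {"sog": ["送给", "搜狗"]}
USER_HISTORY: List[Tuple[str, str]] = [("sogou", "搜狗")]


def predict(prefix: str, include_history: bool) -> List[str]:
    """Candidates for a typed prefix; history matches rank first when included."""
    result: List[str] = []
    if include_history:
        for reading, text in USER_HISTORY:
            if reading.startswith(prefix) and text not in result:
                result.append(text)
    for text in SYSTEM_DICT.get(prefix, []):
        if text not in result:
            result.append(text)
    return result


# Inside the IME's own UI, history makes 's' enough to predict 搜狗.
print(predict("s", include_history=True))     # ['搜狗']
# Without history, 's' yields nothing useful for a web page.
print(predict("s", include_history=False))    # []
print(predict("sog", include_history=False))  # ['送给', '搜狗']
```

The early, useful prediction is exactly the one that depends on the 
user's history, so exposing it to a page would reveal what the user 
typed before.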

For Japanese, IMEs typically have 2 modes during composition: 
one for
typing composition text (in romanized form, to compose the reading in 
'Hiragana') and
the other for converting the reading into Kanji and Kana mixed 
text.  IMEs
can provide candidate suggestions during the first mode, but (although this
is implementation dependent) some IMEs cannot expose such suggestions to
applications on Windows (e.g. Google Japanese Input).  For the 
suggestion use case,
such suggestions would be more valuable, but they are not technically 
available.  Once
a user starts conversion after having typed the reading, candidates 
become available,
but it may be too late for this use case.


This got a bit long, but it is my current thinking about the usefulness 
of getCompositionAlternatives()
for the suggestion use case.  Overall, my gut feeling is that it works 
only in some cases,
but will not work well in most cases.

I'd appreciate it if you have counterexamples :)

Thanks!
-- 
Takayoshi Kochi
Received on Thursday, 26 December 2013 06:33:28 UTC