Re: [IME API]Some Insights from Chinese Input Methods from 河内隆仁 on 2013-12-20 (public-webapps@w3.org from October to December 2013)

From: 河内隆仁 <kochi@google.com>
Date: Fri, 20 Dec 2013 12:54:30 +0900
To: SongTao(桌面事业部) <SongTao@sogou-inc.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <CADP2=hoChAhbSHJ4ad69aj46GJy8Z_J4hPTesH1fZHLychAJ9A@mail.gmail.com>
Hi,

On Thu, Dec 19, 2013 at 7:23 PM, Takayoshi Kochi (河内 隆仁)
<kochi@google.com> wrote:

>
>> FYI, this is (still) being discussed at
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=22018
> And your suggestion is almost same as the reason Microsoft people proposed
> this.
>
> One of the issue blocking this is a privacy issue, which may leak a
> personalized dictionary
> (user-defined dictionary or history).
>

I'm still not really convinced that getCompositionAlternatives() would
be useful for the suggestion use case.  As Microsoft implemented it in
IE11 <http://msdn.microsoft.com/en-us/library/ie/dn433251(v=vs.85).aspx>,
I'd be interested to know whether it improved the quality of Bing
suggestions :)

One reason is that it is questionable whether giving alternate
candidates will yield better suggestions in general, although a simple
case like yours (sogou -> 搜狗) may look like it works very well.

As the area for suggestions is usually very limited, a suggestion
provider should give only *really* probable candidates to users.  So if
a composition has candidates sorted by relevance, usually at best only
the top one or two are *good* enough for hinting suggestions.

I wish there were an API that exposed a relevancy/confidence/whatever
score for each candidate of the composition, which an IME should have
internally; then it could be useful for this use case.
E.g. imagine one case where there are 4 candidates and their scores are
all even (25, 25, 25, 25), and another where the top candidate has a
very high score (96, 2, 1, 1).  If a suggestion provider picks the top
candidate, the former is less likely to yield a good suggestion than the
latter.  If a suggestion provider can get such information, it can even
drop suggestions entirely in the all-even case, as the input may not be
sufficient for generating suggestions.
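
To make the idea concrete, here is a rough sketch in TypeScript of how a
suggestion provider might use such scores.  Everything in it is
hypothetical: the ScoredCandidate shape, the function name, and the 0.6
threshold are my assumptions, and nothing like this exists in the
current draft.

```typescript
// Hypothetical shape: the IME API draft exposes no per-candidate score.
interface ScoredCandidate {
  text: string;
  score: number; // relative confidence on any non-negative scale
}

// Returns the top candidate only if it clearly dominates the others;
// returns null when the scores carry too little signal to suggest on.
function pickCandidateForSuggestion(
  candidates: ScoredCandidate[],
  minShare = 0.6, // assumed threshold; would need tuning in practice
): string | null {
  const total = candidates.reduce((sum, c) => sum + c.score, 0);
  if (total <= 0) return null; // also covers the empty list
  const top = candidates.reduce((best, c) => (c.score > best.score ? c : best));
  return top.score / total >= minShare ? top.text : null;
}

// All-even case (25, 25, 25, 25): top share is 0.25, so no suggestion.
const allEven: ScoredCandidate[] = [
  { text: "A", score: 25 }, { text: "B", score: 25 },
  { text: "C", score: 25 }, { text: "D", score: 25 },
];
// Dominant case (96, 2, 1, 1): top share is 0.96, so "A" is used.
const dominant: ScoredCandidate[] = [
  { text: "A", score: 96 }, { text: "B", score: 2 },
  { text: "C", score: 1 }, { text: "D", score: 1 },
];
console.log(pickCandidateForSuggestion(allEven));  // null
console.log(pickCandidateForSuggestion(dominant)); // "A"
```

Using the top candidate's share of the total, rather than its raw
score, keeps the check independent of whatever scale an IME uses.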

One solution that is possible today is to run an IME on the server
side: browsers send 'raw' text (before conversion) to the server, and
the server interacts with its local IME to retrieve such score
information.  Then getCompositionAlternatives() is unnecessary.
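
As a minimal sketch of that flow (TypeScript, server side only): the
ServerSideIme interface, the stub engine, and its scores are all made up
for illustration and do not correspond to any real IME library.  The
candidate order for 'sog' follows the MS Pinyin observation later in
this mail.

```typescript
// Hypothetical candidate and engine types; not a real IME library.
interface ImeCandidate {
  text: string;
  score: number;
}
interface ServerSideIme {
  convert(rawText: string): ImeCandidate[];
}

// Stub standing in for a real conversion engine. Scores are invented.
const stubTable = new Map<string, ImeCandidate[]>([
  ["sog", [{ text: "搜狗", score: 45 }, { text: "送给", score: 55 }]],
  ["sogou", [{ text: "搜狗", score: 96 }]],
]);
const stubIme: ServerSideIme = {
  convert: (rawText) => stubTable.get(rawText) ?? [],
};

// Server-side step: take the raw (pre-conversion) text sent by the
// browser and return at most `maxCandidates` converted strings, best
// first, to feed into the suggestion backend.
function candidatesForRawText(
  rawText: string,
  ime: ServerSideIme,
  maxCandidates = 2, // "top one or two" at best
): string[] {
  return [...ime.convert(rawText)]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxCandidates)
    .map((c) => c.text);
}

console.log(candidatesForRawText("sog", stubIme));   // ["送给", "搜狗"]
console.log(candidatesForRawText("sogou", stubIme)); // ["搜狗"]
```

The browser only has to send the raw text along with the suggestion
request, so no candidate list needs to be exposed to the page.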


The other reason is that it is questionable how much such suggestions
could save users' typing (or time).  For the reason above, unless a
suggestion provider can give really probable suggestions, users won't
get any benefit, or worse, they just get distracting visual noise.

To save typing, IMEs should provide candidates as early as possible by
predicting what the user is typing.  For example, if you type 's' and
the IME could provide '搜狗' as a candidate, then a suggestion provider
might be able to show suggestions for 搜狗.  It would be awesome if that
were the user's intention.  But 's' or 'so' is usually too short to
yield a specific candidate.  I tested with plain MS Pinyin on Windows 8:
after typing 'sog', "搜狗" was 2nd after "送给".
If the user typed "搜狗" in the past and the IME can suggest it from only
's', that is nice, but it raises a privacy concern if such history is
exposed to the web.

Japanese IMEs typically have 2 modes during composition: one for typing
the composition text (in romanized form, to compose the reading in
'Hiragana'), and the other for converting the reading into Kanji and
Kana mixed text.  IMEs can provide candidate suggestions during the
first mode, but (although this is implementation dependent) some IMEs
cannot expose such suggestions to applications on Windows (e.g. Google
Japanese Input).  For the suggestion use case, such suggestions would
be more valuable, but they are not technically available.  Once a user
starts conversion after typing the reading, candidates become
available, but that may be too late for this use case.


This got a bit long, but it is my current thinking about the usefulness
of getCompositionAlternatives() for the suggestion use case.  Overall,
my gut feeling is that it works only in some cases, and will not work
well in most cases.

I'd appreciate it if you have counterexamples :)

Thanks!
-- 
Takayoshi Kochi
Received on Friday, 20 December 2013 03:55:18 UTC