- From: 河内 隆仁 <kochi@google.com>
- Date: Fri, 20 Dec 2013 12:54:30 +0900
- To: SongTao(桌面事业部) <SongTao@sogou-inc.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
- Message-ID: <CADP2=hoChAhbSHJ4ad69aj46GJy8Z_J4hPTesH1fZHLychAJ9A@mail.gmail.com>
Hi,

On Thu, Dec 19, 2013 at 7:23 PM, Takayoshi Kochi (河内 隆仁) <kochi@google.com> wrote:

> FYI, this is (still) being discussed at
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=22018
> And your suggestion is almost same as the reason Microsoft people proposed
> this.
>
> One of the issue blocking this is a privacy issue, which may leak a
> personalized dictionary
> (user-defined dictionary or history).

As for using getCompositionAlternatives() for the suggestion use case, I'm still not really convinced that it would be useful. As Microsoft implemented it in IE11 <http://msdn.microsoft.com/en-us/library/ie/dn433251(v=vs.85).aspx>, I'd be interested to know whether it improved the quality of Bing suggestions :)

One reason is that it is questionable whether giving alternate candidates will yield better suggestions in general. A simple case like yours (sogou -> 搜狗) may look like it works very well, though.

As the area for suggestions is usually very limited, a suggestion provider should give only *really* probable candidates to users. So even if a composition has candidates, at best only the top one or two could be *good* for hinting suggestions, when they are sorted by relevance.

I wish there were an API that exposed a relevance/confidence/whatever score for each candidate of the composition, which an IME should have internally; then it could be useful for this use case. E.g. you can imagine a case where there are 4 candidates and their scores are all even (25, 25, 25, 25), or one where the top candidate has a very high score (96, 2, 1, 1). The former is less likely than the latter to yield a good suggestion if a suggestion provider picks the top candidate. If a suggestion provider could get such information, it could even drop all suggestions in the all-even case, as the input may not be sufficient for generating suggestions.
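To make the score example above concrete, here is a minimal sketch in TypeScript. Everything in it is hypothetical: the ScoredCandidate shape, its score field, the pickConfidentCandidate() helper and the 0.6 threshold are assumptions for illustration only, and nothing like this is exposed by getCompositionAlternatives() today.

```typescript
// Hypothetical shape: a candidate with a confidence score attached.
// No current API exposes such a score; this is an assumption for illustration.
interface ScoredCandidate {
  text: string;
  score: number; // relative confidence on any non-negative scale
}

// Returns the top candidate only if it holds at least `minShare` of the
// total score; otherwise returns null, meaning "show no suggestion".
// The default threshold of 0.6 is arbitrary, not a recommended value.
function pickConfidentCandidate(
  candidates: ScoredCandidate[],
  minShare: number = 0.6
): ScoredCandidate | null {
  const total = candidates.reduce((sum, c) => sum + c.score, 0);
  if (candidates.length === 0 || total <= 0) {
    return null;
  }
  // Find the highest-scoring candidate (the first one wins on ties).
  const top = candidates.reduce((best, c) => (c.score > best.score ? c : best));
  return top.score / total >= minShare ? top : null;
}

// The two cases from the text, with placeholder candidate strings.
const evenCase: ScoredCandidate[] = [
  { text: "c1", score: 25 },
  { text: "c2", score: 25 },
  { text: "c3", score: 25 },
  { text: "c4", score: 25 },
];
const skewedCase: ScoredCandidate[] = [
  { text: "c1", score: 96 },
  { text: "c2", score: 2 },
  { text: "c3", score: 1 },
  { text: "c4", score: 1 },
];

console.log(pickConfidentCandidate(evenCase));   // null: no candidate dominates
console.log(pickConfidentCandidate(skewedCase)); // the "c1" candidate
```

With only an unscored list of alternatives, the two cases look identical to a suggestion provider, which is the point of the example.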
One solution for this that is possible today is to run an IME on the server side: browsers send the 'raw' text (before conversion) to the server, and the server interacts with its local IME to retrieve such score information. Then getCompositionAlternatives() is unnecessary.

The other reason is that it is questionable how much such suggestions could save users' typing (or time). For the reason above, unless a suggestion provider can give really probable suggestions, users won't get any benefit, or worse, users will just get distracting visual noise.

To save typing, IMEs should provide candidates as early as possible by predicting the user's typing. For example, if you type 's' and the IME can provide '搜狗' as a candidate, then the suggestion provider might be able to show the suggestions that 搜狗 provides - it would be awesome, if that were the user's intention. But 's' or 'so' is usually too short to yield a specific candidate. I tested with the plain MS Pinyin IME on Windows 8: after typing 'sog', "搜狗" was 2nd, after "送给". If the user typed "搜狗" in the past and the IME can suggest it from only 's', that is nice, but it raises a privacy concern if such history is exposed to the web.

For Japanese, IMEs typically have 2 modes during composition: one for typing the composition text (in romanized form, to compose the reading in 'Hiragana') and the other for converting the reading into Kanji-and-Kana mixed text. IMEs can provide candidate suggestions during the first mode, but (although this is implementation dependent) some IMEs cannot expose such suggestions to applications on Windows (e.g. Google Japanese Input). For the suggestion use case, such suggestions would be more valuable, but they are technically not available. Once a user starts conversion after having typed the reading, candidates become available, but that may be too late for this use case.

It got a bit long, but this is my current thinking about the usefulness of getCompositionAlternatives() for the suggestion use case.
Overall, my gut feeling is that it works only in some cases, but will not work well in most cases. I'd appreciate it if you have counterexamples :)

Thanks!
--
Takayoshi Kochi
Received on Friday, 20 December 2013 03:55:18 UTC