Re: Composition, IME, etc. from Ryosuke Niwa on 2014-06-30 (public-editing-tf@w3.org from June 2014)

From: Ryosuke Niwa <rniwa@apple.com>
Date: Mon, 30 Jun 2014 16:30:00 -0700
To: Robin Berjon <robin@w3.org>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>, public-editing-tf@w3.org
Message-id: <1D06B712-6E8F-4BC6-BB4A-320A8B2FDCA5@apple.com>

On Jun 23, 2014, at 8:45 AM, Robin Berjon <robin@w3.org> wrote:

> On 06/06/2014 19:13 , Ryosuke Niwa wrote:
>> On Jun 6, 2014, at 7:24 AM, Robin Berjon <robin@w3.org> wrote:
>>> In order to handle them you have two basic options:
>>> 
>>> a) Let the browser handle them for you (possibly calling up some
>>> platform functionality). This works as closely to user expectations
>>> as a Web app can hope to get but how do you render it? If it
>>> touches your DOM then you lose the indirection you need for
>>> sensible editing; if it doesn't I don't know how you show it.
>>> 
>>> b) Provide the app with enough information to do the right thing.
>>> This gives you the indirection, but "doing the right thing" can be
>>> pretty hard.
>>> 
>>> I am still leaning towards (b) being the approach to follow, but
>>> I'll admit that that's mostly because I can't see how to make (a)
>>> actually work. If (b) is the way, then we need to make sure that
>>> it's not so hard that everyone gets it wrong as soon as the input
>>> is anything other than basic English.
>> 
>> I'm not convinced b is the right approach.
> 
> As I said though, it's better than (a) which is largely unusable.
> 
> That said, I have a proposal that improves on (b) and that I believe addresses your concerns (essentially by merging both approaches into a single one).
> 
>>> If the browser doesn't know because the platform can't tell the
>>> difference between Korean and Japanese (a problem with which
>>> Unicode doesn't help) then there really isn't much that we can do
>>> to help the Web app.
>> 
>> This is predicated on using approach b.  I'm not convinced that that's
>> the right thing to do here.
> 
> No, it doesn't. If the browser has no clue whatsoever how to present composition then it can't offer the right UI itself any more than it can help the application do things well. I am merely ruling that situation, which you mentioned, out as unsolvable (by us).
> 
>>> However if the browser knows, it can provide the app with
>>> information. I don't have enough expertise to know how much
>>> information it needs to convey — if it's mostly style that can be
>>> done (it might be unwieldy to handle but we can look at it).
>> 
>> The problem here is that we don't know if underlining is the only
>> difference input methods ever need.  We could imagine future new UI
>> paradigms would require other styling such as bolding text, enlarging
>> the text for easier readability while typing, etc...
> 
> I never said that the browser would only provide underlining information. I said it can convey *style*. If it knows that the specific composition being carried out requires bolding, then it could provide the matching CSS declaration. If there is an alien composition method that requires red blinking with a green top border, it could convey that.
> 
> Having said that, having the browser convey style information to the script with the expectation that the script would create the correct Range for the composition in progress and apply that style to it, even though possible, seems like a lot of hoops to jump through that are essentially guaranteed to be exactly the same in every single instance.
> 
> I think we can do better. It's a complicated-sounding solution but the problem is itself complex, and I *think* that it is doable and the best of all options I can think of.
> 
> To restate the problem:
> 
>  • We don't want the browser editing the DOM directly because that just creates madness.
>  • We want to enable any manner of text composition, from a broad array of options, while showing the best UI for the user.
> 
> These two requirements are at odds because rich, powerful composition that is great for the user *has* to rely on the browser, but the logical way for the browser to expose that is to use the DOM.
> 
> The idea to reconcile both is to use a "shadow text insertion point". Basically, it is a small DOM tree injected as a shadow at the insertion point (with author styles applied to it). The browser can do *anything* it wants in there in order to create a correct editing UI. While composition is ongoing, the script still receives composition events but can safely just ignore them for the vast majority of cases (since you can't generally usefully validate composition in progress anyway). When the composition terminates, the input event contains the *text* content of the shadow DOM, which is then reclaimed.

That's an interesting idea. It does work around the issue of the UA having to draw the composing text while still allowing authors to style it.
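
To make the flow concrete, here is a rough sketch of the shadow text insertion point as a plain TypeScript model, with no real DOM involved. Every name in it (ShadowInsertionPoint, EditorModel, CompositionNotice) is hypothetical and made up for illustration; nothing here is an existing or proposed API. It only shows the division of labour: the browser side owns the pending text, the script ignores composition notices, and only the committed text reaches the script's model.

```typescript
// Hypothetical model only: none of these names come from a spec or browser API.
type CompositionPhase = "start" | "update";

interface CompositionNotice {
  phase: CompositionPhase;
  data: string; // text currently shown inside the shadow insertion point
}

// Stands in for the browser-owned shadow subtree at the caret.
class ShadowInsertionPoint {
  private pending = "";
  private active = false;

  // The UA may rewrite or restyle the pending text freely while composing.
  update(text: string): CompositionNotice {
    const phase: CompositionPhase = this.active ? "update" : "start";
    this.active = true;
    this.pending = text;
    return { phase, data: text };
  }

  // Composition ended: return the plain text and reclaim the shadow content.
  commit(): string {
    const text = this.pending;
    this.pending = "";
    this.active = false;
    return text;
  }
}

// Stands in for the script's own document model (the "indirection").
class EditorModel {
  private text: string;
  private caret: number;

  constructor(text: string, caret: number) {
    this.text = text;
    this.caret = caret;
  }

  // Composition notices can be observed, but the model is left untouched.
  onComposition(_notice: CompositionNotice): void {}

  // Only the final input carries text that changes the model.
  onInput(committed: string): void {
    this.text =
      this.text.slice(0, this.caret) + committed + this.text.slice(this.caret);
    this.caret += committed.length;
  }

  value(): string {
    return this.text;
  }
}

const shadow = new ShadowInsertionPoint();
const editor = new EditorModel("の天気", 0);
for (const step of ["き", "きょ", "きょう", "今日"]) {
  editor.onComposition(shadow.update(step));
}
editor.onInput(shadow.commit());
console.log(editor.value());
```

In this toy run the model stays at "の天気" through all four composition steps and becomes "今日の天気" only once the composition is committed.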

> I guess that the shadow text insertion point would participate in the tree in the same way that a pseudo-element does. (Yes, I realise this basically means "magic".)
> 
> I believe this works well for the insertion of new text; I need to mull it over further to think about editing existing content (notably the case that happens in autocorrect, predictive, and I believe Kotoeri where you place a cursor mid-word and it will take into account what's before it but not after).

You can also reverse-convert composed text.  For example, select and right-click "今日" in TextEdit on Mac or Notepad on Windows with their respective Japanese IMEs, and they let you convert it back to its composition state, "きょう", after which you can keep typing more text.
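
As a toy illustration of that reconversion step, the sketch below reopens committed text as a composition and then extends it. The lookup table is a stand-in for whatever reverse lookup a real IME performs internally, and the names reconvert, keepTyping and Composition are hypothetical, not part of any platform or Web API.

```typescript
// Toy stand-in for an IME's reverse lookup; the single entry is the example above.
const readings = new Map<string, string>([["今日", "きょう"]]);

interface Composition {
  reading: string; // uncommitted kana the user is still editing
}

// Turn committed text back into an open composition, if a reading is known.
function reconvert(surface: string): Composition | null {
  const reading = readings.get(surface);
  return reading === undefined ? null : { reading };
}

// Further keystrokes extend the reopened composition.
function keepTyping(c: Composition, more: string): Composition {
  return { reading: c.reading + more };
}

const reopened = reconvert("今日");
if (reopened !== null) {
  console.log(keepTyping(reopened, "は").reading);
}
```

The point for the shadow insertion point idea is that a composition may start from text that is already in the document, not only from an empty caret position.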

> But I think it's worth giving it some thought; particularly because I don't see how we can solve this problem properly otherwise.
> 
> This has the advantage that it is also a lot simpler to handle for authors.

Yeah, this is a very interesting approach.  We should come up with a list of use cases for IME integration and see if this approach can satisfy most of them.

- R. Niwa
Received on Monday, 30 June 2014 23:30:39 UTC