Re: High-level goals and objectives of the Editing TF (was Way forward and IME behavior speccing from Johannes Wilm on 2015-10-16 (public-editing-tf@w3.org from October 2015)

From: Johannes Wilm <johanneswilm@gmail.com>
Date: Fri, 16 Oct 2015 16:49:03 +0200
To: Koji Ishii <kojiishi@gmail.com>
Cc: Florian Rivoal <florian@rivoal.net>, Ryosuke Niwa <rniwa@apple.com>, Piotr Koszuliński <p.koszulinski@cksource.com>, "public-editing-tf@w3.org" <public-editing-tf@w3.org>, Takayoshi Kochi <kochi@chromium.org>
Message-ID: <CABkgm-TWFoCCLMzk5W2eF9yAid4K25z1d5ckbbg9SLbdUkpcUw@mail.gmail.com>

On Fri, Oct 16, 2015 at 4:02 PM, Koji Ishii <kojiishi@gmail.com> wrote:

> On Fri, Oct 16, 2015 at 9:23 PM, Florian Rivoal <florian@rivoal.net>
> wrote:

...

>
>
> > As you said, IME is pretty much the same as keyboard input.
> > preventDefault is able to cancel regular input from a keyboard. This
> > is not a plan, this is today (see the second example on
> > https://developer.mozilla.org/en-US/docs/Web/API/Event/preventDefault).
> > I don't see the argument for being able to cancel "A" but not "あ",
> > or even to be able to cancel "A" if it is input by keyboard, but not
> > the same "A" if it is input via IME.
>
> Since the original proposal said "all" and "every single", I'm opposed to
> that.
>
> If you have specific list of things you want to cancel, I can discuss
> on each. Canceling committed characters from IME is fine with me. I
> don't read specs, sorry, but I hope we can't cancel Caps Lock, can we?
>

adding characters, deleting characters, deleting/changing/adding DOM
structures.

What other operations are there?

So in total: any operation that changes the DOM in any kind of way due to
IME-input needs to be something the JS can have a final word on. See the
example about ICE I sent over earlier today.

For the compositionstart (at least) it needs to be possible to move the
caret somewhere else. Why is that? Again, look at the ICE-example. In most
cases we can let the browser handle the insertion of new characters, if we
are allowed to first move the caret somewhere else. In fact for most keys
(most of the time), that's how we did it for keyboard input.

So if you have:

"This is a t|st" | = caret

the simplest approach for the JS, when it realizes that a new character is
about to be inserted at the caret position, is to create a new element at
that position and then put the caret there:

"This is a t<ins>|</ins>st"

it can then let the browser insert the "e" at that position. For most
letters we could just do that. But there were some exceptions -- one of
them being the space key -- where one would at times decide to exchange it
with a scientific space. Or one may exchange simple double quotation marks
(") with more complex ones (« or ») based on various other factors that
only the JS knows about.
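
To make the two steps above concrete, here is a minimal sketch in TypeScript. It is only a model, not editor code: the document is represented as plain strings on either side of the caret, and the names EditorState, openInsertion, insertTracked and the substitution table are all hypothetical. The single substitution rule (a straight double quote becoming «) stands in for whatever context-dependent rules the JS actually has.

```typescript
// Hypothetical model: the text left and right of the caret, no real DOM.
interface EditorState {
  before: string; // text to the left of the caret
  after: string;  // text to the right of the caret
}

// Hypothetical substitution rules. A real editor would pick the
// replacement from context that only the JS knows about.
const substitutions = new Map<string, string>([
  ['"', "«"],
]);

// Step 1: create an empty <ins> at the caret and put the caret ("|") inside it.
function openInsertion(state: EditorState): string {
  return `${state.before}<ins>|</ins>${state.after}`;
}

// Step 2: the markup once the character has landed inside the <ins>.
// For most characters this is what the browser would insert by itself;
// for the exceptions the JS swaps in its own character first.
function insertTracked(state: EditorState, typed: string): string {
  const ch = substitutions.get(typed) ?? typed;
  return `${state.before}<ins>${ch}</ins>${state.after}`;
}

const state: EditorState = { before: "This is a t", after: "st" };
const opened = openInsertion(state);        // caret moved into a new <ins>
const inserted = insertTracked(state, "e"); // ordinary letter, inserted as typed
const quoted = insertTracked(state, '"');   // exception, character exchanged
```

The point of the split is that step 1 only needs permission to move the caret; step 2 only needs to intervene for the few characters listed in the table.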

Now if this is currently "impossible" because some IMEs, specifically on
Linux (my own OS of choice), cannot handle browsers canceling IME
composition or moving carets, then I think this may actually be a good
point in time to start defining minimum requirements for IMEs that we can
support, and this should be one of them. The old cE will still be with us
for a long time, and it will give IME developers time to update their code
to support such minimum features. Otherwise we end up with IMEs being
really badly supported on the web forever.

The same applies to the movement of the caret during compositionstart.
Clearly this works for a lot of IMEs. And it seems like a much needed
requirement we could ask of the IMEs to implement if they want to be
compatible.

Another problem is that the IME glue code on iOS/Android apparently changes
the DOM structure due to IME on words that go across DOM elements. As I
understood Piotr, on iOS it changes it IF the text is being changed. On
Android it changes the DOM structure even if the user just touches a word
and doesn't do any change.

If you think about the above example, it will give us:

"This is a t<ins>e</ins>st"

Now if I look at this on Android, and I touch the word "test", it will turn
it into:

"This is a test"

That is not OK. So we either need to have a way to tell it to only consider
the current text node, or just a specific range, etc., when copying the
characters into the IME. An alternative could be to sandbox the word
somehow and then have some more complex handling of it all once the
composition of the word "test" has been finished entirely. I have played
around a bit with the standard IME in my Android phone and it guesses
correctly what a word is most of the time, but certainly not always. In
general, for European languages, it seems much more problematic when it
guesses a "too large" word than when it guesses too little, especially if
you think about things such as footnote markers, citation markers, and
other extra information you may have in the body text that the browser
doesn't really understand.
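
As a rough illustration of the difference between the two behaviors, here is a small TypeScript sketch. It is a model only: the paragraph is a list of text runs, and Run, wordAt, wordAcrossRuns and wordWithinRun are hypothetical names, not anything an actual IME or browser exposes. Splitting words on whitespace is also an assumption; real IMEs use their own word guessing.

```typescript
// Hypothetical model of the paragraph: consecutive text runs, some wrapped in an element.
interface Run {
  text: string;
  tag?: string; // e.g. "ins" for tracked insertions
}

const runs: Run[] = [
  { text: "This is a t" },
  { text: "e", tag: "ins" },
  { text: "st" },
];

// Expand [start, end) around offset over non-whitespace characters.
function wordAt(text: string, offset: number): [number, number] {
  let start = offset;
  let end = offset;
  while (start > 0 && !/\s/.test(text[start - 1])) start--;
  while (end < text.length && !/\s/.test(text[end])) end++;
  return [start, end];
}

// The behavior described for Android: element boundaries are ignored,
// so the word handed to the IME spans several runs.
function wordAcrossRuns(all: Run[], runIndex: number, offset: number): string {
  const flat = all.map((r) => r.text).join("");
  const base = all.slice(0, runIndex).reduce((n, r) => n + r.text.length, 0);
  const [s, e] = wordAt(flat, base + offset);
  return flat.slice(s, e);
}

// The restriction asked for: only the touched text node is considered.
function wordWithinRun(all: Run[], runIndex: number, offset: number): string {
  const text = all[runIndex].text;
  const [s, e] = wordAt(text, offset);
  return text.slice(s, e);
}

// The user touches the last run ("st") at offset 1.
const tooLarge = wordAcrossRuns(runs, 2, 1); // covers the <ins> as well
const tooSmall = wordWithinRun(runs, 2, 1);  // leaves the <ins> untouched
```

In this model the "too large" guess pulls the content of the <ins> into the composition, which is how the element ends up being dropped, while the "too small" guess only ever rewrites text inside a single node.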

-- 
Johannes Wilm
http://www.johanneswilm.org

Received on Friday, 16 October 2015 14:49:33 UTC