Re: [w3c/uievents] Clarify `keypress` event handling for keys that map to non-BMP Unicode symbols (Issue #346)

> It is explicitly stated that the key code is given as the unicode code point (or 0). See https://www.w3.org/TR/uievents/#determine-keypress-keyCode.

Note that that entire section is non-normative. We do not intend to normatively specify `keypress` or the deprecated `keyCode` and `charCode` attributes, although we can certainly add implementation notes.

> I meant that this specific statement wasn't stated, but this statement is a given since a broken surrogate pair is not a unicode code point.

From [unicode.org](https://unicode.org/faq/utf_bom.html#utf16-2):

> Surrogates are [code points](https://www.unicode.org/glossary/#code_point) from two special ranges of Unicode values, reserved for use as the leading, and trailing values of paired [code units](https://www.unicode.org/glossary/#code_unit) in [UTF-16](https://www.unicode.org/glossary/#UTF_16).

So sending a single surrogate code point is technically valid according to the current text of the spec. Allowing a Unicode character that is encoded as a surrogate pair (two surrogates) would require the spec to be re-worded.
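To make the distinction concrete, here is a minimal sketch of how one non-BMP code point maps to two surrogate code units in UTF-16. The helper names (`isSurrogate`, `toSurrogatePair`) are made up for illustration and are not taken from any spec.

```typescript
// Hypothetical helpers for illustration; only standard String APIs are used.

// True if the value lies in the surrogate range U+D800..U+DFFF.
function isSurrogate(codeUnit: number): boolean {
  return codeUnit >= 0xd800 && codeUnit <= 0xdfff;
}

// Split a supplementary-plane code point (U+10000..U+10FFFF) into its
// UTF-16 lead (high) and trail (low) surrogates.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary-plane code point");
  }
  const offset = codePoint - 0x10000;
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

// U+1F600 is a single code point but two UTF-16 code units.
const [lead, trail] = toSurrogatePair(0x1f600);
const grinningFace = String.fromCharCode(lead, trail);
console.log(lead.toString(16), trail.toString(16), grinningFace.length);
```

Each half is a code point in the surrogate range, but neither half on its own is a Unicode scalar value, which is the gap between "code point" and "character" being discussed here.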

> Add explicit wording to input.data's specification regarding whether implementations MUST, or SHOULD, or needn't, ensure to deliver non-BMP characters whole, or whether events with input.data containing individual surrogates are acceptable.

The spec is actually clear on this. The [data](https://www.w3.org/TR/uievents/#dom-inputevent-data) attribute is a [DOMString](https://webidl.spec.whatwg.org/#idl-DOMString), which permits unmatched surrogates, but the text in the spec states it should only contain Unicode characters (so maybe the attribute should instead be defined as a [USVString](https://webidl.spec.whatwg.org/#idl-USVString)). Based on this, Firefox is not correct to include unmatched surrogates.
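As a rough sketch of the difference: a `DOMString` can hold any sequence of 16-bit code units, while converting to a `USVString` replaces each unpaired surrogate with U+FFFD. The function names below are hypothetical. Newer engines also provide `String.prototype.isWellFormed()` and `toWellFormed()`, which this sketch avoids so that it runs on older targets.

```typescript
// Hypothetical helpers; they scan UTF-16 code units by hand.

const isLead = (u: number): boolean => u >= 0xd800 && u <= 0xdbff;
const isTrail = (u: number): boolean => u >= 0xdc00 && u <= 0xdfff;

// True if the string contains a surrogate that is not part of a valid pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (isLead(u) && i + 1 < s.length && isTrail(s.charCodeAt(i + 1))) {
      i++; // skip the trail half of a well-formed pair
    } else if (isLead(u) || isTrail(u)) {
      return true;
    }
  }
  return false;
}

// Replace every unpaired surrogate with U+FFFD, leaving valid pairs intact.
function toScalarValueString(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (isLead(u) && i + 1 < s.length && isTrail(s.charCodeAt(i + 1))) {
      out += s.charAt(i) + s.charAt(i + 1);
      i++;
    } else if (isLead(u) || isTrail(u)) {
      out += "\uFFFD";
    } else {
      out += s.charAt(i);
    }
  }
  return out;
}

console.log(hasLoneSurrogate("\uD83D"));        // lone lead surrogate
console.log(hasLoneSurrogate("\uD83D\uDE00"));  // well-formed pair
```

A check like `hasLoneSurrogate(event.data)` in an `input` listener would be one way to observe the behavior described in this thread, assuming `data` is non-null.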

From my perspective, the primary problem here is that two separate event sequences are sent when the user enters a single character. I think this is unexpected and undesirable. In the Firefox example, I get the sense that the main reason for sending multiple `input` (and other) events is so that it can set the `keyCode` attribute correctly for each surrogate half.

In the examples above, I think that Safari and Chrome are both doing appropriate things (except that Chrome is not setting the `key` attribute of the `keydown`/`keyup` properly).

Here are my high-level thoughts on this:
* Only one event sequence (`beforeinput`, `input`) should be sent in response to the user selecting one character.
* The InputEvent `data` attribute might be better defined as a `USVString`, but we are explicit in the text, so I'm not sure if this is worthwhile.

To fix things, I believe the key changes (Firefox/Chrome) needed are:
* Fix it so that unmatched surrogates are not included in the InputEvent `data` field (to match the current spec).
* Only send `beforeinput` and `input` events once for emoji (and other characters encoded as surrogate pairs).
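
To illustrate the intended outcome (not how any browser actually implements it), here is a minimal sketch that buffers a trailing lead surrogate until its other half arrives, so that a consumer sees the character whole and only once. `SurrogateCoalescer` and its `push` method are hypothetical names.

```typescript
// Hypothetical sketch: merges text that arrives split across two deliveries.
class SurrogateCoalescer {
  private pendingLead = "";

  // Returns the text that is safe to deliver now, or null if everything
  // received so far is being held back while waiting for a trail surrogate.
  push(data: string): string | null {
    const combined = this.pendingLead + data;
    this.pendingLead = "";
    if (combined.length === 0) {
      return null;
    }
    const last = combined.charCodeAt(combined.length - 1);
    if (last >= 0xd800 && last <= 0xdbff) {
      // The text ends in a lead surrogate: hold it until the next push.
      this.pendingLead = combined.slice(-1);
      const rest = combined.slice(0, -1);
      return rest.length > 0 ? rest : null;
    }
    return combined;
  }
}

const coalescer = new SurrogateCoalescer();
console.log(coalescer.push("\uD83D")); // held back
console.log(coalescer.push("\uDE00")); // delivered as one whole character
```

This sketch only joins halves. It does not sanitize input, so a lead surrogate followed by something other than a trail surrogate is passed through unchanged.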

To support these fixes, we might need minor spec updates based on how Firefox/Chromium choose to approach this. For example, we could consider any of the following:
* Update (non-normative) spec text to redefine `charCode` to allow Unicode characters (instead of just code points).
* Add a note that the `charCode` attribute might not handle surrogates properly (only having the first or last half, for example).
* State that multiple `keypress` events can happen for characters encoded as surrogate pairs.
* ... (something else)

Note that anything we say in the spec regarding `keypress` and `charCode` will be informational (i.e., non-normative). I don't have strong opinions here about these different approaches.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3c/uievents/issues/346#issuecomment-1601750437

Received on Wednesday, 21 June 2023 22:15:48 UTC