Re: [csswg-drafts] [css-text-3] Segment Break Transformation Rules for East Asian Width property of A (#337) from Dr. Ken Lunde via GitHub on 2018-12-19 (public-css-archive@w3.org from December 2018)

From: Dr. Ken Lunde via GitHub <sysbot+gh@w3.org>
Date: Wed, 19 Dec 2018 21:36:33 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-448752350-1545255392-sysbot+gh@w3.org>

I felt the hairs on the back of my neck tingle. 🥃

I recently conveyed to @fantasai and @frivoal in a private exchange that the UTC is extremely reluctant to make changes to East_Asian_Width (EAW), and that the most recent substantive change was a note added at the end of [Section 2, _Scope_](https://www.unicode.org/reports/tr11/#Scope), in the hope that it would discourage change requests:

> The East_Asian_Width property is not intended for use by modern terminal emulators without appropriate tailoring on a case-by-case basis. Such terminal emulators need a way to resolve the halfwidth/fullwidth dichotomy that is necessary for such environments, but the East_Asian_Width property does not provide an off-the-shelf solution for all situations. The growing repertoire of the Unicode Standard has long exceeded the bounds of East Asian legacy character encodings, and terminal emulations often need to be customized to support edge cases and for changes in typographical behavior over time.

What is being discussed here is sufficiently different from what is conveyed in the note quoted above, but the premise remains the same.

Prior to that, characters that have the _Emoji_Presentation_ property were changed to EAW=W, along with a note about treating _emoji presentation sequences_ as EAW=W (because the property deals in characters, not sequences). Keep in mind that characters that fall into _emoji presentation sequences_ are ambiguous as to their emoji presentation, and require (according to Unicode) an explicit Variation Selector, VS16 (U+FE0F), to indicate emoji presentation. Without an explicit Variation Selector, the EAW property value for such characters is ambiguous without drawing on one or more other properties, or through tailoring.
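Because EAW is assigned per character, sequences have to be handled one level up. The sketch below uses Python's standard `unicodedata.east_asian_width` for single characters and treats any base character followed by VS16 as one wide (EAW=W) unit. The function name `cell_widths` is hypothetical, and the sketch assumes that every base + VS16 pair is a valid emoji presentation sequence; a real implementation would check the pair against the Unicode emoji variation sequence data and would also deal with combining marks and other zero-width characters, which this sketch ignores.

```python
import unicodedata
from typing import List, Tuple

VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def cell_widths(text: str) -> List[Tuple[str, int]]:
    """Split text into (unit, width) pairs.

    Hypothetical helper. A base character followed by VS16 is treated as a
    single emoji presentation sequence of width 2 (EAW=W). Other characters
    use their own EAW value: W/F -> 2, everything else (including A) -> 1.
    """
    units: List[Tuple[str, int]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == VS16:
            # A stray VS16 with no preceding base takes no cell of its own.
            units.append((ch, 0))
            i += 1
            continue
        if i + 1 < len(text) and text[i + 1] == VS16:
            # Assumed emoji presentation sequence: treat the pair as EAW=W.
            units.append((ch + VS16, 2))
            i += 2
            continue
        eaw = unicodedata.east_asian_width(ch)
        units.append((ch, 2 if eaw in ("W", "F") else 1))
        i += 1
    return units
```

U+231A WATCH has the Emoji_Presentation property and therefore already reports `W` from the character data, while U+2764 HEAVY BLACK HEART does not; only the sequence U+2764 U+FE0F is forced to two cells here, which is exactly the sequence-versus-character gap described above.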

EAW is about resolving character width as a binary condition, sometimes necessarily via tailoring: whether to treat a particular character as half-width or full-width in the context of East Asian text processing. It really has nothing to do with the treatment of spaces, which makes me very uneasy about using this property in such a context. Ignoring the 800-pound gorilla that is the CJK Unified Ideographs blocks, an extraordinarily large number of characters are completely outside the scope of East Asian text, either because they belong to unrelated scripts or because they fall outside the half-width/full-width paradigm.
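To make the "binary condition, sometimes via tailoring" point concrete, here is a minimal Python sketch that collapses the six EAW values returned by `unicodedata.east_asian_width` into a half-width/full-width decision. The function `is_full_width`, its `east_asian_context` flag, and the `tailoring` override table are hypothetical names chosen for illustration; resolving A (Ambiguous) by context and letting a per-character table win are assumptions in the spirit of UAX #11's guidance, not a prescribed algorithm.

```python
import unicodedata
from typing import Dict, Optional


def is_full_width(ch: str, east_asian_context: bool,
                  tailoring: Optional[Dict[str, bool]] = None) -> bool:
    """Resolve one character's EAW value to a binary width decision.

    Hypothetical helper:
    - a per-character tailoring entry overrides the property value;
    - W (Wide) and F (Fullwidth) are full-width;
    - A (Ambiguous) is full-width only in an East Asian context;
    - H (Halfwidth), Na (Narrow) and N (Neutral) are half-width.
    """
    if tailoring is not None and ch in tailoring:
        return tailoring[ch]
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return True
    if eaw == "A":
        return east_asian_context
    return False  # H, Na, N
```

Note that the N (Neutral) value covers characters that UAX #11 considers not East Asian at all; the sketch simply maps them to half-width, and nothing in the property says anything about how adjacent spaces or segment breaks should be handled.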

Two other statements in the same section of EAW should be considered (emphasis mine):

> It does not provide rules or specifications of how this property might be used in font design or **line layout**, because, while a useful property for this purpose, it is only one of several character properties that would need to be considered.

> Instead, the guidelines on use of this property should be considered recommendations based on a particular legacy practice **that may be overridden by implementations as necessary**.

Anyway, I have asked a couple of Unicode experts who are better versed than I am in properties related to segmentation and line breaking, in the hope that they can offer solutions that don't involve EAW. The holidays are upon us, so there are likely to be delays in getting substantive or helpful feedback.

-- 
GitHub Notification of comment by kenlunde
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/337#issuecomment-448752350 using your GitHub account

Received on Wednesday, 19 December 2018 21:36:34 UTC