Re: [csswg-drafts] [css-text-3] Segment Break Transformation Rules for East Asian Width property of A (#337)

> For what it's worth we've just implemented this as currently specified, and it makes a real mess of some tests, e.g. CSS2/generated-content/content-counter-004-ref.xht - spaces between U+25FE (black square) are removed, due to the EAW property being "W".

This is because Unicode changed the EAW of a lot of characters in an effectively random and backwards-incompatible way when it introduced Emoji. The results based on e.g. [Unicode 6](https://www.unicode.org/Public/6.1.0/ucd/EastAsianWidth.txt), when these rules were written, would have been quite sensible. :/ Trying to compensate for this change is one of the reasons the rules became too complicated...
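For anyone who wants to check what their own Unicode data says about these characters, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `eaw_report` is made up for illustration, and the values printed depend on the Unicode version bundled with the interpreter, so they may differ from both the Unicode 6-era data and the current data.

```python
import unicodedata


def eaw_report(chars):
    """Map each character's code point label to its East Asian Width value."""
    return {f"U+{ord(c):04X}": unicodedata.east_asian_width(c) for c in chars}


# The Unicode version this interpreter ships with determines the answers.
print("Unicode data version:", unicodedata.unidata_version)

# U+25A0 and U+25FE are geometric squares; U+4E00 is a CJK ideograph.
print(eaw_report("\u25A0\u25FE\u4E00"))
```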

I've committed an initial draft of the Unicode block-based approach. I think the interesting questions remaining are:
- Bopomofo
- Yijing Hexagram Symbols / Tai Xuan Jing Symbols / Counting Rod Numerals
- Enclosed ideographics

I'm leaning towards yes on enclosed ideographics, no on the symbols, and I don't know enough about Bopomofo when it is used as a stand-alone script to say.

Lisu and Khitan both use spaces, so spaces should not be discarded for them during collapsing. Small forms etc. are primarily used with Chinese and Japanese, not Korean, so I think it's reasonable to include them here. (Keep in mind also that both sides of the break need to belong to the set in order to discard, and Hangul is excluded.)
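To make the "both sides" condition concrete, here is a rough sketch in Python. It assumes a small, hand-picked list of Unicode blocks purely for illustration; it is not the set in the draft, and it ignores every other condition in the segment break transformation rules. The names `DISCARD_BLOCKS`, `in_discard_set`, `transform_segment_break`, and `join_lines` are hypothetical.

```python
# Illustrative block ranges only (an assumption, not the draft's actual set).
# Hangul and Lisu are deliberately absent, so breaks next to them stay spaces.
DISCARD_BLOCKS = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFE50, 0xFE6F),  # Small Form Variants
]


def in_discard_set(ch: str) -> bool:
    """True if the character falls in one of the illustrative blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DISCARD_BLOCKS)


def transform_segment_break(before: str, after: str) -> str:
    """Return '' when the break is discarded, otherwise a single space.

    The break is discarded only if the character on each side of it
    belongs to the set; one side alone is not enough.
    """
    if in_discard_set(before) and in_discard_set(after):
        return ""
    return " "


def join_lines(lines):
    """Join lines, transforming each segment break between non-empty lines."""
    result = ""
    for line in lines:
        if result and line:
            result += transform_segment_break(result[-1], line[0])
        result += line
    return result


print(join_lines(["漢字", "かな"]))    # both sides in the set: break discarded
print(join_lines(["漢字", "abc"]))     # only one side in the set: space kept
print(join_lines(["한글", "한글"]))    # Hangul excluded: space kept
```

With these assumed blocks, a break between two Lisu letters (U+A4D0 to U+A4FF) is also kept as a space, since Lisu is not in the list.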

-- 
GitHub Notification of comment by fantasai
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/337#issuecomment-612686124 using your GitHub account

Received on Sunday, 12 April 2020 22:31:28 UTC