Re: [csswg-drafts] [css-text-3] Segment Break Transformation Rules for East Asian Width property of A (#337) from CSS Meeting Bot via GitHub on 2020-01-24 (public-css-archive@w3.org from January 2020)

From: CSS Meeting Bot via GitHub <sysbot+gh@w3.org>
Date: Fri, 24 Jan 2020 15:34:51 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-578180069-1579880089-sysbot+gh@w3.org>
The CSS Working Group just discussed `segment-break rules`.

<details><summary>The full IRC log of that discussion</summary>
&lt;TabAtkins> Topic: segment-break rules<br>
&lt;astearns> github: https://github.com/w3c/csswg-drafts/issues/337<br>
&lt;TabAtkins> fantasai: Close it somehow...?<br>
&lt;TabAtkins> myles: I think this is worth some discussion.<br>
&lt;TabAtkins> astearns: Did you find anyone at Apple to talk to?<br>
&lt;fantasai> Section under discussion https://drafts.csswg.org/css-text-3/#line-break-transform<br>
&lt;TabAtkins> myles: I started a discussion, and the same story happened: we tried to describe it to Ken, and he had no opinions. The same thing happened with me.<br>
&lt;TabAtkins> myles: So in light of that I'm willing to somewhat amend my previous position<br>
&lt;TabAtkins> myles: The spec lists a collection of segment-break rules, writing-system rules, general category rules, and the word "hangul"...<br>
&lt;TabAtkins> myles: I'd like the criteria of this to be listed somewhere that isn't CSS.<br>
&lt;TabAtkins> myles: I'd ultimately like this to go into Unicode somehow.<br>
&lt;TabAtkins> myles: Ultimately I don't think browsers should be in the business of making these sorts of character decisions.<br>
&lt;TabAtkins> myles: If we can do that, I'm willing to accept it.<br>
&lt;TabAtkins> florian: I don't have a problem with that in theory.<br>
&lt;TabAtkins> florian: To the extent I've tried to discuss this with Unicode, I didn't sense any interest on their side that this is a problem worth solving.<br>
&lt;TabAtkins> florian: Or maybe not even a willingness to understand the problem.<br>
&lt;TabAtkins> florian: If we were doing codepoint-by-codepoint I'd be concerned, but this is category based.<br>
&lt;TabAtkins> myles: I'm an implementor here; we have different ideas about "complicated".<br>
&lt;TabAtkins> myles: Also their lack of interest is a signal. We're not the only language that uses text.<br>
&lt;TabAtkins> fantasai: We're one of the only ones that take broken lines and unbreak them.<br>
&lt;TabAtkins> myles: Unicode has taken on work to describe all the linebreaking in CSS. So if they don't care about this, that's a signal!<br>
&lt;TabAtkins> fantasai: They have no spec for line unbreaking.<br>
&lt;TabAtkins> koji: We've tried to combine multiple properties, the WG rejected the idea, and Unicode started the spec for CSS. So I think I agree with Myles: we either convince Unicode, or stick with what we had before and not combine multiple properties.<br>
&lt;TabAtkins> astearns: What's the current state of the spec?<br>
&lt;TabAtkins> fantasai: There's a bunch of rules in the spec based around the East Asian Width property and the General Category property.<br>
&lt;TabAtkins> fantasai: Started with EAW, made an exception for Hangul because it's wide, and I think that's what's implemented in Gecko right now.<br>
&lt;TabAtkins> fantasai: This issue was opened as "I want you to handle ambiguous characters better".<br>
&lt;TabAtkins> fantasai: So in response we added "if the ambiguous character is in a context we know is wide, like Chinese, treat it as wide; otherwise as narrow".<br>
&lt;TabAtkins> fantasai: Then Unicode redefined some characters that were previously narrow/ambiguous into wide, because of emoji.<br>
&lt;TabAtkins> fantasai: Then we reopened the issue to treat emojis as ambiguous.<br>
&lt;TabAtkins> florian: When we complained to Unicode about that change, they said this property is for terminal rendering, nobody should use it.<br>
&lt;TabAtkins> koji: I agree the emoji issue is bad.<br>
&lt;TabAtkins> koji: So my preference is the one from before: base the behavior on the encoded block. That might be slightly less accurate than your current proposal, but as long as it's consistent across browsers, authors will be happy enough.<br>
&lt;TabAtkins> fantasai: So instead of using the EAW/General Category properties (or others), we should evaluate Unicode blocks and declare how to treat each?<br>
&lt;TabAtkins> [discussion of how Unicode blocks work]<br>
&lt;TabAtkins> fantasai: That would probably work.<br>
&lt;TabAtkins> astearns: That sounds great.<br>
&lt;TabAtkins> myles: So somebody in this group, not me, should come up with a list of blocks. If it's very large, we can revisit, but if it's small, then ok.<br>
&lt;TabAtkins> myles: My criterion here is maintainability.<br>
&lt;TabAtkins> fantasai: I can take that action.<br>
&lt;TabAtkins> myles: Ok, we can discuss it then.<br>
&lt;TabAtkins> jfkthame: For maintainability, you'd have to recheck each version, approximately yearly.<br>
&lt;TabAtkins> fantasai: Sure. The general criterion would be something like "if it's more than 80% Han characters, it's on the list"; easy.<br>
&lt;TabAtkins> koji: As the editor of UAX #50, I'm doing that every year. We assume Vertical_Orientation is U for CJK characters; you can check that.<br>
&lt;dbaron> $ grep ';' Blocks.txt | grep -v "^#" | wc -l<br>
&lt;dbaron> 291<br>
&lt;TabAtkins> fantasai: The set of chars we want here are pretty much exactly Chinese and Japanese characters.<br>
&lt;TabAtkins> jfkthame: Base it on Script, then?<br>
&lt;TabAtkins> fantasai: We do that today, but we have to add punctuation, etc., thus the current complexity.<br>
&lt;TabAtkins> astearns: So proposal is fantasai looks at the blocks, and comes back later.<br>
&lt;TabAtkins> myles: Parting words: the text started out elegant and got full of exceptions over time. If that happens again, we should just cut it off.<br>
</details>
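
The East Asian Width based rule fantasai summarizes in the log (EAW first, an exception for Hangul, and ambiguous characters treated as wide only in a context known to be wide, like Chinese) can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the css-text-3 algorithm itself: the function names, the `wide_context` flag (standing in for something like a Chinese or Japanese content language), and the Hangul range list are all hypothetical, and a real implementation would test the Script property rather than hard-coded ranges.

```python
import unicodedata

# Hypothetical approximation of "is Hangul" using a few Hangul block ranges.
# A real implementation would test the Unicode Script property (sc=Hang),
# which Python's unicodedata module does not expose.
_HANGUL_RANGES = (
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xA960, 0xA97F),  # Hangul Jamo Extended-A
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
    (0xFFA0, 0xFFDC),  # halfwidth Hangul letters
)


def is_hangul(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _HANGUL_RANGES)


def is_wide(ch: str, wide_context: bool) -> bool:
    """F, W and H count as wide; A (ambiguous) only when the context is wide."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W", "H"):
        return True
    if eaw == "A":
        return wide_context
    return False  # N and Na are narrow


def transform_segment_break(before: str, after: str, wide_context: bool = False) -> str:
    """Return "" if the segment break between `before` and `after` is removed,
    or " " if it is converted to a space."""
    if is_hangul(before) or is_hangul(after):
        # Hangul is wide, but Korean separates words with spaces,
        # so the break is kept as a space.
        return " "
    if is_wide(before, wide_context) and is_wide(after, wide_context):
        return ""
    return " "
```

For example, `transform_segment_break("中", "文")` returns `""`, while `transform_segment_break("가", "나")` returns `" "` because of the Hangul exception. U+201C LEFT DOUBLE QUOTATION MARK has East Asian Width A, so a break between it and `"中"` becomes a space by default and is removed only when `wide_context=True`; this context dependence is exactly what the emoji-driven EAW changes discussed in the log made fragile.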

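The block-based alternative koji proposes, combined with fantasai's rough "more than 80% Han" criterion and dbaron's count of entries in Blocks.txt, could be prototyped along these lines. This sketch assumes the `start..end; Block Name` line format of the Unicode Character Database file Blocks.txt; the helper names are hypothetical, and "Han" is approximated by character names beginning with `CJK UNIFIED IDEOGRAPH` or `CJK COMPATIBILITY IDEOGRAPH`, because Python's `unicodedata` has no Script property. Results also depend on the Unicode version bundled with the Python interpreter.

```python
from __future__ import annotations

import unicodedata


def count_block_entries(blocks_txt: str) -> int:
    """Python equivalent of: grep ';' Blocks.txt | grep -v "^#" | wc -l"""
    return sum(
        1 for line in blocks_txt.splitlines()
        if ";" in line and not line.startswith("#")
    )


def parse_blocks(blocks_txt: str) -> list[tuple[int, int, str]]:
    """Parse lines like '4E00..9FFF; CJK Unified Ideographs' into (start, end, name)."""
    blocks = []
    for line in blocks_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cp_range, name = (part.strip() for part in line.split(";", 1))
        start, end = (int(cp, 16) for cp in cp_range.split(".."))
        blocks.append((start, end, name))
    return blocks


def is_han_approx(ch: str) -> bool:
    # Approximation only: unicodedata does not expose the Script property.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def han_ratio(start: int, end: int) -> float:
    """Fraction of assigned code points in [start, end] that look like Han."""
    assigned = [chr(cp) for cp in range(start, end + 1)
                if unicodedata.category(chr(cp)) != "Cn"]
    if not assigned:
        return 0.0
    return sum(is_han_approx(ch) for ch in assigned) / len(assigned)


def mostly_han_blocks(blocks_txt: str, threshold: float = 0.8) -> list[str]:
    """Names of blocks whose assigned characters are more than `threshold` Han."""
    return [name for start, end, name in parse_blocks(blocks_txt)
            if han_ratio(start, end) > threshold]
```

Run against a full Blocks.txt, `count_block_entries` applies the same filter as dbaron's grep pipeline, so it should give the same count for the same file (291 in the log). On an excerpt containing Basic Latin, CJK Symbols and Punctuation, and CJK Unified Ideographs, only the last block passes the threshold: punctuation falls outside a Han-fraction criterion and would still need to be listed separately, which is the kind of exception the discussion was trying to keep small.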

-- 
GitHub Notification of comment by css-meeting-bot
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/337#issuecomment-578180069 using your GitHub account
Received on Friday, 24 January 2020 15:34:52 UTC