Re: [svgwg] Character counting in text 'x', 'y', 'dx', 'dy', and 'rotate' attributes. (#537) from CSS Meeting Bot via GitHub on 2019-09-19 (public-svg-issues@w3.org from September 2019)

From: CSS Meeting Bot via GitHub <sysbot+gh@w3.org>
Date: Thu, 19 Sep 2019 09:07:45 +0000
To: public-svg-issues@w3.org
Message-ID: <issue_comment.created-533040447-1568884063-sysbot+gh@w3.org>
The SVG Working Group just discussed `Character counting in text attributes`, and agreed to the following:

* `RESOLVED: Do not change the previous resolution.`

<details><summary>The full IRC log of that discussion</summary>
&lt;mstange> Topic: Character counting in text attributes<br>
&lt;AmeliaBR> github: https://github.com/w3c/svgwg/issues/537<br>
&lt;mstange> AmeliaBR: To recap: text attributes for positioning, as we've been discussing, can have multiple values that can be applied to multiple characters.<br>
&lt;mstange> AmeliaBR: The first step is that we look at the attribute and assign each value to a different DOM character. In some cases that's very simple.<br>
&lt;mstange> ... But if you have more complex multi-byte characters, things get more confusing.<br>
&lt;mstange> s/characters,/characters or clusters,/<br>
&lt;mstange> AmeliaBR: "What is a character" becomes a debate.<br>
&lt;mstange> ... There are other definitions that use utf-16 blocks, which is not very useful. But beyond that, do you use unicode codepoints or do you combine and cluster things so that you have a combining accent character, are those the same character or different characters?<br>
&lt;mstange> ... We have a resolution from January which resolved that values in the array should be assigned based on unicode codepoints.<br>
&lt;mstange> ... The argument for that is that unicode codepoints are stable and won't be affected by whether a new cluster gets introduced, or whether a particular font supports a particular combining unit.<br>
&lt;mstange> myles: Other part of the resolution: We count based on code points, but we don't segment based on code points.<br>
&lt;mstange> myles: Let's say we have the string of code units "A" "B" "heart emoji" "red combining character". Now we also have an array of positioning values with four elements.<br>
&lt;mstange> ... Now we need to come up with a mapping. There are two parts to this resolution: When you count, you count code point by code point. And the second part is: You're allowed to disregard any positions assigned to any combining characters, because the combining characters don't get rendered on their own.<br>
&lt;mstange> ... We didn't want to have a situation where regular characters following a combining character end up in the wrong position because they get assigned the wrong value from the position array.<br>
&lt;mstange> ... This ensures consistency between browsers.<br>
&lt;mstange> AmeliaBR: It's not intuitive for hand-authoring. But the upside is that, outside of browser differences of the grapheme cluster, the rest of the layout stays consistent from browser to browser. Once you get past the cluster that has the discrepancy, everything else is the same.<br>
&lt;mstange> nmccully: Let's say there is a browser with a shaping engine and one that is not. The browser that has a shaping engine will &lt;missed>. The browser that *doesn't* have a shaping engine will presumably manually get positioning information and the combining red would be passed by itself to a table that gives a space.<br>
&lt;mstange> AmeliaBR: There is a multi-stage lining up process. The way we're proposing is: positioning values to code points is a one-to-one matching. The next step (matching code points to your shaping) is where things can be discarded because of ligatures or combinations etc.<br>
&lt;mstange> r12a: There are two issues. One, e.g. the word réd can have two code point representations, and an author cannot immediately see which one is used.<br>
&lt;mstange> heycam: If the positional values are coming from a graphical editor, the graphical editor knows what code points are used and can generate the correct arrays.<br>
&lt;mstange> AmeliaBR: Yes, it's only the hand authoring case that's hard.<br>
&lt;mstange> myles: Alternatively, we could specify that &lt;missed> goes through normalization.<br>
&lt;mstange> r12a: I do not like the idea of normalizing my content. Sometimes I want things to be composed and put the accents afterwards.<br>
&lt;mstange> myles: Pragmatic / performance, might not be worth it.<br>
&lt;mstange> r12a: The other issue is that, for example if you have some Persian, sometimes a zero-width joiner is used at the end of a word to produce the right shape. Then you need the two characters to stay together.<br>
&lt;mstange> ... and those are not a grapheme cluster.<br>
&lt;mstange> heycam: You would ignore the positioning value for the zero-width joiner.<br>
&lt;mstange> AmeliaBR: If you're intentionally putting the ZWJ &lt;missed>, you still want contextual glyph selection.<br>
&lt;AmeliaBR> s/&lt;missed>/to change to a medial glyph/<br>
&lt;mstange> heycam: CSS properties can change the effect of ligatures and other combinations, and you wouldn't want to have to adjust your positioning value array based on that.<br>
&lt;AmeliaBR> s/&lt;missed> goes/the text content/<br>
&lt;mstange> nmccully: If you need backward compatibility with engines that don't understand the red combining thing, &lt;missed><br>
&lt;mstange> myles: A lot of people said "there is incompatibility and we have to deal with it."<br>
&lt;AmeliaBR> s/engine will &lt;missed>/engine will get fewer glyph clusters from the engine than there are characters/<br>
&lt;mstange> nmccully: Are you protecting from a malformed SVG from a bad player?<br>
&lt;mstange> nmccully: Different browsers might get different results for cluster segmentation. Users want to get consistent positions everywhere. So we can't count positioning values based on the results from cluster segmentation. So we have to do the matching based on something in the source.<br>
&lt;heycam> mstange: seems like whenever you have parallel arrays, you should just have a single array of pairs<br>
&lt;heycam> ... why is this API necessary?<br>
&lt;mstange> AmeliaBR / myles: There is existing content that uses it.<br>
&lt;mstange> r12a: In Indic scripts you have conjuncts, you split a syllable at a time.<br>
&lt;mstange> ... (shows an example that has two graphing clusters that combine into one visible unit)<br>
&lt;mstange> myles: This is why we didn't want to specify what segmentation to use.<br>
&lt;AmeliaBR> s/graphing clusters/grapheme clusters/g<br>
&lt;mstange> AmeliaBR: r12a, are you ok with keeping the existing resolution?<br>
&lt;mstange> r12a: It's not pretty, but I understand why it's there, and I can't figure out anything better.<br>
&lt;mstange> r12a: (shows a testcase that renders differently in Firefox and Chrome, where diagonal Arabic text is rendered top-right to bottom-left in Firefox and top-left to bottom-right in Chrome)<br>
&lt;mstange> RESOLVED: Do not change the previous resolution.<br>
&lt;AmeliaBR> RRSAgent, make minutes<br>
&lt;RRSAgent> I have made the request to generate https://www.w3.org/2019/09/18-svg-minutes.html AmeliaBR<br>
&lt;prushforth> CG meeting is on channel #svgcg<br>
</details>
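
As a rough illustration of the two-part rule discussed in the log (count code point by code point, then disregard values that land on combining characters), here is a minimal Python sketch. It is not taken from the SVG specification or from any browser. The function name `assign_positions`, the use of Unicode general category `M*` as the test for "combining character", and the special-casing of the zero-width joiner are all assumptions made for the example.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner; treated like a combining character here (assumption)


def assign_positions(text, values):
    """Pair each code point with the value at the same index.

    Hypothetical helper: values that land on combining marks (or ZWJ) are
    consumed but reported as None, so later characters keep their own slot.
    """
    result = []
    for index, ch in enumerate(text):  # a Python str iterates by code point
        value = values[index] if index < len(values) else None
        # Assumption: "combining character" means general category Mn/Mc/Me.
        if unicodedata.category(ch).startswith("M") or ch == ZWJ:
            value = None
        result.append((ch, value))
    return result


# "réd" written with a combining acute accent (4 code points)
decomposed = assign_positions("re\u0301d", [0, 10, 20, 30])
# "réd" written with a precomposed é (3 code points)
precomposed = assign_positions("r\u00e9d", [0, 10, 20])
```

With the decomposed spelling, the value at index 2 is consumed by the accent and ignored, so `d` takes the value at index 3. With the precomposed spelling, `d` takes the value at index 2. This is the hand-authoring difficulty r12a describes: the two spellings look identical but need arrays of different lengths.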


-- 
GitHub Notification of comment by css-meeting-bot
Please view or discuss this issue at https://github.com/w3c/svgwg/issues/537#issuecomment-533040447 using your GitHub account
Received on Thursday, 19 September 2019 09:07:46 UTC