Re: [csswg-drafts] [css-fonts] Handling of Standardized Variation Sequences

I will definitely defer to your expertise. I just have a couple more questions as to the use case of the proposal; my apologies.

> > The feature would, for instance, cause copied web text with variation selectors to unexpectedly differ, when pasted into some destination (as well as when the text is interpreted by crawlers and other plain-text-processing applications), from how it is visually rendered by a web browser.
>
> That happens with text-transform: uppercase too.

This is indeed true. 

In this case, too, the stylesheet author risks modifying the fundamental plain-text semantics intended by the author of the text content. This risk is generally low, at least for Latin scripts, but the risks of text transformation increase as the significance of characters increases.

> > The author therefore expects their ideographs to always render with that specific style, as long as that style is able to be rendered.
>
> Not exactly. In China some users will prefer a more accurate representation of the text but more will prefer the characters to be rendered in the modern orthography. For many digitization projects that means building two different strings, one often using PUA (because ideographic variation selectors are not well supported).
> 
> The point is, Han variants could both indicate gloss and could indicate semantic intention. The person doing the digitization may not be able to make a good guess - so they have to preserve it in some way and provide a client side toggle - in a way like choosing between serif and sans-serif in a browser's reading mode.

What I’m wondering about here is the desires of text authors versus the desires of stylesheet authors for websites that display the text.

When a stylesheet author writes `text-transform: uppercase;`, the stylesheet author risks modifying the fundamental plain-text semantics intended by the authors of text content.

This risk is generally low, at least for Latin scripts, which do not change meaning that much depending on case (though there are certain significant exceptions).
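To make one such exception concrete, here is a minimal Python sketch. It uses Python's `str.upper()` only as a stand-in for `text-transform: uppercase` (an assumption on my part; a browser's exact mappings may differ), and the German word pair is my own example rather than anything from the proposal.

```python
# str.upper() applies Unicode full case mappings; it stands in here for
# CSS text-transform: uppercase, which is an assumption of this sketch.
words = ["Maße", "Masse"]  # German: roughly "measurements" vs. "mass"

uppercased = [w.upper() for w in words]

# Both words collapse to the same uppercase string, so the distinction
# made in the plain text is lost in the transformed rendering.
collide = uppercased[0] == uppercased[1]
print(uppercased, collide)
```

The two distinct source words become indistinguishable once uppercased, which is the kind of low-frequency but real semantic loss I mean.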

However, the risks of text transformation increase as the significance of characters increases. A hypothetical text transformation (from styled Latin mathematical letters to plain Latin characters and from superscript numbers to regular numbers) risks rendering “ℋ = ∫ 𝑑𝜏(𝜖𝐸² + 𝜇𝐻²)” as the quite semantically different “H = ∫ dτ(ϵE2 + µH2)”. Another hypothetical text transformation (from colored symbols to regular symbols) risks changing “He had difficulty distinguishing 🔴 and 💚” to the semantically different “He had difficulty distinguishing ⚫️ and 🖤”. And a hypothetical text transformation (from ideograph variation sequences to ideographs all of one variation) would have similar risks.
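As a rough approximation of the first hypothetical transformation, Unicode NFKC normalization happens to fold mathematical letters and superscript digits into their plain counterparts. The sketch below uses it purely as a readily available stand-in; nothing in the proposal defines such a transformation, and the exact output characters may differ slightly from the string I wrote above.

```python
import unicodedata

# The formula from the paragraph above, with styled mathematical letters
# and superscript digits.
original = "ℋ = ∫ 𝑑𝜏(𝜖𝐸² + 𝜇𝐻²)"

# NFKC is only a stand-in for the hypothetical "flattening" transformation.
flattened = unicodedata.normalize("NFKC", original)
print(flattened)

# The superscripts are gone, so "squared" now reads as a trailing digit 2.
superscript_lost = "²" in original and "²" not in flattened
print(superscript_lost)
```

After normalization the exponents are ordinary digits and the script ℋ is a plain H, so the flattened string no longer says what the original said.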

“The current pushback [of Han Unification] is largely based on the inability to accurately reproduce a certain glyph the author intends,” but here “author” presumably refers to the authors of text content, rather than the authors of stylesheets. Ideographic variation sequences already give text authors the power to precisely control what glyph is rendered on a per-character basis, so I am uncertain why pushback would still be occurring. Yet this proposal would take some of that power away from text authors, by giving stylesheets the ability to override the text author’s intent.
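To show what I mean by the selector being part of the plain text, here is a small Python sketch. The helper names are hypothetical, the ranges cover only VS1 to VS16 (U+FE00 to U+FE0F) and VS17 to VS256 (U+E0100 to U+E01EF), and the sample sequence (葛 followed by VS17) is one I believe is registered, so please treat it as an assumption.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 and VS17-VS256 (Mongolian selectors are ignored)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Hypothetical helper: what an override of the author's selectors amounts to."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# U+845B followed by VARIATION SELECTOR-17 (U+E0100): two code points in
# the plain text itself, not styling layered on top of it.
ivs = "\u845B\U000E0100"
print([f"U+{ord(ch):04X}" for ch in ivs])

# Stripping the selector leaves only the base ideograph.
print(strip_variation_selectors(ivs) == "\u845B")
```

The selector travels with the text through copy and paste, crawlers, and other plain-text processing, whereas a stylesheet override would apply only to the browser's rendering.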

What I’m wondering about here is the conflict between the desires of text authors and the desires of stylesheet authors for websites that display the text. For whose benefit would this proposal be? As far as I can tell, it would only weaken the ability of text authors to have their content rendered using the correct glyphs, giving more of that power to the stylesheet author instead.

> The point is, Han variants could both indicate gloss and could indicate semantic intention. The person doing the digitization may not be able to make a good guess - so they have to preserve it in some way and provide a client side toggle - in a way like choosing between serif and sans-serif in a browser's reading mode.

Basically, why would the person doing the digitization not simply use the right variation selector, instead of relying on rich-text markup? I am actually very interested in this, since all of this does seem to be a real-world occurrence.

Of note is that [Unicode Technical Report 51: Unicode Emoji recommends that emoji/text variation selectors *always* be respected, regardless of the environment’s default rendering style](https://www.unicode.org/reports/tr51/index.html#Presentation_Style). This is presumably for the same reason: variation selectors exist as plain text precisely because they are part of the intrinsic semantics of the plain text.

-- 
GitHub Notification of comment by js-choi
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/1710#issuecomment-371061922 using your GitHub account

Received on Wednesday, 7 March 2018 08:29:47 UTC