Re: [csswg-drafts] [css-fonts-5] Removing font-language-override (#5484)

> Philosophically, there shouldn't be two places authors can specify language to get correct text shaping.

I don't think it's quite as simple as this. The use case for `font-language-override` arises because of a mismatch between different things referred to (rather loosely) as "language". In HTML, authors can tag content with the `lang` attribute, normally thought of as "language" although it can carry additional subtags such as script and region, so it's really a locale identifier.

When it comes to text shaping, however, the functionality in OpenType fonts is driven via tags that are often referred to as "language", but are more formally called "language system" tags. This is not at all the same thing.

Quoting from https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags (emphasis added):

> Language system tags identify the language systems supported in a OpenType Layout font. **What is meant by a “language system” in this context is a set of typographic conventions for how text in a given script should be presented.** Such conventions may be associated with particular languages, with particular genres of usage, with different publications, and other such factors. For example, particular glyph variants for certain characters may be required for particular languages, or for phonetic transcription or mathematical notation.

The OpenType tag is about *a set of typographic conventions*, not directly about *language* (although it is often possible to infer a reasonable default mapping from one to the other).

> In principle, a given set of conventions may be shared across multiple scenarios. For instance, two different languages (perhaps unrelated) may happen to follow the same conventions. Language system tags can be registered on a perceived-need basis, however; as a result, there is no guarantee that each tag represents a distinct and unique set of conventions. Tags can, however, be registered with the intent of representing conventions that apply to multiple languages. In such cases, the documented description for the tag should reflect that intent.

> It should also be noted that there may be more than one set of typographic conventions that apply to a given language. Therefore, in several respects, **language system tags do not correspond in a one-to-one manner with languages**. Even so, many registered tags are intended to represent typographic conventions for a particular language. For cases in which a correlation exists between a tag and one or more languages, the language identities are documented here by reference to ISO 639-2 and ISO 639-3.

While many such correlations are documented, there is no claim to completeness, and given the complexity (and ever-evolving conventions) of human language and writing systems, it would be futile to expect it.

> If information is available to an application declaring the language of text content, then the application may make use of that to select a default language system tag to be applied when displaying that text. **It is preferable, however, to give users control over the choice of language system tag to be used.** (Depending on the application scenario, such control may be given to content authors, to content readers, or to both.)

`font-language-override` exists precisely to *give users control* here, as recommended by the OpenType spec, recognizing that (a) it is impossible for a browser to correctly anticipate *every* mapping from language, as expressed in the HTML `lang` attribute, to desired writing system conventions as expressed via the OT language system tag; and (b) requiring authors to artificially *change* the `lang` attribute in order to access desired writing system conventions in a font would be actively harmful.

For example, the OT tag registry includes 5 different tags for Karen languages: `BLK`, `KJP`, `KRN`, `KSW`, `PWO`. An advanced Burmese font might support all 5 of these, with certain differences in glyphs and shaping behavior. However, there are [many more than 5 languages and dialects within the Karen group](https://en.wikipedia.org/wiki/Karenic_languages), and in some cases writing conventions may not even be well-established or documented yet. An author should *not* have to mislabel content with the `lang` value of one of the major Karen languages just to access their preferred rendering behavior. `font-language-override` allows content to be given an accurate `lang` value, and *separately* allows the author to choose the desired rendering behavior when a font provides multiple options.
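
As a rough sketch of what that separation could look like in a stylesheet: the content keeps its own language tag, and the rule asks the font for one of the language systems it actually supports. The `kvq` tag and the font name below are assumptions chosen purely for illustration, as is the choice of `KSW` as the preferred set of conventions.

```css
/* Markup stays accurately labelled, e.g. <p lang="kvq">...</p>.
   "kvq" stands in for a Karen language that has no OT tag of its own
   (illustrative assumption). */
:lang(kvq) {
  /* Hypothetical font that implements several Karen language systems. */
  font-family: "Example Burmese", serif;
  /* Ask the shaper for the font's KSW language system, rather than
     whatever default the browser would infer from the lang value. */
  font-language-override: "KSW";
}
```

The `lang` attribute continues to serve spell-checking, hyphenation, speech synthesis, and `:lang()` matching, while the override only affects which OpenType language system is used for shaping.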

So I am opposed to dropping this. Yes, it's a niche use case, but it is a valid one; I strongly disagree with labelling it "undesirable".

-- 
GitHub Notification of comment by jfkthame
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/5484#issuecomment-687020878 using your GitHub account



Received on Friday, 4 September 2020 08:59:16 UTC