- From: Andrew Cunningham <lang.support@gmail.com>
- Date: Sat, 9 Jul 2016 17:56:31 +1000
- To: Behdad Esfahbod <behdad.esfahbod@gmail.com>
- Cc: www-style <www-style@w3.org>, John Hudson <tiro@tiro.com>
- Message-ID: <CAGJ7U-XoKBEgYwhXjyhpjKCHXguT3mCdCqb2kcJFZGXywAb16w@mail.gmail.com>
Hi Behdad,
I am not sure how to respond to your email. I assume you deliberately
resent that comment? I have been sitting on this email all day, pondering
the contents. My concern is the ability to use lesser-used and minority
languages on the internet. There are some browsers that are more suited to
this than others.
John, please correct any misconceptions of mine on OpenType fonts.
I am aware you believe that language tagging is sufficient and are
resistant to implementing font-language-override. Fair enough.
Assuming lang / xml:lang is the preferred approach, we need a cross browser
approach and normative requirements. Currently each browser does fairly
different things with the language tags and how they match up to OT
language systems. This is complicated by the fact that some browsers only
support OpenType while other browsers support additional font technologies.
One issue is the limited number of OT language system tags, and what seems
to be an accidental, haphazard approach to adding them. Maybe it is better
to describe OT language system tags in terms of evolution, growing and
refining over time. OT language system tags were never developed in a
systematic way. And it is probable that over time more will need to be
added. So locl support via language tags will always be a moving target.
Second issue is the poor mapping between language tags and OT language
system tags. Documents like
https://www.microsoft.com/typography/otspec/languagetags.htm are far from
perfect and require more work.
For instance the OT language tag DNK is mapped to the language tag "din".
This is an ISO 639-2 tag, and represents a macrolanguage. In theory all the
ISO 639-3 language tags encompassed by it should be listed as well, but
they are absent from this table. So DNK should strictly map to din, dip,
diw, dib, dks and dik, which the copy of the OT spec on Microsoft's site
does not reflect.
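As a rough illustration of the DNK case, the sketch below expands a
published OT mapping so that the member languages of a macrolanguage are
matched as well. The table names and both helper functions are hypothetical
and exist only for this example; they are not taken from any browser or
shaping library, and only the primary subtag of the language tag is
considered.

```python
# Hypothetical data: the macrolanguage "din" and the member codes listed above.
MACROLANGUAGE_MEMBERS = {
    "din": ["dip", "diw", "dib", "dks", "dik"],
}

# The mapping as described for the published table: DNK lists only "din".
PUBLISHED_OT_MAPPING = {
    "DNK": ["din"],
}


def expand_mapping(published, members):
    """Return a new mapping in which each macrolanguage is followed by its members."""
    expanded = {}
    for ot_tag, lang_tags in published.items():
        full = []
        for tag in lang_tags:
            full.append(tag)
            full.extend(members.get(tag, []))
        expanded[ot_tag] = full
    return expanded


def ot_tag_for(lang_tag, mapping):
    """Look up an OT language system tag by primary language subtag, or None."""
    primary = lang_tag.split("-")[0].lower()
    for ot_tag, lang_tags in mapping.items():
        if primary in lang_tags:
            return ot_tag
    return None


EXPANDED_OT_MAPPING = expand_mapping(PUBLISHED_OT_MAPPING, MACROLANGUAGE_MEMBERS)
```

With the published table alone, content tagged "dik" finds no language
system; with the expanded table it resolves to DNK.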
Many African languages have specific glyph and diacritic placement
requirements that may differ from other languages. Concentrating on DNK for
a moment and limiting myself to a discussion of Sudanese and South Sudanese
languages of which Dinka is one: most other languages from these countries
are not represented as OT language system tags. I only have a partial
collection of orthography statements for the languages of these countries,
maybe one fifth, maybe less. But going through what I have at hand I
identified the following language tags that have similar requirements as
Dinka: ava, bfa, bxb, krs, bex, mfz, mor, mgd, mur, nus, lot, ddd, keo and
mqu.
One option is to greatly expand the number of OT language system tags or
alternatively to map these other languages to DNK. John Hudson may correct
me, but my understanding is that the OT language system tags were intended
to represent shared typographic traditions rather than representing
languages per se, so within the context of the OT specifications it would
be logical to map ava, bfa, bxb, krs, bex, mfz, mor, mgd, mur, nus, lot,
ddd, keo and mqu to DNK, rather than adding additional language system tags
that essentially would be there to activate exactly the same typographic
features as DNK should.
Another example similar to DNK is the language system tag VIT, which maps
to "vi" but could, and maybe should, map to all Vietnamese ethnic languages
that use the Latin script and share the same typographic traditions and
conventions as Vietnamese does.
To really use lang or xml:lang to enable locl OT features rather than
adding support for font-language-override, and to future-proof it as much
as possible, time, money and resources need to be spent on providing as
extensive a mapping as possible between BCP 47 language tags and OT
language system tags.
I would consider this work to be out of scope of the CSS WG, and I'd
consider it to be out of scope for the OT spec itself. That leaves us with
the browser developers to do the work.
But there are other issues as well. Some OT language tags should never be
associated with BCP 47 language tags. One such is KRN, which represents the
Karen languages (essentially it is a macrolanguage tag), encompassing a
number of languages, each with their own language code and in some cases
having incompatible typographic conventions. A few of these languages have
their own OT language system tags and these should be used in preference to
KRN.
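A minimal sketch of that preference rule follows, assuming a browser keeps a
list of OT tags that must never be selected automatically. Everything here
is hypothetical: the function name, the block list, and the candidate table.
In particular the ksw/KSW and kjp entries are illustrative placeholders and
should be checked against the OT language system tag registry.

```python
# OT language system tags that should never be picked from a language tag alone.
NEVER_AUTO_SELECT = {"KRN"}

# Illustrative candidates per language tag (assumed, not copied from the spec).
CANDIDATES = {
    "ksw": ["KRN", "KSW"],  # a Karen language assumed to have its own OT tag
    "kjp": ["KRN"],         # a Karen language assumed to be covered only by KRN
}


def select_ot_tag(lang_tag, candidates=CANDIDATES, blocked=NEVER_AUTO_SELECT):
    """Return the first candidate OT tag that is not blocked, else "dflt"."""
    primary = lang_tag.split("-")[0].lower()
    for ot_tag in candidates.get(primary, []):
        if ot_tag not in blocked:
            return ot_tag
    return "dflt"
```

Under these assumptions content tagged ksw gets its specific language
system, while a language covered only by KRN falls back to the default
language system rather than to KRN.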
John Hudson, in a previous email, listed a number of OT language system
tags that do not correspond to any one language.
One other situation is where an OT language system tag is ambiguous or
problematic since the language tag on an HTML element is not sufficient in
and of itself to indicate if the language system should be used. In this
case I am thinking of the KHT language system tag. The language tag kht
maps to KHT, but there are multiple orthographies for Khamti Shan. The
fonts that I have seen that support a locl feature for Khamti are based on
what is documented in UTN11 which is based on one of the orthographies in
current use.
Automatically using KHT for content tagged as kht will work in some cases,
but in others give the wrong variants and rendering. So even if you don't
support font-language-override you need some other approach to turning on
and off locl support.
A CSS rule like:
:lang(kht) {
    font-feature-settings: "locl" 0;
}
should prevent a browser from using the localised features of a font. In
this specific case I would expect the rule to prevent the browser from
automatically applying the KHT language system, and to use the default
language system instead, which may be more appropriate for certain Khamti
orthographies. It doesn't, however, solve the use case where an SHN
language system is available and may be preferred over the default
language system. In the absence of font-language-override maybe the browser
needs to implement more complex logic, i.e.
    if kht AND font-feature-settings: "locl" 0 AND SHN use SHN
    if kht AND font-feature-settings: "locl" 0 use dflt
    if kht then use KHT
    else use dflt
but that may not support all Khamti orthographies. I keep thinking that
font-language-override is the easiest solution to the problem.
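The four rules above can be sketched as a small function. This is only an
illustration of the decision order, not how any browser behaves: the
function name and its parameters are hypothetical, and the check that the
font actually provides KHT is an added assumption.

```python
def choose_language_system(lang, locl_disabled, font_systems):
    """Pick an OT language system tag following the rule order above.

    lang          -- primary language subtag of the element, e.g. "kht"
    locl_disabled -- True when the author set font-feature-settings: "locl" 0
    font_systems  -- set of language system tags the font provides
    """
    if lang == "kht":
        if locl_disabled:
            # Author opted out of KHT: prefer SHN if the font has it.
            return "SHN" if "SHN" in font_systems else "dflt"
        if "KHT" in font_systems:
            # Assumption: KHT is only used when the font provides it.
            return "KHT"
    return "dflt"
```

For example, with a font offering both KHT and SHN, content tagged kht gets
KHT by default and SHN once the author disables locl.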
Automatically using lang / xml:lang to handle locl features is great, I am
not against it, I like the feature. But to properly implement it requires a
huge amount of work. And honestly I doubt web browser developers would be
willing to do the amount of research needed to get it right and support as
many languages as possible.
The reality is only a small subset of languages will be supported
automatically by the lang / xml:lang approach.
For other languages that leaves either font-language-override or not using
the locl feature at all.
Andrew
Received on Saturday, 9 July 2016 07:57:01 UTC