- From: Christoph Päper <christoph.paeper@crissov.de>
- Date: Sat, 6 Mar 2010 14:56:36 +0100
- To: www-style list <www-style@w3.org>
Christoph Päper:
> Jonathan Kew:
>> On 4 Mar 2010, at 13:30, Christoph Päper wrote:
>>>> http://dev.w3.org/csswg/css3-fonts/
>>> 6.9 font-lang-sys: normal | inherit | <string>
>>>
>>> This property currently uses OT language tags which are not really well designed or known as far as I know. It would be better for authors to use [BCP47] and let UAs do the mapping.
>> There will be cases where users need to select an alternative "best fit" language system according to what is actually available in their chosen fonts, and it is not possible for browsers to handle this reliably by mapping from BCP47 codes.
>
> I will send another mail with a quick statistics check on the available language tags.
BCP47 / ISO639 vs. OT language tags
The conversion table is found at <http://www.microsoft.com/typography/otspec/languagetags.htm>.
It lists 472 mappings of 392 OT to 440 ISO codes, if I didn’t miscount. It has 9 entries without ISO 639-3 equivalent(s): 2 are for general phonetic transcriptions; the others are languages (I assume).
APPH Phonetic transcription – Americanist conventions
IPPH Phonetic transcription – IPA conventions
BBR Berber
BCR Bible Cree
BML Bamileke
GAR Garshuni
MOR Moroccan
NGR Nagari
YCR Y-Cree
YIC Yi Classic
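A count like the one above could be reproduced roughly as follows. This is only a sketch: the `OT_TO_ISO` dictionary is a hand-typed excerpt built from codes mentioned in this mail, not the full conversion table, and the function names are made up for illustration.

```python
from collections import Counter

# Illustrative excerpt only: OT language tag -> ISO 639-3 codes.
# An empty list marks an OT tag with no ISO equivalent.
# ATH lists only the ISO codes named in this mail, not its full row.
OT_TO_ISO = {
    "APPH": [],
    "IPPH": [],
    "BBR": [],
    "ELL": ["ell"],
    "PGR": ["ell"],
    "ATH": ["caf", "chp", "crx", "scs", "xsl"],
}


def summarize(table):
    """Return (number of mappings, number of OT tags, number of distinct ISO codes)."""
    pairs = sum(len(iso_codes) for iso_codes in table.values())
    distinct_iso = {code for iso_codes in table.values() for code in iso_codes}
    return pairs, len(table), len(distinct_iso)


def without_iso(table):
    """OT tags that map to no ISO code at all."""
    return sorted(tag for tag, iso_codes in table.items() if not iso_codes)


def fan_out(table):
    """Histogram: how many OT tags map to 0, 1, 2, ... ISO codes."""
    return Counter(len(iso_codes) for iso_codes in table.values())


print(summarize(OT_TO_ISO))
print(without_iso(OT_TO_ISO))
print(sorted(fan_out(OT_TO_ISO).items()))
```

Whether tags without an ISO equivalent should be included in the OT total is a counting choice; the sketch includes them.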
There are ten OT codes with 2, two with 3, one with 4 and two with more (22 and 43) mappings to ISO codes. Surely there are also gaps in the opposite direction, which would be more important.
Here is the check for ambiguous ISO 639 -> OT mappings: ‘zho’ is used for 4 OT codes, ‘chp’ and ‘kca’ for 3 each, and fifteen more for 2 each. Most if not all of these ambiguities could be resolved by specifying a preferred alternative and by using appropriate subtags (i.e. BCP 47 instead of ISO 639-3).
zho  ZHH  Chinese Hong Kong
     ZHP  Chinese Phonetic
     ZHS  Chinese Simplified
     ZHT  Chinese Traditional
chp  ATH  Athapaskan
     CHP  Chipewyan
     SAY  Sayisi
kca  KHK  Khanty-Kazim
     KHS  Khanty-Shurishkar
     KHV  Khanty-Vakhi
caf  ATH  Athapaskan
     CRR  Carrier
crm  LCR  L-Cree
     MCR  Moose Cree
crx  ATH  Athapaskan
     CRR  Carrier
csw  NCR  N-Cree
     NHC  Norway House Cree
cwd  DCR  Woods Cree
     TCR  TH-Cree
div  DIV  Dhivehi
     DHV  Dhivehi (deprecated)
ell  ELL  Greek
     PGR  Polytonic Greek
flm  HAL  Halam
     QIN  Chin
gle  IRI  Irish
     IRT  Irish Traditional
kat  KAT  Georgian
     KGE  Khutsuri Georgian
krc  BAL  Balkar
     KAR  Karachay
mal  MAL  Malayalam Traditional
     MLR  Malayalam Reformed
scs  ATH  Athapaskan
     SLA  Slavey
xal  KLM  Kalmyk
     TOD  Todo
xsl  ATH  Athapaskan
     SSL  South Slavey
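The ambiguity check amounts to inverting the OT -> ISO table and keeping every ISO code that ends up with more than one OT tag. The sketch below shows that, plus one way a UA might resolve the ambiguity from a BCP 47 tag. The rows for zho and ell come from the list above; DEU/deu is added as an unambiguous contrast case; the `PREFERRED_OT` rules and all function names are hypothetical and not taken from any specification.

```python
from collections import defaultdict

# Rows taken from the list above, plus DEU as an unambiguous contrast case.
OT_TO_ISO = {
    "ZHH": ["zho"], "ZHP": ["zho"], "ZHS": ["zho"], "ZHT": ["zho"],
    "ELL": ["ell"], "PGR": ["ell"],
    "DEU": ["deu"],
}

# Hypothetical preferences keyed by BCP 47 tag (assumption, for illustration).
PREFERRED_OT = {
    "zh-Hans": "ZHS",
    "zh-Hant": "ZHT",
    "zh-HK": "ZHH",
    "el-polyton": "PGR",
    "el": "ELL",
}


def invert(table):
    """Build the reverse table: ISO code -> sorted list of OT tags."""
    iso_to_ot = defaultdict(list)
    for ot_tag, iso_codes in table.items():
        for code in iso_codes:
            iso_to_ot[code].append(ot_tag)
    return {code: sorted(tags) for code, tags in iso_to_ot.items()}


def ambiguous(table):
    """ISO codes that are claimed by more than one OT tag."""
    return {code: tags for code, tags in invert(table).items() if len(tags) > 1}


def pick_ot_tag(bcp47_tag, iso_code, table):
    """Use an explicit BCP 47 rule if one exists, else the only candidate, else None."""
    if bcp47_tag in PREFERRED_OT:
        return PREFERRED_OT[bcp47_tag]
    candidates = invert(table).get(iso_code, [])
    return candidates[0] if len(candidates) == 1 else None


print(ambiguous(OT_TO_ISO))
print(pick_ot_tag("zh-Hant", "zho", OT_TO_ISO))
```

Returning `None` for a bare `zh` leaves the choice open rather than guessing; a real mapping would need a stated default for such cases.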
There are also a number of tags that mismatch between the standards. I found 34, not counting the ones from the previous list: cmr, alt, bhi, hnd, lub, sot, afr, bal, bcr, bgr, chu, csy, dgr, dng, evn, grn, har, ing, kal, kar, kha, kmb, kuu, lad, man, men, mnk, mon, nyn, rom, sek, swa, tht. ISO tags that are not in the list have not been considered, so the number might be quite a bit higher.
Received on Saturday, 6 March 2010 13:57:09 UTC