languages to encodings associations . . . from Albretch Mueller on 2019-09-26 (www-international@w3.org from July to September 2019)

From: Albretch Mueller <lbrtchx@gmail.com>
Date: Thu, 26 Sep 2019 13:09:50 +0200
To: www-international@w3.org
Message-ID: <CAFakBwi_pmnkj+tQwAL1q8Ax+UER4QOD41Y6UbVLM3wu6ML6FQ@mail.gmail.com>

 I have found lists based on ISO 639, such as those used by the US
Library of Congress, which contain the ISO 639-1, 639-2 and 639-3 codes
for the representation of names of languages, but I have not been able to
find a list associating languages with encodings. For example, this one,
although helpful towards my goal:

 https://docs.python.org/2/library/codecs.html

 does not really give a language-to-encodings association, and languages
such as Spanish and Hindi (the second and fourth most spoken by number of
native speakers) are not listed.

 UTF-8 can encode text in any language, but that is not true of all
other encodings.
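 A minimal Python sketch of that difference (Python because the codecs page linked above is Python's). The sample words and the helper name `can_encode` are illustrative assumptions; it checks whether each word can be encoded with UTF-8 versus ISO-8859-1 (Latin-1).

```python
# Hypothetical sample words: each language's own name for itself.
SAMPLES = {
    "tr": "Türkçe",
    "ru": "Русский",
    "es": "Español",
    "hi": "हिन्दी",
}


def can_encode(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for code, word in SAMPLES.items():
    # UTF-8 can represent all of these characters, so the round trip succeeds.
    assert word.encode("utf-8").decode("utf-8") == word
    print(code, "utf-8:", can_encode(word, "utf-8"),
          "latin-1:", can_encode(word, "latin-1"))
```

 Latin-1 accepts "Türkçe" and "Español" but rejects the Russian and Hindi words. "Türkçe" passing is partly luck: Turkish letters such as ş and ğ are not in Latin-1, so a single word is a weak test of whether an encoding suits a language.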

 Basically, what I have in mind is data laid out like this:

 ISO-639-3|ISO-639-2|ISO-639-1|Name of language|Name of language in
Java "\uXXXX" Unicode escape format|All encodings that can be used with
that language

 For example, these would be the first five fields for three languages:

|tur|tur|tr|Türkçe|\u0054\u00fc\u0072\u006b\u00e7\u0065|
|rus|rus|ru|Русский|\u0420\u0443\u0441\u0441\u043a\u0438\u0439|
|spa|spa|es|Español|\u0045\u0073\u0070\u0061\u00f1\u006f\u006c|
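 The fifth field can be generated mechanically instead of typed by hand. A sketch in Python, where `java_escape` and `make_row` are hypothetical helper names. It encodes to UTF-16 because Java's "\uXXXX" escapes denote UTF-16 code units, so a character outside the Basic Multilingual Plane comes out as a surrogate pair.

```python
def java_escape(name):
    """Render `name` as Java-style \\uXXXX escapes, one per UTF-16 code unit."""
    units = name.encode("utf-16-be")  # big-endian, no byte-order mark
    return "".join(
        "\\u%04x" % int.from_bytes(units[i:i + 2], "big")
        for i in range(0, len(units), 2)
    )


def make_row(iso639_3, iso639_2, iso639_1, name):
    """Build the first five pipe-delimited fields of one table row."""
    fields = [iso639_3, iso639_2, iso639_1, name, java_escape(name)]
    return "|" + "|".join(fields) + "|"


print(make_row("tur", "tur", "tr", "Türkçe"))
print(make_row("rus", "rus", "ru", "Русский"))
print(make_row("spa", "spa", "es", "Español"))
```

 For "Español" the fifth field comes out as \u0045\u0073\u0070\u0061\u00f1\u006f\u006c.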

 and after those initial five fields would come all the specific
encodings that can be used for the language.
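 Absent a ready-made table, that last field could be approximated by testing candidate codecs against a sample of each language's letters. A sketch in Python, assuming hand-picked sample letters (the exemplar character sets in Unicode CLDR would be a fuller source) and a short, hypothetical candidate list drawn from Python's standard codecs. It reports which encodings *can* represent the letters, not which ones were historically used for the language.

```python
# Hypothetical sample letters per ISO 639-3 code; a real table would use
# each language's complete alphabet rather than a handful of letters.
LETTERS = {
    "tur": "çğıöşüÇĞİÖŞÜ",
    "rus": "абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    "spa": "áéíóúüñÁÉÍÓÚÜÑ¿¡",
}

# A few codecs from Python's standard library; extend as needed.
CANDIDATES = [
    "ascii", "latin_1", "iso8859_9", "iso8859_5", "cp1251",
    "koi8_r", "cp1252", "cp1254", "utf_8", "utf_16",
]


def encodings_for(sample, candidates=CANDIDATES):
    """Return the candidate codecs that can encode every character of `sample`."""
    usable = []
    for enc in candidates:
        try:
            sample.encode(enc)
        except UnicodeEncodeError:
            continue  # at least one letter is missing from this codec
        usable.append(enc)
    return usable


for lang, sample in LETTERS.items():
    print(lang, encodings_for(sample))
```

 With these samples, Turkish is rejected by Latin-1 and cp1252 (no ğ, ı or ş) but accepted by iso8859_9 and cp1254, while the Russian letters pass iso8859_5, cp1251 and koi8_r.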

 I thought such a list would be easy to find.

 Are there any lists or documentation you would suggest?

 lbrtchx

Received on Thursday, 26 September 2019 11:10:14 UTC