- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Tue, 13 Oct 2009 05:13:26 +0200
- To: Ian Hickson <ian@hixie.ch>
- CC: Mark Davis ☕ <mark@macchiato.com>, Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, Larry Masinter <masinter@adobe.com>
Leif Halvard Silli On 09-10-12 14.33:
> Ian Hickson On 09-10-12 13.45:
>>> "Western demographics" is a term that leaves the job of finding out
>>> which those areas are to the reader, anyhow.
>> If we can have instead a table of languages to default encodings, I would
>> much rather have that. Is the data for such a table available?

I am making a list of those languages, based on the Language Subtag
Registry, as noted below. The preliminary, 50% unfinished version can be
seen here:

http://www.malform.no/html5/The_Windows_1252_Languages

I am thankful to anyone who can provide data on the things I have not
gotten to yet, or who spots errors or inaccurate info. (You should
probably contact me off list.)

Often, probably when it seems obvious to the Wikipedia author, the
alphabet in use isn't defined in the Wikipedia articles I consulted.
The Indonesian language, with 250 000 000 users worldwide, is worth
documenting ...

> Anyway, the Language Subtag Registry [LSR] [+] lists 7801
> languages. 90 of those are marked with "Suppress-Script: Latn"
> [#], which means that it is superfluous to tag these languages as
> using a Latin script. The alphabets of those 90 languages would
> have to be investigated, to see which of them that are covered by
> Win 1252.
>
> [+] http://www.iana.org/assignments/language-subtag-registry
> [#] http://tools.ietf.org/html/rfc5646#section-3.1.9

Btw, Wikipedia has an article about "Western Latin character sets in
computing", which says: [1]

]] These encodings were designed for representation of Italian,
Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish,
Norwegian, and Icelandic, which use the Latin alphabet, a few
additional letters and ones with precomposed diacritics, some
punctuation, and various symbols (including some Greek letters).
Although they're called "Western European" many of these languages are
spoken all over the world. Also, these character sets happen to support
many other languages such as Malay, Swahili, or Classical Latin. [[

[1] http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
--
leif halvard silli
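
For anyone who wants to repeat the two checks described above (picking
out the registry's "Suppress-Script: Latn" languages, and testing whether
an alphabet fits in Windows-1252), here is a minimal Python sketch. It is
only an illustration under stated assumptions: the function names and the
SAMPLE text are hypothetical, SAMPLE is a hand-written excerpt in the
registry's record format rather than a copy of the real file, and the
alphabet strings passed to the second check have to be supplied by hand.

```python
# Hand-written sample in the registry's record-jar format ("%%" separates
# records). Illustrative only; not a copy of the real registry file.
SAMPLE = """\
Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: language
Subtag: ru
Description: Russian
Suppress-Script: Cyrl
%%
Type: language
Subtag: nb
Description: Norwegian Bokmål
Suppress-Script: Latn
%%
Type: language
Subtag: zh
Description: Chinese
"""


def parse_registry(text):
    """Split registry text into a list of {field: value} dicts.

    Simplification: if a field repeats within one record (for example
    several Description lines), only the last value is kept.
    """
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store what we have and start a new record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[:1].isspace() and last_key:
            # Folded continuation line belonging to the previous field.
            current[last_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def latin_suppressed_languages(records):
    """Return subtags of language records marked Suppress-Script: Latn."""
    return [
        r["Subtag"]
        for r in records
        if r.get("Type") == "language" and r.get("Suppress-Script") == "Latn"
    ]


def covered_by_windows_1252(alphabet):
    """True if every character of the alphabet can be encoded in cp1252."""
    try:
        alphabet.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(latin_suppressed_languages(parse_registry(SAMPLE)))
    print(covered_by_windows_1252("æøåÆØÅ"))
    print(covered_by_windows_1252("ăâîșț"))
```

The second check only tells whether the listed letters exist in
Windows-1252; deciding which letters a language's alphabet actually
contains is still the manual part of the job.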
Received on Tuesday, 13 October 2009 03:14:13 UTC