Re: HTML5 Issue 11 (encoding detection): I18N WG response... from Leif Halvard Silli on 2009-10-13 (public-html@w3.org from October 2009)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 13 Oct 2009 05:13:26 +0200
To: Ian Hickson <ian@hixie.ch>
CC: Mark Davis ☕ <mark@macchiato.com>, Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>, <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, Larry Masinter <masinter@adobe.com>
Message-ID: <4AD3F056.4010600@xn--mlform-iua.no>

Leif Halvard Silli On 09-10-12 14.33:

> Ian Hickson On 09-10-12 13.45:


>>> "Western demographics" is a term that leaves the job of finding out 
>>> which those areas are to the reader, anyhow.
>> If we can have instead a table of languages to default encodings, I would 
>> much rather have that. Is the data for such a table available?


I am making a list of those languages, based on the Language 
Subtag Registry, as noted below. The preliminary version, about 
half finished, can be seen here:

http://www.malform.no/html5/The_Windows_1252_Languages

I would be grateful to anyone who can provide data on the 
languages I have not yet reached, or who spots errors or 
inaccurate information. (You should probably contact me off 
list.) Often, probably because it seemed obvious to the author, 
the Wikipedia articles I consulted do not say which alphabet the 
language is written in.

The Indonesian language, with 250 000 000 users worldwide, is 
worth documenting ...

> Anyway, the Language Subtag Registry [LSR] [+] lists 7801 
> languages. 90 of those are marked with "Suppress-Script: Latn" 
> [#], which means that it is superfluous to tag these languages as 
> using a Latin script. The alphabets of those 90 languages would 
> have to be investigated, to see which of them that are covered by 
> Win 1252.
> 
> [+] http://www.iana.org/assignments/language-subtag-registry
> [#] http://tools.ietf.org/html/rfc5646#section-3.1.9
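Filtering the registry for "Suppress-Script: Latn" is mechanical. 
A minimal Python sketch, assuming you have a local copy of the 
registry text; parse_lsr and latn_suppressed_languages are 
hypothetical helper names, not part of any library:

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, List[str]]


def parse_lsr(text: str) -> List[Record]:
    """Split Language Subtag Registry text into records.

    Records are separated by lines consisting of "%%"; each field is
    "Name: value", and a line starting with whitespace continues the
    previous field's value (folding, as described in RFC 5646).
    """
    records: List[Record] = []
    current: Record = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field is not None:
            # Folded continuation line: join it onto the previous value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            # Fields such as Description may repeat, so keep a list.
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def latn_suppressed_languages(records: List[Record]) -> List[Tuple[str, str]]:
    """Return (subtag, first Description) for every language record
    that carries "Suppress-Script: Latn"."""
    return [
        (r["Subtag"][0], r.get("Description", [""])[0])
        for r in records
        if r.get("Type") == ["language"]
        and r.get("Suppress-Script") == ["Latn"]
    ]
```

Fed the text of a downloaded copy, e.g. 
latn_suppressed_languages(parse_lsr(open(path, encoding="utf-8").read())), 
it lists the candidates. The count depends on the registry's 
File-Date, so a newer copy may not give exactly 90.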

By the way, Wikipedia has an article, "Western Latin character 
sets (computing)", which says: [1]

 ]] These encodings were designed for representation of Italian, 
Spanish, Portuguese, French, German, Dutch, English, Danish, 
Swedish, Norwegian, and Icelandic, which use the Latin alphabet, a 
few additional letters and ones with precomposed diacritics, some 
punctuation, and various symbols (including some Greek letters). 
Although they're called "Western European" many of these languages 
are spoken all over the world. Also, these character sets happen 
to support many other languages such as Malay, Swahili, or 
Classical Latin. [[
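Whether a given alphabet fits in Windows-1252 can be checked 
with Python's built-in "cp1252" codec. This is a minimal sketch, 
assuming the alphabet of each language is supplied by hand (the 
registry does not record alphabets); missing_characters and 
covered_by_windows_1252 are hypothetical helper names:

```python
import unicodedata
from typing import List


def missing_characters(alphabet: str) -> List[str]:
    """Return the characters of `alphabet` that windows-1252 cannot encode.

    The text is NFC-normalized first, so a letter typed as a base letter
    plus a combining diacritic (e.g. "a" + U+0308) is checked in its
    precomposed form ("ä"), which is how windows-1252 represents it.
    """
    missing: List[str] = []
    for ch in unicodedata.normalize("NFC", alphabet):
        try:
            ch.encode("cp1252")  # Python's codec for Windows-1252
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


def covered_by_windows_1252(alphabet: str) -> bool:
    """True if every character of `alphabet` is encodable in windows-1252."""
    return not missing_characters(alphabet)
```

For example, the Icelandic letters þ and ð are covered, while 
Polish ł is not, which is why Polish is outside the list quoted 
above.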

[1] 
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
-- 
leif halvard silli

Received on Tuesday, 13 October 2009 03:14:13 UTC