Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 13 Oct 2009 05:13:26 +0200
Message-ID: <4AD3F056.4010600@xn--mlform-iua.no>
To: Ian Hickson <ian@hixie.ch>
CC: Mark Davis ☕ <mark@macchiato.com>, Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, Larry Masinter <masinter@adobe.com>
Leif Halvard Silli On 09-10-12 14.33:

> Ian Hickson On 09-10-12 13.45:


>>> "Western demographics" is a term that leaves the job of finding out 
>>> which those areas are to the reader, anyhow.
>> If we can have instead a table of languages to default encodings, I would 
>> much rather have that. Is the data for such a table available?


I am making a list of those languages, based on the Language 
Subtag Registry, as noted below. A preliminary, roughly 
half-finished version can be seen here:

http://www.malform.no/html5/The_Windows_1252_Languages

I would be grateful to anyone who can provide data on the 
languages I have not yet covered, or who spots errors or 
inaccurate information. (Please contact me off-list.) The 
Wikipedia articles I consulted often do not say which alphabet a 
language uses, probably because it seemed obvious to the author.
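For illustration only, here is a minimal Python sketch of how such a 
language-to-default-encoding table could be consulted. The table 
contents are an assumption: they simply map the languages named in 
the Wikipedia quote further down to windows-1252, and the name 
default_encoding is hypothetical, not anything from the HTML5 draft.

```python
from typing import Dict, Optional

# Illustrative, non-normative entries: the languages that the quoted
# Wikipedia article lists for the Western Latin encodings.
DEFAULT_ENCODINGS: Dict[str, str] = {
    lang: "windows-1252"
    for lang in ("it", "es", "pt", "fr", "de", "nl", "en", "da", "sv", "no", "is")
}


def default_encoding(lang_tag: str,
                     table: Dict[str, str] = DEFAULT_ENCODINGS,
                     fallback: Optional[str] = None) -> Optional[str]:
    """Look up a default encoding by the primary language subtag of a tag."""
    # Language tags are case-insensitive and the primary language subtag
    # comes first, so "de-CH" and "DE" both resolve via "de".
    primary = lang_tag.strip().lower().split("-", 1)[0]
    return table.get(primary, fallback)
```

Keying on the primary subtag keeps the table small; region or script 
subtags would only matter if the real data turns out to need them.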

The Indonesian language, with 250 000 000 users worldwide, is 
worth documenting ...

> Anyway, the Language Subtag Registry [LSR] [+] lists 7801 
> languages. 90 of those are marked with "Suppress-Script: Latn" 
> [#], which means that tagging these languages with the Latn 
> script subtag is superfluous. The alphabets of those 90 languages 
> would have to be investigated, to see which of them are covered 
> by Windows-1252.
> 
> [+] http://www.iana.org/assignments/language-subtag-registry
> [#] http://tools.ietf.org/html/rfc5646#section-3.1.9
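As a rough sketch of how those entries could be pulled out of the 
registry file, the Python below parses its record-jar layout 
(records separated by "%%" lines, "Field: value" lines, folded 
continuation lines starting with whitespace) and keeps the language 
records whose Suppress-Script matches. The function names and the 
tiny SAMPLE excerpt are hand-written for illustration; any real 
count should come from running it on the downloaded registry.

```python
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]


def parse_records(text: str) -> List[Record]:
    """Split registry text into records of field name -> list of values."""
    records: List[Record] = []
    record: Record = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "%%":
            # A "%%" line ends the current record.
            if record:
                records.append(record)
            record, last_field = {}, None
        elif line[:1] in (" ", "\t") and last_field is not None:
            # Folded line: continue the previous field's value.
            record[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            # Fields such as Description may repeat, so keep a list.
            record.setdefault(last_field, []).append(value.strip())
    if record:
        records.append(record)
    return records


def languages_with_suppress_script(records: List[Record],
                                   script: str = "Latn") -> List[Tuple[str, str]]:
    """Return (subtag, first description) for language records with that script."""
    return [
        (r["Subtag"][0], r.get("Description", [""])[0])
        for r in records
        if r.get("Type") == ["language"] and r.get("Suppress-Script") == [script]
    ]


# Hand-written excerpt in the registry's format (not copied from the file).
SAMPLE = """Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: language
Subtag: ru
Description: Russian
Suppress-Script: Cyrl
%%
Type: language
Subtag: de
Description: German
Suppress-Script: Latn
%%
Type: language
Subtag: zh
Description: Chinese
"""

print(languages_with_suppress_script(parse_records(SAMPLE)))
```

With the real file, the input would be its UTF-8 text, e.g. 
open(path, encoding="utf-8").read(), where path is wherever the 
registry was saved.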

By the way, Wikipedia has an article, "Western Latin character 
sets (computing)", which says: [1]

	]] These encodings were designed for representation of Italian, 
Spanish, Portuguese, French, German, Dutch, English, Danish, 
Swedish, Norwegian, and Icelandic, which use the Latin alphabet, a 
few additional letters and ones with precomposed diacritics, some 
punctuation, and various symbols (including some Greek letters). 
Although they're called "Western European" many of these languages 
are spoken all over the world. Also, these character sets happen 
to support many other languages such as Malay, Swahili, or 
Classical Latin. [[
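To check a claim like this for a particular alphabet, one could try 
encoding its letters with Python's built-in cp1252 codec, as in the 
minimal sketch below. The function uncovered_chars is hypothetical. 
It normalizes to NFC first, because windows-1252 only has precomposed 
letters, so a decomposed "e" plus combining accent would otherwise 
be reported as uncovered.

```python
import unicodedata
from typing import Set


def uncovered_chars(alphabet: str, encoding: str = "cp1252") -> Set[str]:
    """Return the characters of `alphabet` that `encoding` cannot represent."""
    missing: Set[str] = set()
    # Compose first: "e" + U+0301 becomes the single code point U+00E9.
    for ch in unicodedata.normalize("NFC", alphabet):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.add(ch)
    return missing


print(uncovered_chars("abcdefghijklmnopqrstuvwxyz"))  # set()
print(uncovered_chars("\u015f"))  # {'ş'}: s with cedilla is not in cp1252
```

Passing "latin-1" as the encoding shows where ISO-8859-1 and 
windows-1252 differ, e.g. for the euro sign.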

[1] 
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
-- 
leif halvard silli
Received on Tuesday, 13 October 2009 03:14:13 UTC