Re: HTML5 Issue 11 (encoding detection): I18N WG response...

Ian Hickson On 09-10-12 13.45:

> On Mon, 12 Oct 2009, Leif Halvard Silli wrote:
>> Ian Hickson On 09-10-11 21.23:
>>> On Sun, 11 Oct 2009, Leif Halvard Silli wrote (reordered):
>>>> The choice of character set - alphabet - for instance, has always 
>>>> been a political matter, and still is.
>>> Ok, then it seems sensible to use a political way of speaking to refer 
>>> to the choice of alphabet.
>>>
>>>> "Western this-and-that" is predominantly a political way of 
>>>> speaking.
>>> Good, then it is appropriate terminology.
>> Appropriate for what?
> 
> For the spec. Using political ways of speaking to talk about political 
> matters.


The one thing doesn't follow from the other here, no.

>> "Western European Language [environments]" as Addison suggested is a 
>> reasonable neutral term, btw, despite use of "Western". It also gives 
>> the reader much more hints about what the politics involved ...
> 
> "European" has no place in this term, as far as I can tell.


As a *hint* (about the politics that has led to the situation 
where the characters of Win 1252 "dominate" the Web), West[ern] 
European is much better than just "Western", IMHO.

>>>> Therefore is wrong to use a wording that causes readers to think in 
>>>> political terms.
>>> But you agree that it _is_ a political matter.
>> Which "it" are you referring to now?
> 
> The choice of character set - alphabet.


As your Wikipedia pointer showed [*], the choice of default 
character encoding is not related to the definition of the 
cultural and/or political entity "The Western world".

[*] http://en.wikipedia.org/wiki/Western_world

>> "Western demographics" is a term that leaves the job of finding out 
>> which those areas are to the reader, anyhow.
> 
> If we can have instead a table of languages to default encodings, I would 
> much rather have that. Is the data for such a table available?


And then you jump from demographics to languages. Is it certain 
that there is a default legacy encoding for /all/ languages? And 
is it certain that a language has the same default legacy encoding 
in every locale?

Anyway, the Language Subtag Registry [LSR] [+] lists 7801 
languages. 90 of those are marked with "Suppress-Script: Latn" 
[#], which means that it is superfluous to tag these languages as 
using a Latin script. The alphabets of those 90 languages would 
have to be investigated, to see which of them are covered by 
Win 1252.

[+] http://www.iana.org/assignments/language-subtag-registry
[#] http://tools.ietf.org/html/rfc5646#section-3.1.9
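
To give an idea of what such an investigation could look like, here 
is a minimal Python sketch. It only checks whether each letter of an 
alphabet can be encoded in Windows-1252. The two sample alphabets 
(Norwegian Bokmål and Polish) are my own illustrative assumptions, 
typed in by hand; they are not data taken from the registry, and the 
function name is made up for the example.

```python
# Sketch: which letters of an alphabet cannot be encoded in Windows-1252?
# The sample alphabets are hand-typed assumptions, not registry data.

def uncovered_by_cp1252(alphabet: str) -> list[str]:
    """Return the characters of `alphabet` that Windows-1252 cannot encode."""
    missing = []
    for ch in alphabet:
        try:
            ch.encode("cp1252")  # Python's codec name for Windows-1252
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Hypothetical samples, keyed by language subtag.
samples = {
    "nb": "abcdefghijklmnopqrstuvwxyzæøå",
    "pl": "aąbcćdeęfghijklłmnńoóprsśtuwyzźż",
}

for tag, letters in samples.items():
    # Check lower and upper case, since both must be representable.
    both_cases = letters + letters.upper()
    print(tag, uncovered_by_cp1252(both_cases))
```

An empty list means the alphabet fits in Win 1252; a non-empty list 
names the letters that would need another encoding. A real survey 
would of course need a vetted alphabet for each of the 90 languages.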


> On Mon, 12 Oct 2009, Maciej Stachowiak wrote:

   [...]

>> Note: in the browsers that vary this, it is always determined by 
>> "locale", not "demographic" (which is not a computing concept). I don't 
>> think using the term "demographic" makes sense in this context.
> 
> Fair enough. Changed to "locale".


+1. Right direction, but "Western locales" is still not correct.
-- 
leif halvard silli

Received on Monday, 12 October 2009 12:34:22 UTC