Re: HTML5 Issue 11 (encoding detection): I18N WG response...

Ian Hickson On 09-10-11 21.23:

> On Sun, 11 Oct 2009, Leif Halvard Silli wrote (reordered):
>> The choice of character set - alphabet - for instance, has always been a
>> political matter, and still is.
> 
> Ok, then it seems sensible to use a political way of speaking to refer to 
> the choice of alphabet.


We do not choose an alphabet every day. Day to day, what matters is 
the right to use the alphabet that your language requires. And 
corresponding language is required to express that.

>> "Western this-and-that" is predominantly a political way of speaking. 
> 
> Good, then it is appropriate terminology.


Appropriate for what? Diplomatic language is political and 
accurate, yet tries to avoid contested political phrasings.

"Western European Language [environments]", as Addison suggested, is 
a reasonably neutral term, btw, despite the use of "Western". It also 
gives the reader many more hints about the politics involved ...

Western demographics, OTOH ... You mentioned Africa: Egypt was a 
colony once. So was Kenya. Why does Kenya have a Western 
demographic, but Egypt does not?

>> Therefore is wrong to use a wording that causes readers to think in 
>> political terms.
> 
> But you agree that it _is_ a political matter.


Which "it" are you referring to now?

>> It is wrong to nourish the thought that if some population changes to 
>> use an alphabet which is covered by Win1252, that they then will start 
>> to belong to the "Western demographics".
> 
> It doesn't matter if a population _changes_ to use an alphabet which is 
> covered by 1252, because that will only affect future pages, not legacy 
> pages, and it is only legacy pages we are concerned about.

I see the logic, but I wonder how you can take any outcome for 
granted. I don't know what is default in Azerbaijan today ...

> What phrase best approximates the areas of the world where _today_ UAs are 
> shipping with a 1252 default encoding?


"Western demographics" is a term that leaves the job of finding 
out which those areas are to the reader, anyhow.

If you want to give better hints, then you could speak about "the 
British Commonwealth, predominantly English, French, Spanish and 
Portuguese speaking demographics, demographics that were 
alphabetized as Western colonies, i.e. former colonies of France, 
Belgium, England, Spain, Portugal" - etc. You should of course add 
that "the list is not exhaustive".

You could also say "demographics using the Latin alphabet covered 
by ASCII plus the letters ŠŒŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÚÛÜÝÞß". You 
may say that this is circular. But at least it can help 
implementors find the answer.
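
As a rough illustration of how an implementor could check such a 
letter list mechanically, here is a minimal sketch. It assumes that 
Python's built-in "cp1252" codec is an acceptable stand-in for what 
I call Win1252 here; the helper name is_covered is made up for the 
example.

```python
# Letters copied from the list above. "cp1252" is Python's codec name for
# Windows-1252; treating it as equivalent to "Win1252" is an assumption.
LETTERS = "ŠŒŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÚÛÜÝÞß"


def is_covered(ch: str, encoding: str = "cp1252") -> bool:
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Every listed letter should be encodable in Win1252 ...
all_in_win1252 = all(is_covered(ch) for ch in LETTERS)
# ... and none of them in plain ASCII, which is why they need listing.
none_in_ascii = not any(is_covered(ch, "ascii") for ch in LETTERS)

print(all_in_win1252, none_in_ascii)
```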

You could also list the names of the different Latin alphabets 
that are considered covered by Win1252: the ASCII alphabet, German 
alphabet(s), Scandinavian, etc. See Wikipedia:

http://en.wikipedia.org/wiki/Latin-derived_alphabet
http://en.wikipedia.org/wiki/Basic_modern_Latin_alphabet

You could also say "demographics covered by the Latin alphabet, 
except the following and other countries, which use letters that 
are not covered by Win1252: Turkey, Croatia, Azerbaijan etc etc"
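
Which letters fall outside Win1252 can also be found mechanically. 
The sketch below is only an illustration: the letter samples are my 
own short selections of lowercase non-ASCII letters from each 
alphabet (not complete alphabets), the function name uncovered is 
hypothetical, and Python's "cp1252" codec is assumed to stand in for 
Win1252.

```python
# Illustrative samples only: a few non-ASCII lowercase letters per alphabet,
# chosen for this example; they are not complete alphabets.
SAMPLES = {
    "Croatian": "čćđšž",
    "Turkish": "çğıöşü",
    "Azerbaijani": "çəğıöşü",
}


def uncovered(letters: str, encoding: str = "cp1252") -> list[str]:
    """Return the letters that cannot be encoded in the given charset."""
    missing = []
    for ch in letters:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Map each sample alphabet to the letters Win1252 cannot represent.
report = {name: uncovered(letters) for name, letters in SAMPLES.items()}

for name, missing in report.items():
    print(f"{name}: not in Win1252: {' '.join(missing)}")
```

Note that š and ž are covered, so a page in Croatian can look almost 
right in Win1252 while still losing č, ć and đ.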

>> Does Croatia belong to "Western demographics, for instance? Why? And why 
>> not? The Croatian alphabet is not covered by Win1252. What about Serbia? 
>> Serbia uses both Cyrillic and Latin side by side.
> 
> What default encodings to browsers use in those areas?


I don't know. I just know that Win1252 doesn't cover the Croatian 
alphabet. And I have also gotten the impression that it is a 
problem that - if using one's own alphabet is seen as the normal 
thing - software may not default to a charset using the local 
alphabet.

>> As you can see, "Western demographics" is a wording that - depending on 
>> how you define "Western" -covers both narrower and wider than e.g. 
>> "writing systems covered by Win1252".
> 
> Is there a better term that would more accurately refer to the areas of 
> the world where a UA needs to ship with a Win1252 default encoding?


See above. And below.

>> For example you could say "For demographics that are covered by what in 
>> user agents and e-mail applications are typically known as "Western" or 
>> "West European" encodings, then Win1252 is the best default".
> 
> That's circular logic ("Use Win1252 as a default for demographics where 
> Win1252 is the default"). 


To say that "Win1252" is the default for those areas which are 
covered by what is referred to as "Western encodings", is not a 
circular argument.

But your focus appears to be *areas*. And from that point of view 
I can see why you think it is circular.

But I thought that it was more relevant for implementors to know 
that Win1252 is considered the default for wherever "Western 
Encodings" are useful, than it is for them to know that there 
apparently exists a secret Union of Windows 1252 Countries ...

However, I just now looked in Firefox to see what it meant by 
Western, and found, under "West European", both Greek and 
"Western" encodings ...

I suppose that Win1252 isn't the default encoding in Greece?

That proves that "Western" is a very imprecise term.

> The point is to be able to give implementation 
> advice that is useful independent of the implementor performing any 
> reverse engineering, studying of other user agents, etc.

It doesn't require "reverse engineering" to find out the language 
of a population, does it? What's really needed, if you want to do 
a good job, is to visit that country and observe and judge.

The issue of reverse engineering is, however, connected to what I 
said above about "Win1252" being the default for areas 
covered by "Western encodings".
-- 
leif halvard silli

Received on Sunday, 11 October 2009 23:57:36 UTC