Re: HTML5 Issue 11 (encoding detection): I18N WG response... from Andrew Cunningham on 2009-10-12 (public-html@w3.org from October 2009)

From: Andrew Cunningham <andrewc@vicnet.net.au>
Date: Mon, 12 Oct 2009 12:59:23 +1100
To: Larry Masinter <masinter@adobe.com>
CC: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Ian Hickson <ian@hixie.ch>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <4AD28D7B.7060602@vicnet.net.au>
*shrugs*


As far as I can tell, it's something that shouldn't be defined by the 
developers, but rather by the localisation teams, who choose a 
suitable default encoding for the particular UI locale they are developing.
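
A minimal sketch of what such a per-locale choice might look like, assuming a lookup table that each localisation team maintains. The table name, the function name and the handful of locale-to-encoding pairs are hypothetical and purely illustrative; they are not taken from any browser or from the spec.

```python
# Hypothetical per-locale fallback table; the entries are illustrative
# assumptions, not an authoritative list of what any browser ships.
FALLBACK_BY_UI_LOCALE = {
    "ru": "windows-1251",
    "tr": "windows-1254",
    "hr": "windows-1250",
    "ja": "shift_jis",
}

# Used when no entry matches the UI locale.
DEFAULT_FALLBACK = "windows-1252"


def fallback_encoding(ui_locale: str) -> str:
    """Return the fallback encoding for a UI locale tag such as 'hr-HR'."""
    tag = ui_locale.strip().lower().replace("_", "-")
    # Try the full tag first, then progressively shorter prefixes.
    while tag:
        if tag in FALLBACK_BY_UI_LOCALE:
            return FALLBACK_BY_UI_LOCALE[tag]
        tag = tag.rpartition("-")[0]
    return DEFAULT_FALLBACK


print(fallback_encoding("hr-HR"))
print(fallback_encoding("en-AU"))
```

The lookup is keyed on the UI locale of the build, not on where the user happens to be, which is the distinction I am trying to draw.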


Larry Masinter wrote:

> Can someone please explain, again, why the discussion of default
> configurations of a particular category of user agent in various
> regions belongs in the definition of the HyperText Markup Language?
>
> What benefit can any author of a web page derive, please, from
> knowing what the default settings of various browsers are in products
> sold into various language environments?
>
> What benefit to the Internet, the Web, to anyone else, is there
> in specifying what the default configuration should be for various
> "demographics", independent of the actual user's language and
> preference? Does it help a Kenyan who brings a laptop for use
> by his Egyptian wife living in Finland?
>
> What is going on here?
>
> Thanks,
>
> Larry
> --
> http://larry.masinter.net
>
>
> -----Original Message-----
> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Leif Halvard Silli
> Sent: Sunday, October 11, 2009 4:57 PM
> To: Ian Hickson
> Cc: "Martin J. Dürst"; Phillips, Addison; Andrew Cunningham; Richard Ishida; public-html@w3.org; public-i18n-core@w3.org
> Subject: Re: HTML5 Issue 11 (encoding detection): I18N WG response...
>
> Ian Hickson On 09-10-11 21.23:
>
>   
>> On Sun, 11 Oct 2009, Leif Halvard Silli wrote (reordered):
>>     
>>> The choice of character set - alphabet - for instance, has always been a
>>> political matter, and still is.
>>>       
>> Ok, then it seems sensible to use a political way of speaking to refer to 
>> the choice of alphabet.
>>     
>
>
> We do not choose alphabet every day. Day to day, the right to use 
> the alphabet that your language requires is what matters. And 
> ditto language is required to express that.
>
>   
>>> "Western this-and-that" is predominantly a political way of speaking. 
>>>       
>> Good, then it is appropriate terminology.
>>     
>
>
> Appropriate for what? Diplomatic language is political and 
> accurate, yet tries to avoid contested political phrasings.
>
> "Western European Language [environments]" as Addison suggested is 
> a reasonable neutral term, btw, despite use of "Western". It also 
> gives the reader much more hints about what the politics involved  ...
>
> Western demographics, OTOH ... You mentioned Africa: Egypt was a 
> colony once. So was Kenya. Why does Kenya have a Western 
> demographic, but Egypt not?
>
>   
>>> Therefore it is wrong to use a wording that causes readers to think in 
>>> political terms.
>>>       
>> But you agree that it _is_ a political matter.
>>     
>
>
> Which "it" are you referring to now?
>
>   
>>> It is wrong to nourish the thought that if some population changes to 
>>> use an alphabet which is covered by Win1252, that they then will start 
>>> to belong to the "Western demographics".
>>>       
>> It doesn't matter if a population _changes_ to use an alphabet which is 
>> covered by 1252, because that will only affect future pages, not legacy 
>> pages, and it is only legacy pages we are concerned about.
>>     
>
> I see the logic, but I wonder how you can take any outcome for granted. 
> I don't know what is default in Azerbaijan today ...
>
>   
>> What phrase best approximates the areas of the world where _today_ UAs are 
>> shipping with a 1252 default encoding?
>>     
>
>
> "Western demographics" is a term that leaves the job of finding 
> out which those areas are to the reader, anyhow.
>
> If you want to give better hints, then you could speak about "the 
> British Commonwealth, predominantly English-, French-, Spanish- and 
> Portuguese-speaking demographics, demographics that were 
> alphabetized as earlier Western colonies of France, 
> Belgium, England, Spain, Portugal" - etc. You should of course add 
> that "the list is not exhaustive".
>
> You could also say "demographics using the Latin alphabet covered 
> by ASCII plus the letters ŠŒŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÚÛÜÝÞß". You 
> may say that this is circular. But at least it can help 
> implementors find the answer.
>
> You could also list the names of the different Latin alphabets 
> that are considered covered by Win1252: the ASCII alphabet, German 
> alphabet(s), Scandinavian, etc. See Wikipedia:
>
> http://en.wikipedia.org/wiki/Latin-derived_alphabet
> http://en.wikipedia.org/wiki/Basic_modern_Latin_alphabet
>
> You could also say "demographics covered by the Latin alphabet, 
> except the following and other countries, which use letters that 
> are not covered by Win1252: Turkey, Croatia, Azerbaijan etc etc"
>
>   
>>> Does Croatia belong to "Western demographics", for instance? Why? And why 
>>> not? The Croatian alphabet is not covered by Win1252. What about Serbia? 
>>> Serbia uses both Cyrillic and Latin side by side.
>>>       
>> What default encodings do browsers use in those areas?
>>     
>
>
> I don't know. I just know that Win1252 doesn't cover the Croatian 
> alphabet. And I have also gotten the impression that it is a 
> problem that - if using one's own alphabet is seen as the normal 
> thing - software may not default to a charset using the local 
> alphabet.
>
>   
>>> As you can see, "Western demographics" is a wording that - depending on 
>>> how you define "Western" - covers both narrower and wider ground than e.g. 
>>> "writing systems covered by Win1252".
>>>       
>> Is there a better term that would more accurately refer to the areas of 
>> the world where a UA needs to ship with a Win1252 default encoding?
>>     
>
>
> See above. And below.
>
>   
>>> For example you could say "For demographics that are covered by what in 
>>> user agents and e-mail applications are typically known as "Western" or 
>>> "West European" encodings, then Win1252 is the best default".
>>>       
>> That's circular logic ("Use Win1252 as a default for demographics where 
>> Win1252 is the default"). 
>>     
>
>
> To say that "Win1252" is the default for those areas which are 
> covered by what is referred to as "Western encodings", is not a 
> circular argument.
>
> But your focus appears to be *areas*. And from that point of view 
> I can see why you think it is circular.
>
> But I thought that it was more relevant for implementors to know 
> that Win1252 is considered the default for wherever "Western 
> Encodings" are useful, than it is for them to know that there 
> apparently exists a secret Union of Windows 1252 Countries ...
>
> However, I just now looked in Firefox to see what it meant by 
> Western, and found, under "West European", both Greek and 
> "Western" encodings ...
>
> I suppose that Win1252 isn't the default encoding in Greece?
>
> Proves that "Western" is a very imprecise term.
>
>   
>> The point is to be able to give implementation 
>> advice that is useful independent of the implementor performing any 
>> reverse engineering, studying of other user agents, etc.
>>     
>
> It doesn't require "reverse engineering" to find out the language 
> of a population, does it? What's really needed, if you want to do 
> a good job, is to visit that country and observe and judge.
>
> The issue of reverse engineering is, however, connected to what I 
> said above about "Win1252" being the default for areas 
> covered by "Western encodings".
>   
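
On the point quoted above that Win1252 doesn't cover the Croatian alphabet: this is easy enough to check with the codecs that ship with Python. A minimal sketch, assuming the letters that distinguish Croatian from the basic Latin alphabet are č, ć, đ, š and ž; the helper name and the letter list are mine, not from the thread.

```python
def uncovered(letters: str, encoding: str = "cp1252") -> list:
    """Return the characters in `letters` that `encoding` cannot represent."""
    missing = []
    for ch in letters:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Assumed list of the Croatian letters outside the basic Latin alphabet.
CROATIAN_SPECIFIC = "čćđšžČĆĐŠŽ"

print(uncovered(CROATIAN_SPECIFIC))            # ['č', 'ć', 'đ', 'Č', 'Ć', 'Đ']
print(uncovered(CROATIAN_SPECIFIC, "cp1250"))  # []
```

So Win1252 has š and ž but not č, ć or đ, whereas Win1250 has all five.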

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au
Received on Monday, 12 October 2009 02:00:12 UTC