- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 14 Oct 2009 17:36:41 +0200
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Henri Sivonen On 09-10-14 15.28:

> On Oct 14, 2009, at 06:40, Leif Halvard Silli wrote:
>
>> I especially picked the "os_RU" locale because it is situated in
>> Russia and uses Cyrillic for everything. The Ossetic alphabet seems
>> to be fully compatible with Windows 1251.
>
> In that case, it would probably make sense to ship Windows-1251 as the
> default for an Ossetian localization.

Then I suppose we agree that Ian's table must not simply say that "For
all other locales, use Windows 1252 as default", right?

>> win1252 - bn-BD - Not Latin: Bengali Bangladesh
>> win1252 - bn-IN - Not Latin: Bengali India
>
> I don't have data about Bengali Web pages, but if it turns out that
> most Bengali content is labeled but that users of Bengali-localized
> browsers also read a lot of unlabeled English content, Windows-1252
> would make sense as the default.

But isn't English content covered by ASCII, and thus by UTF-8? I could
understand it if you had said that they read, for example, legacy French
content, as many of the Arabic users certainly do. However, for the
Arabic locale, you have UTF-8 as default ... What is the purpose of
setting UTF-8 as the default, other than as an encouragement to use that
encoding, if that encoding is detectable even without such a default?

>> UTF-8 - cy - Win1252 doesn't fully cover Welsh
>
> It seems very plausible that users of a Welsh browser UI read a lot of
> English content. If it happens that Welsh content is labeled and the
> English content is what's unlabeled, Windows-1252 would make sense as
> the default.
>
> This isn't about what encoding covers the language of the
> localization. This is about what's the most common unlabeled encoding
> that the users of a particular localization encounter.

For Croatian you have set it to UTF-8. It took me only one Google search
to find Croatian content that was ISO-8859-2, but which was labeled as
ISO-8859-1.
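To make the points above concrete, here is a minimal Python sketch. It is an illustration only, not a description of any browser's actual algorithm: the helper name `decode_unlabeled` and the sample word "šećer" are assumptions chosen for the example. It shows that pure-ASCII text decodes identically under UTF-8 and the legacy encodings, that ISO-8859-2 bytes read as ISO-8859-1 come out garbled, and that UTF-8 can be recognised by strict validation before any locale default is consulted.

```python
from typing import Tuple


def decode_unlabeled(data: bytes, locale_default: str) -> Tuple[str, str]:
    """Hypothetical last-step decoder for content with no usable label.

    Tries strict UTF-8 first; only if the bytes are not valid UTF-8 does it
    fall back to the locale's default legacy encoding.
    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(locale_default, errors="replace"), locale_default


# 1. Pure-ASCII English text is byte-identical in all of these encodings,
#    so the choice of default makes no difference for it.
english = b"Plain English text"
ascii_views = {english.decode(enc) for enc in ("utf-8", "windows-1252", "iso-8859-2")}

# 2. Croatian text stored as ISO-8859-2 but read as ISO-8859-1 is garbled.
croatian = "šećer"  # sample word, chosen for illustration
stored = croatian.encode("iso-8859-2")
mislabeled = stored.decode("iso-8859-1")

# 3. The same legacy bytes are not valid UTF-8, so the fallback default decides.
legacy_text, legacy_used = decode_unlabeled(stored, "iso-8859-2")

# 4. Real UTF-8 bytes are recognised whatever the locale default is.
utf8_text, utf8_used = decode_unlabeled(croatian.encode("utf-8"), "windows-1252")

print(ascii_views, repr(mislabeled), legacy_used, utf8_used)
```

In this sketch the locale default only matters for bytes that fail UTF-8 validation, which is the situation the question about the purpose of a UTF-8 default is concerned with.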
Thus, it seems to me that the reason why the Slavic languages of former
Yugoslavia have been set to UTF-8 is related to the culture they have of
treating two different alphabets equally (from the very design of their
alphabets to YUSCII and beyond ...). At least there seem to be more
things involved than "the most common unlabeled encoding" for that user
group.

As for Welsh: this is a minority market. Mozilla (and Google also) have
won market share by allowing people to engage in localization work.
There probably isn't a Welsh version of Internet Explorer (fingers
crossed, hoping for the opposite). Anyway, if most English legacy
content is covered by ASCII, then why not UTF-8? If there /are/ reasons
to have UTF-8 as default, then I can very well understand why the Welsh
localizers chose UTF-8 as default!

Are there any data on how much unlabeled English content there is out
there that uses anything other than the ASCII repertoire? Doesn't most
of the unlabeled English content use HTML entities for the "special"
characters anyway?

>> Why is it safer for Welsh to use UTF-8 as default.
>
> I rather suspect that UTF-8 isn't the best default for any locale,
> since real UTF-8 content is unlikely to rely on the last defaulting
> step for decoding. I don't know why some Firefox localizations default
> to UTF-8.

So *is* there any reason to have UTF-8 as default *anywhere*, other than
the motto "yes, let's switch to UTF-8"?

>> Also, again: I took up Belarusian. Why does it have ISO-8859-5 as
>> default?
>
> I filed a bug on this, FWIW. Maybe "why" is answered in the bug report
> in due course:
> https://bugzilla.mozilla.org/show_bug.cgi?id=522218

Cool! I also wonder why you don't apply charset detection for that
locale. (If I understood your localization files correctly.)

>> Do you just trust whatever comes out of Mozilla?
>
> It would be helpful to dig up data on how Microsoft configures IE by
> default in various locales. And Opera if Opera varies the default by
> locale.

Indeed. But I wonder if it would be smarter to just document those
things - including their effects, rather than saying that vendors and
users (the text also speaks about user defined encodings) /should/ use
those encodings.
-- 
leif halvard silli
Received on Wednesday, 14 October 2009 15:37:18 UTC