- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Mon, 5 Dec 2011 18:42:39 +0100
L. David Baron on Wed Nov 30 18:29:31 PST 2011:
> On Wednesday 2011-11-30 15:28 -0800, Faruk Ates wrote:
>> My understanding is that all browsers* default to Western Latin
>> (ISO-8859-1) encoding by default (for Western-world
>> downloads/OSes) due to legacy content on the web. But how relevant
>> is that still today? Has any browser done any recent research into
>> the need for this?
>
> The default varies by localization (and within that potentially by
> platform), and unfortunately that variation does matter. You can
> see Firefox's defaults here:
> http://mxr.mozilla.org/l10n-mozilla-beta/search?string=intl.charset.default
> (The localization and platform are part of the filename.)

Last I checked, some of those locales defaulted to UTF-8. (And HTML5 defines it the same way.) So how is that possible? Don't users of those locales travel as much as you do? Or do we consider English locale users more important? Something is broken in the logic here!

> I changed my Firefox from the ISO-8859-1 default to UTF-8 years ago
> (by changing the "intl.charset.default" preference), and I do see a
> decent amount of broken content as a result (maybe I encounter a new
> broken page once a week? -- though substantially more often if I'm
> looking at non-English pages because of travel).

What kind of trouble are you actually describing here? You are describing a problem with using UTF-8 for *your locale*. What is your locale? It is probably English. Or do you consider your locale to be 'the Western world locale'? It sounds like *that* is what Anne has in mind when he brings in Dutch: http://blog.whatwg.org/weekly-encoding-woes

(Quite often it sounds as if some see Latin-1 - or Windows-1252, as we should now say - as a 'super default' rather than a locale default. If it really is a super default, then we should also spec it like that! Until further notice, I'll treat Latin-1 as it is specced: as a default for certain locales.)
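The per-locale defaults David points to behave, in effect, like a lookup table keyed on the browser's locale. A minimal Python sketch of that idea follows; the table contents, the catch-all value and the `default_charset` helper are all hypothetical illustrations, not Firefox's actual `intl.charset.default` values.

```python
# Illustrative sketch only: the locale keys and charset values below are
# assumptions for demonstration, not Firefox's actual intl.charset.default list.
LOCALE_DEFAULTS = {
    "en": "windows-1252",
    "nl": "windows-1252",
    "ru": "windows-1251",
    "ja": "Shift_JIS",
}
CATCH_ALL = "windows-1252"  # hypothetical default for locales not in the table


def default_charset(locale_tag: str) -> str:
    """Return the fallback charset for a locale tag such as 'en-US'.

    The full tag is tried first, then its primary language subtag,
    then the catch-all default.
    """
    tag = locale_tag.lower()
    if tag in LOCALE_DEFAULTS:
        return LOCALE_DEFAULTS[tag]
    primary = tag.split("-")[0]
    return LOCALE_DEFAULTS.get(primary, CATCH_ALL)


print(default_charset("en-US"), default_charset("ru"), default_charset("nb-NO"))
```

The point of the sketch is that the same unlabelled page gets a different fallback depending on the reader's locale, which is why the variation matters.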
Since it is a locale problem, we need to understand which locale you have - and/or which locale you and the other debaters think you have. Faruk probably uses a Spanish locale (right?), so the two of you are not speaking from the same context. However, you also say that your problem is not so much related to pages written for *your* locale as to pages written for users of *other* locales. So how many times per year do Dutch, Spanish or Norwegian pages - and other non-English pages - create trouble for you as an English locale user? I am making an assumption: almost never. You don't read those languages, do you? This is also a matter of expectations: if you visit a Russian page in a legacy Cyrillic encoding and get mojibake because your browser defaults to Latin-1, then what does it matter to you whether your browser defaults to Latin-1 or UTF-8? Answer: nothing.

>> I'm wondering if it might not be good to start encouraging
>> defaulting to UTF-8, and only fallback to Western Latin if it is
>> detected that the content is very old / served by old
>> infrastructure or servers, etc. And of course if the content is
>> served with an explicit encoding of Western Latin.
>
> The more complex the rules, the harder they are for authors to
> understand / debug. I wouldn't want to create rules like those.

Agreed, that particular idea is probably not the best.

> I would, however, like to see movement towards defaulting to UTF-8:
> the current situation makes the Web less world-wide because pages
> that work for one user don't work for another.
>
> I'm just not quite sure how to get from here to there, though, since
> such changes are likely to make users experience broken content.

I think we should 'attack' the dominant locale first: the English locale, in its different incarnations (Australian, American, UK). Thus, we should turn things on their head: English users should start to expect UTF-8 to be used.
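The Russian-page scenario above can be reproduced with a rough Python approximation of a browser decoding an unlabelled page with its locale fallback. The `decode_with_fallback` helper is hypothetical, and Python's `cp1252`/`cp1251` codecs stand in for the browsers' windows-1252/windows-1251 decoders (they differ only for a few unassigned bytes, none of which occur here).

```python
# Rough approximation of a browser decoding an unlabelled page with its locale
# fallback. Python's cp1252/cp1251 codecs stand in for windows-1252/-1251.


def decode_with_fallback(raw: bytes, fallback: str) -> str:
    """Decode raw page bytes with the given fallback; bad bytes become U+FFFD."""
    return raw.decode(fallback, errors="replace")


russian_page = "Привет".encode("cp1251")  # a legacy Cyrillic (windows-1251) page
english_page = b"Hello, world"            # a pure-ASCII English page

for fallback in ("cp1252", "utf-8", "cp1251"):
    print(f"{fallback:>6}: {decode_with_fallback(russian_page, fallback)!r} "
          f"{decode_with_fallback(english_page, fallback)!r}")
```

Only the page's real encoding recovers the Russian word: the windows-1252 fallback yields "Ïðèâåò" and the UTF-8 fallback yields replacement characters, so for a reader who does not have a Cyrillic default, both are equally broken. The ASCII-only English page, by contrast, decodes identically under all three.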
After all, as English users, you are more used to 'mojibake' than the rest of us are: whenever you see it, you 'know' that it is because you are reading a foreign language. It is we, the users of non-English locales, who need the default-to-legacy-encoding behavior the most. Or, please, explain to us when and where English-language users living in their own native lands, so to speak, need their browser to default to Latin-1 in order to read English-language pages correctly?

If the English locales start defaulting to UTF-8, then little by little the same expectation will start spreading to the other locales as well, not least because the 'geeks' of each locale tend to see the English locale as a super default - and they might also use the US English locale of their OS and/or browser. We should not consider the needs of geeks - they will follow (read: lead) the way, so the fact that *they* may see mojibake should not be a concern.

See? We would have a plan. Or what do you think?

Of course, we - or rather, the browser vendors - would need to market this as an important change. The HTML5 spec already justifies the use of UTF-8 in several places: it says that pages might not work as expected, e.g. with regard to URLs, unless UTF-8 is used. So there are enough arguments that can be used.

I have other technical ideas as well, such as treating the BOM the way WebKit and IE treat it - that would slightly increase the number of pages that all browsers treat as UTF-8 [1]. However, that can wait: the most important thing is to *initiate* the default encoding change.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=12897

Leif Halvard Silli
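The BOM idea mentioned above, giving a byte order mark the precedence WebKit and IE give it, can be sketched as a fixed ordering of encoding sources. This is a hypothetical Python illustration: the `pick_encoding` name and the three-step order (BOM, then declared charset, then locale default) are assumptions based on the referenced bug discussion, not code from any browser, and "declared charset" here lumps together the HTTP header and any in-document declaration.

```python
from typing import Optional

# Hypothetical sketch, not any browser's actual code. It assumes that a leading
# byte order mark (BOM) decides the encoding, even when the server declares a
# different charset.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def pick_encoding(raw: bytes, declared: Optional[str], locale_default: str) -> str:
    """Choose an encoding: BOM first, then the declared charset, then the locale default."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name  # the BOM wins, even over a declared charset
    if declared:
        return declared  # otherwise trust the declared charset
    return locale_default  # finally fall back to the locale default


# A UTF-8 page with a BOM, wrongly served as ISO-8859-1:
print(pick_encoding(b"\xef\xbb\xbfcaf\xc3\xa9", "iso-8859-1", "windows-1252"))
```

Under this ordering, a UTF-8 page that carries a BOM is decoded as UTF-8 regardless of a stale charset label, which is how the change would increase the number of pages treated as UTF-8.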
Received on Monday, 5 December 2011 09:42:39 UTC