- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 9 Dec 2011 15:34:08 +0200
On Fri, Dec 9, 2011 at 12:33 AM, Leif Halvard Silli
<xn--mlform-iua at xn--mlform-iua.no> wrote:
> Henri Sivonen Tue Dec 6 23:45:11 PST 2011:
> These localizations are nevertheless live tests. If we want to move
> more firmly in the direction of UTF-8, one could ask users of those
> 'live tests' about their experience.

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=708995

>> (which means
>> *other-language* pages when the language of the localization doesn't
>> have a pre-UTF-8 legacy).
>
> Do you have any concrete examples?

The example I had in mind was Welsh.

> And are there user complaints?

Not that I know of, but I'm not part of a feedback loop if there even
is a feedback loop here.

> The Serb localization uses UTF-8. The Croat uses Win-1252, but only on
> Windows and Mac: On Linux it appears to use UTF-8, if I read the HG
> repository correctly.

OS-dependent differences are *very* suspicious. :-(

>> I think that defaulting to UTF-8 is always a bug, because at the time
>> these localizations were launched, there should have been no unlabeled
>> UTF-8 legacy, because up until these locales were launched, no
>> browsers defaulted to UTF-8 (broadly speaking). I think defaulting to
>> UTF-8 is harmful, because it makes it possible for locale-siloed
>> unlabeled UTF-8 content come to existence
>
> The current legacy encodings nevertheless creates siloed pages already.
> I'm also not sure that it would be a problem with such a UTF-8 silo:
> UTF-8 is possible to detect, for browsers - Chrome seems to perform
> more such detection than other browsers.

While UTF-8 is possible to detect, I really don't want to take Firefox
down the road where users who currently don't have to suffer page load
restarts from heuristic detection have to start suffering them. (I
think making incremental rendering any less incremental for locales
that currently don't use a detector is not an acceptable solution for
avoiding restarts. With English-language pages, the UTF-8ness might
not be apparent from the first 1024 bytes.)

> In another message you suggested I 'lobby' against authoring tools. OK.
> But the browser is also an authoring tool.

In what sense?

> So how can we have authors
> output UTF-8, by default, without changing the parsing default?

Changing the default is an XML-like solution: creating breakage for
users (who view legacy pages) in order to change author behavior.

To the extent a browser is a tool Web authors use to test stuff, it's
possible to add various whining to console without breaking legacy
sites for users. See
https://bugzilla.mozilla.org/show_bug.cgi?id=672453
https://bugzilla.mozilla.org/show_bug.cgi?id=708620

> Btw: In Firefox, then in one sense, it is impossible to disable
> "automatic" character detection: In Firefox, overriding of the encoding
> only lasts until the next reload.

A persistent setting for changing the fallback default is in the
"Advanced" subdialog of the font prefs in the "Content" preference
pane. It's rather counterintuitive that the persistent autodetection
setting is in the same menu as the one-off override.

As for heuristic detection based on the bytes of the page, the only
heuristic that can't be disabled is the heuristic for detecting
BOMless UTF-16 that encodes Basic Latin only. (Some Indian bank was
believed to have been giving that sort of files to their customers and
it "worked" in pre-HTML5 browsers that silently discarded all zero
bytes prior to tokenization.) The Cyrillic and CJK detection
heuristics can be turned on and off by the user.

Within an origin, Firefox considers the parent frame and the previous
document in the navigation history as sources of encoding guesses.
That behavior is not user-configurable to my knowledge.

Firefox also remembers the encoding from previous visits as long as
Firefox otherwise has the page in cache. So for testing, it's
necessary to make Firefox forget about previous visits to the test
case.
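
To make the two byte-level points above concrete, here is a minimal
Python sketch. It is an illustration only and is not Firefox's actual
detection code: the function names, the return labels and the way the
1024-byte limit is applied are all assumptions made for this example.
The first function shows why a mostly-ASCII page can give no UTF-8
evidence within its first 1024 bytes; the second shows what "BOMless
UTF-16 that encodes Basic Latin only" looks like at the byte level.

```python
import codecs


def utf8_evidence(data, limit=1024):
    """Classify the first `limit` bytes (hypothetical helper).

    Returns "not-utf-8", "ascii-only" or "utf-8". "ascii-only" means
    the prefix carries no evidence either way.
    """
    prefix = data[:limit]
    # An incremental decoder tolerates a multi-byte sequence that is
    # merely cut off at the end of the prefix.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return "not-utf-8"
    if all(b < 0x80 for b in prefix):
        return "ascii-only"
    return "utf-8"


def guess_bomless_utf16_basic_latin(data):
    """Guess BOMless UTF-16 containing only Basic Latin (hypothetical).

    Such data alternates one non-zero ASCII byte with one zero byte.
    Returns "utf-16le", "utf-16be" or None.
    """
    if len(data) < 2 or len(data) % 2:
        return None
    even, odd = data[0::2], data[1::2]

    def basic_latin(bs):
        return all(0 < b < 0x80 for b in bs)

    def zeros(bs):
        return not any(bs)

    if basic_latin(even) and zeros(odd):
        return "utf-16le"
    if zeros(even) and basic_latin(odd):
        return "utf-16be"
    return None
```

With a page that consists of 2000 ASCII bytes followed by a single
non-ASCII character, utf8_evidence() sees only ASCII in the first 1024
bytes, so a detector limited to that prefix would have to either guess
or restart the load later. Discarding the zero bytes from the input
accepted by the second function leaves plain ASCII, which is why such
files appeared to work in the older browsers mentioned above.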
-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Friday, 9 December 2011 05:34:08 UTC