[whatwg] Default encoding to UTF-8?

On Wednesday 2011-11-30 15:28 -0800, Faruk Ates wrote:
> My understanding is that all browsers* default to Western Latin
> (ISO-8859-1) encoding (for Western-world
> downloads/OSes) due to legacy content on the web. But how relevant
> is that still today? Has any browser done any recent research into
> the need for this?

The default varies by localization (and within that potentially by
platform), and unfortunately that variation does matter.  You can
see Firefox's defaults here:
http://mxr.mozilla.org/l10n-mozilla-beta/search?string=intl.charset.default
(The localization and platform are part of the filename.)

I changed my Firefox from the ISO-8859-1 default to UTF-8 years ago
(by changing the "intl.charset.default" preference), and I do see a
decent amount of broken content as a result (maybe I encounter a new
broken page once a week? -- though substantially more often if I'm
looking at non-English pages because of travel).
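
To illustrate the kind of breakage involved, here is a minimal Python
sketch (not browser code). It assumes a hypothetical legacy page that
was authored in ISO-8859-1 and served without any charset label; the
sample text and the decode_with_fallback helper are made up for
illustration.

```python
# Hypothetical legacy page body: authored in ISO-8859-1, no charset label.
legacy_bytes = "café déjà vu".encode("iso-8859-1")


def decode_with_fallback(data: bytes, fallback: str) -> str:
    """Decode unlabeled bytes the way a fallback default would.

    Undecodable bytes become U+FFFD (the replacement character),
    which roughly mirrors what a user sees on a broken page.
    """
    return data.decode(fallback, errors="replace")


# With the ISO-8859-1 default, the page reads as the author intended.
as_latin1 = decode_with_fallback(legacy_bytes, "iso-8859-1")

# With a UTF-8 default, each accented byte is an invalid UTF-8 sequence
# and is replaced by U+FFFD.
as_utf8 = decode_with_fallback(legacy_bytes, "utf-8")

print(as_latin1)
print(as_utf8)
```

Plain ASCII text survives either way, which is probably why mostly
English pages break less often than pages in other languages.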

> I'm wondering if it might not be good to start encouraging
> defaulting to UTF-8, and only fall back to Western Latin if it is
> detected that the content is very old / served by old
> infrastructure or servers, etc. And of course if the content is
> served with an explicit encoding of Western Latin.

The more complex the rules, the harder they are for authors to
understand / debug.  I wouldn't want to create rules like those.

I would, however, like to see movement towards defaulting to UTF-8:
the current situation makes the Web less world-wide because pages
that work for one user don't work for another.
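
A rough Python sketch of that problem, under stated assumptions: the
LOCALE_DEFAULTS table and the render helper are hypothetical, and the
"ru" entry is an assumed value for illustration (the real per-locale
values are in the files linked above). The page is an unlabeled
document whose bytes happen to be windows-1251.

```python
# Illustrative per-locale fallback encodings. These values are
# assumptions for the sketch, not copied from any browser's files.
LOCALE_DEFAULTS = {
    "en-US": "iso-8859-1",
    "ru": "windows-1251",
}

# Hypothetical unlabeled page written by a Russian-speaking author.
page_bytes = "Привет".encode("windows-1251")


def render(data: bytes, locale: str) -> str:
    """Decode unlabeled bytes using the fallback for the given locale."""
    return data.decode(LOCALE_DEFAULTS[locale], errors="replace")


# Same bytes, different users, different results.
for locale in LOCALE_DEFAULTS:
    print(locale, render(page_bytes, locale))
```

The user with the "ru" default sees the intended text, while the user
with the "en-US" default sees accented Latin letters instead. A single
shared default would at least make both users see the same thing.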

I'm just not quite sure how to get from here to there, though, since
such changes are likely to make users experience broken content.

-David

-- 
L. David Baron                         http://dbaron.org/
Mozilla                           http://www.mozilla.org/

Received on Wednesday, 30 November 2011 18:29:31 UTC