[whatwg] Default encoding to UTF-8? from Glenn Maynard on 2011-12-02 (public-whatwg-archive@w3.org from December 2011)

From: Glenn Maynard <glenn@zewt.org>
Date: Fri, 2 Dec 2011 11:29:17 -0500
Message-ID: <CABirCh-ssAXDXa=+J_2YXYzjSWt9sksRaNpb4VDLARSWQHz+YA@mail.gmail.com>

On Fri, Dec 2, 2011 at 10:46 AM, Henri Sivonen <hsivonen at iki.fi> wrote:

> Regarding your "(and 16)" remark, considering my personal happiness at
> work, I'd prioritize the eradication of UTF-16 as an interchange
> encoding much higher than eradicating ASCII-based non-UTF-8 encodings
> that all major browsers support. I think suggesting a solution to the
> encoding problem while implying that UTF-16 is not a problem isn't
> particularly appropriate. :-)
>

UTF-16 is definitely terrible for interchange (it's terrible for internal
use, too, but we're stuck with that), and I'm all for anything that
prevents its proliferation.

I don't think I'd call it a bigger problem, though, since it's
comparatively (even vanishingly) rare, whereas untagged legacy encodings are
a widespread problem that gets worse every day we can't think of a way to
curtail it.

I don't have any new ideas for doing that, either, though.

> I think in order to comply with the Support Existing Content design
> principle (even if it unfortunately means that support is siloed by
> locale) and in order to make plans that are game theoretically
> reasonable (not taking steps that make users migrate to browsers that
> haven't taken the steps), I think we shouldn't change the fallback
> encodings from what the HTML5 spec says when it comes to loading
> text/html or text/plain content into a browsing context.
>

And no browser vendor would ever do this, no matter what the spec says,
since nobody's willing to break massive swaths of existing content.

-- 
Glenn Maynard

Received on Friday, 2 December 2011 08:29:17 UTC