Re: [whatwg] StringEncoding: Allowed encodings for TextEncoder from Glenn Maynard on 2012-08-07 (public-whatwg-archive@w3.org from August 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Tue, 7 Aug 2012 12:47:15 -0500
To: Joshua Bell <jsbell@chromium.org>
Cc: WHAT Working Group <whatwg@whatwg.org>, Jonas Sicking <jonas@sicking.cc>
Message-ID: <CABirCh_L7FeyJNrtiTd4LRqt4d-twwPfdODcTJ8_uHaMwMqCWA@mail.gmail.com>

On Tue, Aug 7, 2012 at 11:48 AM, Joshua Bell <jsbell@chromium.org> wrote:

> It doesn't appear we reached consensus. There was some desire expressed
> to scope to UTF-8, then perhaps expand to include UTF-16; there was definite
> consensus that any supported encoding should be handled by both encode and
> decode; then there were comments about XHR and form data encodings, but the
> discussion wandered into stateful vs. stateless encodings, which took us off
> topic. So Glenn's comment below pretty much reboots the conversation where
> it was:
>

I don't agree that we necessarily need to support both encode and decode
for every encoding.

For example, an MP3 tag editor supporting legacy ID3 tags may want to be
able to decode ISO-8859-1, since ID3 allows tags in that encoding.  However,
there's no reason to ever write MP3 tags in anything but Unicode, so such an
editor only needs decode support for ISO-8859-1, not encode support.

This pattern of decoding legacy encodings but encoding only to Unicode
seems common today.  Many email clients (not a use case, just a
comparison) likewise decode from any encoding but send only in UTF-8.
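A minimal sketch of this decode-only pattern, assuming the TextDecoder/TextEncoder API roughly as it was later standardized in the WHATWG Encoding Standard (not necessarily the draft under discussion in this thread). The helper names decodeLegacyField and encodeField are hypothetical. Note that under that standard the "iso-8859-1" label maps to windows-1252, which agrees with ISO-8859-1 except for bytes 0x80-0x9F.

```typescript
// Hypothetical helpers illustrating "decode legacy, encode only Unicode".
// Assumes the standardized TextDecoder/TextEncoder globals (browsers, Node with full ICU).

// Decode a legacy tag field stored as ISO-8859-1 bytes.
// The "iso-8859-1" label resolves to windows-1252 under the Encoding Standard.
function decodeLegacyField(bytes: Uint8Array): string {
  const decoder = new TextDecoder("iso-8859-1");
  return decoder.decode(bytes);
}

// Encode text for writing. Only UTF-8 is produced: the standardized
// TextEncoder takes no encoding argument, so legacy output is not possible.
function encodeField(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}

// Usage: read a legacy field, then write it back as UTF-8.
const legacy = new Uint8Array([0x43, 0x61, 0x66, 0xe9]); // "Café" in ISO-8859-1
const text = decodeLegacyField(legacy);
const utf8 = encodeField(text);
console.log(text, utf8.length); // prints "Café 5" ("é" becomes two UTF-8 bytes)
```

Reading accepts a legacy label while writing always yields UTF-8, mirroring the MP3 tag editor case: the asymmetry lives in the API itself rather than in each application.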

That's not to say there are no use cases for encoding to other encodings, but
it's much easier to relax the restriction later and allow them if we really
need to than it is to go the other way, and I think there's a danger of
perpetuating legacy encodings if we're not careful.

> There are also cross-browser differences in handling decoding of certain
> code points in certain encodings. Exposing those encodings in a new API
> would either require that the browser vendors expose those differences
> (bleah) or implement a compatibility switch in the affected codecs (bleah).
>

The real fix for this would be for browsers to implement the encodings in
the correct, interoperable way when they are exposed by this API, even if
that means this API interprets data differently from, e.g., the HTML parser.
MS has made it clear that they won't touch their encodings in any way, due
to legacy support, but hopefully that doesn't apply to a new API with no
legacy at all.  (If you want to find that out, you'll need to ask on webapps
or through some other channel, since they're not on this list.)

-- 
Glenn Maynard

Received on Tuesday, 7 August 2012 17:47:49 UTC