Re: [whatwg] StringEncoding: Allowed encodings for TextEncoder from Jonas Sicking on 2012-08-07 (public-whatwg-archive@w3.org from August 2012)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 7 Aug 2012 10:51:34 -0700
To: Joshua Bell <jsbell@chromium.org>
Cc: WHAT Working Group <whatwg@whatwg.org>, Glenn Maynard <glenn@zewt.org>
Message-ID: <CA+c2ei_XSx4frxBo0urEQUL7T=AAua9yHJMqjV_2nKOY9fXO-w@mail.gmail.com>

On Tue, Aug 7, 2012 at 9:48 AM, Joshua Bell <jsbell@chromium.org> wrote:
>> Not an objection, but where does XHR limit sent data to those encodings?
>> send(FormData) forces UTF-8 (which is even more restrictive);
>> send(Document) seems to allow any encoding *except* for UTF-16 (presumably
>> web compat since that's a weird criteria).
>>
>> I'm not sure that staying in sync with XHR--which has its own pile of
>> legacy code to support--is worthwhile here anyway, but limiting to Unicode
>> seems fine in its own right, especially since the restriction can always be
>> lifted later if real needs come up.
>>
>> However I currently can't find any restrictions on which target
>> > encodings are supported in the current drafts.
>
>
> When Anne's spec appeared I gutted mine and deferred wherever possible to
> his. One consequence of that was getting the other encodings "for free" as
> far as the spec writing goes.
>
> If we achieve consensus that we only want to support UTF encodings we can
> add the restrictions. There are use cases for supporting other encodings
> (parsing legacy data file formats, for example), but that could be deferred.

I don't mind supporting *decoding* from basically any encoding that
Anne's spec enumerates. I don't see a downside with that since I
suspect most implementations will just call into a generic decoding
backend anyway, and so supporting the same set of encodings as for
other parts of the platform should be relatively easy.

That also means that we don't have to figure out which encodings we
need to support in order to read legacy file formats etc.
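
As a rough illustration of decoding through a generic backend, here is a
minimal sketch in Python. It is not the StringEncoding API or any browser
implementation: Python's built-in codec registry stands in for the generic
decoding backend, and the helper name decode_any and the sample byte
strings are made up for the example.

```python
# Hypothetical helper: hand the bytes and an encoding label to a generic
# decoding backend (here, Python's built-in codec registry).
def decode_any(data: bytes, label: str) -> str:
    return data.decode(label)

# Made-up sample payloads in two legacy encodings and one UTF encoding.
legacy_samples = {
    "windows-1252": b"caf\xe9",
    "shift_jis": b"\x93\xfa\x96\x7b",
    "utf-8": b"caf\xc3\xa9",
}

# The caller does not need per-encoding logic; the backend does the work.
decoded = {label: decode_any(raw, label) for label, raw in legacy_samples.items()}
```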

However I think we should consider restricting support to a smaller
set of encodings when *encoding*. There should be little reason
for people today to produce text in non-UTF formats. We might even be
able to get away with only supporting UTF-8, though I wouldn't be
surprised if there are reasonably modern file formats which use UTF-16.

Restricting the encoding formats has the advantage that we can
rely on the target encoding to support a consistent feature set. For
example we don't need to deal with defining what to do if we receive a
perfectly well formed string, but the target encoding doesn't support
all the characters in that string. Likewise we don't have to deal with
target encodings which don't support the replacement character.
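
To make the problem concrete, here is a small sketch in Python. It is an
illustration only, not the proposed TextEncoder behavior: the helper
can_encode and the sample string are assumptions for the example, and
Python's codecs stand in for the target encodings.

```python
# A perfectly well-formed string containing a euro sign and a check mark.
text = "price: 10 \u20ac \u2713"

# Hypothetical helper: report whether every character of s can be
# represented in the target encoding named by label.
def can_encode(s: str, label: str) -> bool:
    try:
        s.encode(label)
        return True
    except UnicodeEncodeError:
        return False

# UTF encodings can represent every character; legacy ones cannot.
results = {
    label: can_encode(text, label)
    for label in ("utf-8", "utf-16-le", "iso-8859-1", "ascii")
}

# A legacy target needs some substitution policy. Python's "replace"
# handler falls back to "?" because U+FFFD itself is not representable.
replaced = text.encode("iso-8859-1", errors="replace")
replacement_char_fits = can_encode("\ufffd", "iso-8859-1")
```

With a UTF-only target list neither question comes up: every character,
including U+FFFD, can be encoded.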

/ Jonas

Received on Tuesday, 7 August 2012 17:52:35 UTC