Re: [whatwg] StringEncoding: Allowed encodings for TextEncoder from Joshua Bell on 2012-08-07 (public-whatwg-archive@w3.org from August 2012)

From: Joshua Bell <jsbell@chromium.org>
Date: Tue, 7 Aug 2012 10:48:42 -0600
To: Glenn Maynard <glenn@zewt.org>
Cc: WHAT Working Group <whatwg@whatwg.org>, Jonas Sicking <jonas@sicking.cc>
Message-ID: <CAD649j70tvf0UUtpx-dia5mMAynLzdJwCKccJ5c8wBbkN1DVoQ@mail.gmail.com>

On Tue, Aug 7, 2012 at 8:32 AM, Glenn Maynard <glenn@zewt.org> wrote:

> On Mon, Aug 6, 2012 at 11:39 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>
> > I seem to have a recollection that we discussed only allowing encoding
> > to UTF8 and UTF16LE, UTF16BE. This in order to promote these formats
> > as well as stay in sync with other APIs like XMLHttpRequest.
> >
>

It looks like the relevant discussion was at
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-March/035038.html

It doesn't appear we reached consensus. There was some desire expressed to
scope to UTF-8 and then perhaps expand to include UTF-16, and definite
consensus that any encoding supported should be handled by both encode and
decode. Comments about XHR and form data encodings followed, but the
discussion then wandered into stateful vs. stateless encodings, which took
us off topic. So Glenn's comment below pretty much reboots the conversation
where it was:

> Not an objection, but where does XHR limit sent data to those encodings?
> send(FormData) forces UTF-8 (which is even more restrictive);
> send(Document) seems to allow any encoding *except* for UTF-16 (presumably
> web compat since that's a weird criterion).
>
> I'm not sure that staying in sync with XHR--which has its own pile of
> legacy code to support--is worthwhile here anyway, but limiting to Unicode
> seems fine in its own right, especially since the restriction can always be
> lifted later if real needs come up.
>
> > However I currently can't find any restrictions on which target
> > encodings are supported in the current drafts.
>

When Anne's spec appeared I gutted mine and deferred wherever possible to
his. One consequence of that was getting the other encodings "for free" as
far as the spec writing goes.

If we achieve consensus that we only want to support UTF encodings we can
add the restrictions. There are use cases for supporting other encodings
(parsing legacy data file formats, for example), but that could be deferred.
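
To make the restriction concrete, here is a rough sketch of what an
allow-list check could look like. It is written in Python only because its
built-in codecs make it easy to try; it is not the proposed API, and the
names encode_restricted and ALLOWED_LABELS are made up for illustration.
The three labels are the ones floated in this thread, not anything the
drafts currently specify.

```python
# Assumed allow-list: the three encodings mentioned in this thread,
# mapped to the codec names Python uses for them.
ALLOWED_LABELS = {
    "utf-8": "utf-8",
    "utf-16le": "utf-16-le",
    "utf-16be": "utf-16-be",
}


def encode_restricted(text: str, label: str = "utf-8") -> bytes:
    """Encode text, rejecting any label outside the allow-list."""
    key = label.strip().lower()
    if key not in ALLOWED_LABELS:
        # A legacy encoding such as windows-1252 is refused up front.
        raise ValueError(f"unsupported encoding for encode: {label!r}")
    return text.encode(ALLOWED_LABELS[key])
```

Lifting the restriction later would only mean adding entries to the
allow-list, which is the appeal of starting narrow.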

> > One wrinkle in this is if we want to support arbitrary encodings when
> > encoding, that means that we can't use "insert the replacement
> > character" as default error handling since that isn't available in a
> > lot of encoding formats.
> >
>
> I don't think this part is a real hurdle.  Just replace with "?" for
> non-Unicode encodings.
>
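
As an illustration of the fallback Glenn describes, here is a small sketch
using Python's built-in codecs as a stand-in for the proposed API. The
function name encode_with_fallback is hypothetical, and treating lone
surrogates as the only unencodable input for the UTF encodings is my
assumption. U+FFFD is used where the target can represent it, and "?"
everywhere else.

```python
REPLACEMENT = "\ufffd"

# Encodings that can represent U+FFFD (Python codec names).
UNICODE_ENCODINGS = {"utf-8", "utf-16-le", "utf-16-be"}


def encode_with_fallback(text: str, encoding: str) -> bytes:
    """Encode text, substituting for anything the target cannot represent."""
    name = encoding.lower()
    if name in UNICODE_ENCODINGS:
        # Assumption: lone surrogates are the only code points a UTF
        # encoder cannot emit, so swap them for U+FFFD before encoding.
        cleaned = "".join(
            REPLACEMENT if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text
        )
        return cleaned.encode(name)
    # Python's "replace" error handler emits "?" when encoding.
    return text.encode(name, errors="replace")


# U+FFFD itself has no latin-1 representation, hence the "?" fallback.
print(encode_with_fallback("caf\u00e9 \u20ac", "latin-1"))
print(encode_with_fallback("a\ud800", "utf-8"))
```

The first call keeps the e-acute and turns the euro sign into "?"; the
second turns the lone surrogate into the UTF-8 bytes for U+FFFD.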

> On Tue, Aug 7, 2012 at 8:10 AM, Joshua Cranmer <Pidgeot18@verizon.net> wrote:
>
> > I found that the wiki version of the proposal cites <
> > http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html> as the way to
> > find encodings.
> >
>
> That spec documents the encodings which are used anywhere in the platform,
> but that doesn't necessarily mean every API needs to support all those
> encodings.  It's almost all backwards-compatibility.
>

There are also cross-browser differences in handling decoding of certain
code points in certain encodings. Exposing those encodings in a new API
would either require that the browser vendors expose those differences
(bleah) or implement a compatibility switch in the affected codecs (bleah).

Received on Tuesday, 7 August 2012 16:49:11 UTC