[whatwg] API for encoding/decoding ArrayBuffers into text from John Tamplin on 2012-03-14 (public-whatwg-archive@w3.org from March 2012)

From: John Tamplin <jat@google.com>
Date: Tue, 13 Mar 2012 22:47:44 -0400
Message-ID: <CABLsOLBqt8LnVwiNQurjV2T6P85vrkw=tg3H57YbQqD-dS4GAQ@mail.gmail.com>

On Tue, Mar 13, 2012 at 8:19 PM, Glenn Maynard <glenn at zewt.org> wrote:

> Using Views instead of specifying the offset and length sounds good.
>
> On Tue, Mar 13, 2012 at 6:28 PM, Ian Hickson <ian at hixie.ch> wrote:
>
> >  - What's the use case for supporting anything but UTF-8?
> >
>
> Other Unicode encodings may be useful, to decode existing file formats
> containing (most likely at a minimum) UTF-16.  I don't feel strongly about
> that, though; we're stuck with UTF-16 as an internal representation in the
> platform, but that doesn't necessarily mean we need to support it as a
> transfer encoding.
>
> For non-Unicode legacy encodings, I think that even if use cases exist,
> they should be given more than the usual amount of scrutiny before being
> supported.
>

The whole idea is to be able to extract textual data out of some packed
binary format.  If you don't support the character sets people want to use,
they will simply do what they have to do now and hand-code the character
set conversion, which will be slow and inaccurate.

In particular, I think you have to include various ISO-8859-* character
sets (especially Latin1) and the non-Unicode character sets still
frequently used by Japanese and Chinese users.
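
As a rough illustration of the hand-coded conversion mentioned above, here
is a minimal sketch for the simplest case, ISO-8859-1, where each byte maps
directly to the Unicode code point of the same value. The function name
decodeLatin1 and the record layout are hypothetical, made up for this
example; they are not part of any proposed API. Multi-byte encodings such
as Shift_JIS or GBK would need large mapping tables and are not covered.

```typescript
// Hypothetical helper: decodes ISO-8859-1 bytes from a typed-array view.
// Every byte value 0x00..0xFF is used directly as a UTF-16 code unit,
// which matches ISO-8859-1 only; it is wrong for any other encoding.
function decodeLatin1(view: Uint8Array): string {
  let out = "";
  for (let i = 0; i < view.length; i++) {
    out += String.fromCharCode(view[i]);
  }
  return out;
}

// Assumed layout for illustration: a 2-byte header followed by a 4-byte
// Latin-1 text field inside a packed binary record.
const buffer = new ArrayBuffer(8);
new Uint8Array(buffer).set([0x63, 0x61, 0x66, 0xe9], 2);

// A view selects the field, so no separate offset and length are passed.
const field = new Uint8Array(buffer, 2, 4);
const text = decodeLatin1(field);
```

Even this trivial case builds the string one code unit at a time in script,
which hints at the cost of doing the same for table-driven encodings.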

I am fine with strongly suggesting that only UTF-8 be used for new things,
but leaving out legacy support will severely limit the utility of this
library.

-- 
John A. Tamplin
Software Engineer (GWT), Google

Received on Tuesday, 13 March 2012 19:47:44 UTC