[whatwg] API for encoding/decoding ArrayBuffers into text from Glenn Maynard on 2012-03-14 (public-whatwg-archive@w3.org from March 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Wed, 14 Mar 2012 17:53:12 -0500
Message-ID: <CABirCh-3PmscXcAWVkf6EfGwonXvyYVudiJupob_1zDEXqDg7g@mail.gmail.com>

On Tue, Mar 13, 2012 at 9:47 PM, John Tamplin <jat at google.com> wrote:
> I am fine with strongly suggesting that only UTF8 be used for new things,
> but leaving out legacy support will severely limit the utility of this
> library.

Not all limitations are bad, and I'd disagree with "severely".

At a minimum, the set of encodings should be very carefully selected.
Limit it to Unicode to begin with, and if we're really going to put legacy
encodings on yet more life support, only add an encoding where there's a
clear, justified need for it.  (There are many encodings that browsers need
to support for text/html because they're used in legacy content, but which
nobody is still using today in new content--those should not be supported
here.)

But stick with Unicode for now.  Once an encoding is added, it's hard to
ever remove it.

On Wed, Mar 14, 2012 at 6:52 AM, Anne van Kesteren <annevk at opera.com> wrote:

> If we can make it a deterministic, unchanging, and defined algorithm, I
> think that would actually be acceptable. And ideally we do define that
> algorithm at some point so new browsers can enter the existing market more
> easily and existing browsers interpret existing content in the same way.

We don't have any untagged content to support yet, so let's not create an
API that guarantees it'll come into existence.  The heuristics you need
depend heavily on the content, anyway (for example, heuristics that work
for HTML probably won't for ID3 tags, which are generally very short).

On Wed, Mar 14, 2012 at 11:14 AM, Joshua Bell <jsbell at chromium.org> wrote:

> Having implemented a library that handled both text encodings and
> base16/base64 encoding, I can offer the opinion that the nomenclature gets
> very confusing since the encode/decode semantics are reversed.
>
> binary_buffer = encode(text_content)
> text_content = decode(binary_buffer)
>
> vs.
>
> binary_buffer = decode(base64_data)
> base64_data = encode(binary_buffer)
>

It's more than a naming problem.  With this string API, one side of the
conversion is always a DOMString.  Base64 conversion wants
ArrayBuffer<->ArrayBuffer conversions, so it would belong in a separate API.
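
To make the difference in shape concrete, here is a rough sketch, not a
proposal.  The function names are hypothetical, UTF-8 is assumed as the only
encoding, and TextEncoder/TextDecoder are used purely as a stand-in for
whatever the string API ends up looking like.  The text functions always have
a string on one side; the base64 function takes bytes and returns bytes.

```typescript
// Text encoding: one side of the conversion is always a string.
// (Hypothetical helper names; UTF-8 only, by assumption.)
function encodeText(text: string): Uint8Array {
  return new TextEncoder().encode(text); // TextEncoder always emits UTF-8
}

function decodeText(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Base64: both sides are bytes, so it does not fit the string API above.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical byte-to-byte encoder: the output holds the ASCII codes of
// the base64 characters, with '=' (0x3d) padding.
function base64EncodeBytes(input: Uint8Array): Uint8Array {
  const out = new Uint8Array(Math.ceil(input.length / 3) * 4);
  let o = 0;
  for (let i = 0; i < input.length; i += 3) {
    const has1 = i + 1 < input.length;
    const has2 = i + 2 < input.length;
    const b0 = input[i];
    const b1 = has1 ? input[i + 1] : 0;
    const b2 = has2 ? input[i + 2] : 0;
    out[o++] = B64_ALPHABET.charCodeAt(b0 >> 2);
    out[o++] = B64_ALPHABET.charCodeAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out[o++] = has1
      ? B64_ALPHABET.charCodeAt(((b1 & 0x0f) << 2) | (b2 >> 6))
      : 0x3d;
    out[o++] = has2 ? B64_ALPHABET.charCodeAt(b2 & 0x3f) : 0x3d;
  }
  return out;
}
```

Getting a base64 string out of this takes a second, separate step through the
text API, e.g. decodeText(base64EncodeBytes(bytes)), which is the point: the
two conversions are different kinds of operation.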

-- 
Glenn Maynard

Received on Wednesday, 14 March 2012 15:53:12 UTC