[whatwg] API for encoding/decoding ArrayBuffers into text

On Sat, Mar 24, 2012 at 6:52 AM, Glenn Maynard <glenn at zewt.org> wrote:

> On Thu, Mar 22, 2012 at 8:58 AM, Anne van Kesteren <annevk at opera.com>
> wrote:
>
> > Another way would be to have a second optional argument that indicates
> > whether more bytes are coming (defaults to false), but I'm not sure of
> > the chances that would be used correctly. The reasons you outline are
> > probably why many browser implementations deal with EOF poorly too.
>
>
> It might not improve it, but I don't think it'd be worse.  If you didn't
> use it correctly for an encoding where it matters, the breakage would be
> obvious.
>
> Also, the previous "automatically-streaming" API has another possible
> misuse: constructing a single encoder, then calling it repeatedly for
> unrelated strings, without calling eof() between them (trailing bytes would
> become U+FFFD in the next string).  That'd be a less likely mistake with
> this, too.
>

Agreed. Simple things should be simple.
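
To make that misuse concrete, here is a minimal sketch on the decode side.
It assumes a hypothetical Decoder wrapper (not the draft API) that maps the
proposed |continues| flag onto the `stream` option of the TextDecoder
interface, and it uses UTF-8 rather than euc-kr so that it runs wherever
TextDecoder is available:

```javascript
// Hypothetical wrapper, for illustration only: exposes the proposed
// {continues} flag on top of TextDecoder's `stream` option.
class Decoder {
  constructor(label) {
    this.impl = new TextDecoder(label);
  }
  decode(view = new Uint8Array(0), options = {}) {
    // A missing flag is falsy, so the default is "this is the last chunk".
    return this.impl.decode(view, { stream: Boolean(options.continues) });
  }
}

const decoder = new Decoder("utf-8");

// First input ends with a lone UTF-8 lead byte (0xC3), and the caller
// claims that more bytes are coming.
const first = decoder.decode(new Uint8Array([0x61, 0xC3]), { continues: true });

// Unrelated second input, decoded with the same object without ever
// ending the first stream. The stale lead byte leaks into this result.
const second = decoder.decode(new Uint8Array([0x62]));

console.log(JSON.stringify(first));
console.log(JSON.stringify(second));
```

The second result starts with U+FFFD even though its own input was plain
ASCII, which is the kind of breakage that should be easy to spot.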


> Here's a suggestion, working from that:
>
> encoder = Encoder("euc-kr");
> view = encoder.encode(str1, {continues: true});
> view = encoder.encode(str2, {continues: true});
> view = encoder.encode(str3, {continues: false});
>
> An alternative way to end the stream:
>
> encoder = Encoder("euc-kr");
> view = encoder.encode(str1, {continues: true});
> view = encoder.encode(str2, {continues: true});
> view = encoder.encode(str3, {continues: true});
> view = encoder.encode("", {continues: false});
> // or view = encoder.encode(""); // equivalent; continues defaults to false
> // or view = encoder.encode(); // maybe equivalent, if the first parameter
> // is optional
>
> The simplest usage is concise enough that we don't really need a separate
> str.encode() method:
>
> view = Encoder("euc-kr").encode(str);
>
> If it has an eof() method, it'd just be a literal wrapper for
> encoder.encode(), but it can probably be omitted.


Agreed, I'd omit it.
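
For the encode side, a rough sketch of why the flag matters even for
UTF-8: a surrogate pair can be split across two strings. Everything here
is an assumption for illustration: the Encoder class is hypothetical, it
handles UTF-8 only (it wraps TextEncoder and takes no encoding name), and
the only state it keeps is a held-back high surrogate.

```javascript
// Hypothetical streaming encoder, UTF-8 only, for illustration.
class Encoder {
  constructor() {
    this.impl = new TextEncoder();
    this.pending = ""; // high surrogate held back from the previous call
  }
  encode(str = "", options = {}) {
    let input = this.pending + str;
    this.pending = "";
    if (options.continues && input.length > 0) {
      const last = input.charCodeAt(input.length - 1);
      // A trailing high surrogate may be completed by the next string.
      if (last >= 0xD800 && last <= 0xDBFF) {
        this.pending = input.slice(-1);
        input = input.slice(0, -1);
      }
    }
    // With continues false, an unpaired surrogate is encoded as U+FFFD.
    return this.impl.encode(input);
  }
}

const pair = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

// Pair split across two calls; the second call ends the stream.
const encoder = new Encoder();
const part1 = encoder.encode("a" + pair[0], { continues: true });
const part2 = encoder.encode(pair[1] + "b");

// Ending the stream with an empty call flushes the dangling surrogate.
const truncated = new Encoder();
const head = truncated.encode("\uD83D", { continues: true });
const tail = truncated.encode();

console.log(Array.from(part1), Array.from(part2));
console.log(Array.from(head), Array.from(tail));
```

Note that encode() with no arguments plays the role of eof() here, which
is why a separate method looks redundant.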

Bikeshed: The |continues| term doesn't completely thrill me; it's clear in
context, but not necessarily what someone might go searching for.
{eof: true} would be lovely, except that we want EOF to be the default and
an omitted flag reads as a falsy JS value. |noEOF|?

If there aren't immediate objections, I'll update my wiki draft with this
style of API, and see about updating my JS polyfill as well.

Opinions on one object type (Encoding) vs. two (Encoder, Decoder)?

One object type is simpler for the non-streaming case, e.g.:

// somewhere globally
g_codec = Encoding("euc-kr");
// elsewhere...
str = g_codec.decode(view); // okay
view = g_codec.encode(str); // fine, no state captured
str = g_codec.decode(view); // still okay

but IMHO someone unfamiliar with the internals of encodings might extend
the above into:

// somewhere globally
g_codec = Encoding("euc-kr");
// elsewhere in some stream handling code...
str = g_codec.decode(view, {continues: true}); // okay..
// sure, now both an encode and decode state are captured by codec
view = g_codec.encode(str, {continues: true});
// okay only if this is more of the same stream; if there are two incoming
// streams, this is wrong
str = g_codec.decode(view, {continues: true});

The same mistake is possible with Encoder / Decoder objects, of course (you
just need two globals). But something about separating them makes it
clearer to me that the |continues| flag is affecting state in the object
rather than just affecting the output of the call.
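
A small sketch of the two-streams case, again under assumptions: the
makeDecoder helper is hypothetical and simply maps |continues| onto
TextDecoder's `stream` option, and the data is UTF-8. It decodes two
interleaved streams, once with one decoder per stream and once with a
single shared object.

```javascript
// Hypothetical factory, for illustration only.
function makeDecoder(label) {
  const impl = new TextDecoder(label);
  return {
    decode(view, options = {}) {
      return impl.decode(view, { stream: Boolean(options.continues) });
    },
  };
}

// Two unrelated UTF-8 streams, each split in the middle of a character.
const streamA = [new Uint8Array([0xC3]), new Uint8Array([0xA9])];       // U+00E9
const streamB = [new Uint8Array([0xE2]), new Uint8Array([0x82, 0xAC])]; // U+20AC

// Chunks arrive interleaved: A, B, A, B.
function decodeInterleaved(decoderA, decoderB) {
  let outA = "";
  let outB = "";
  outA += decoderA.decode(streamA[0], { continues: true });
  outB += decoderB.decode(streamB[0], { continues: true });
  outA += decoderA.decode(streamA[1]);
  outB += decoderB.decode(streamB[1]);
  return [outA, outB];
}

// One decoder per stream: each object holds only its own partial bytes.
const separate = decodeInterleaved(makeDecoder("utf-8"), makeDecoder("utf-8"));

// One shared object: partial bytes from A and B end up in the same state.
const shared = makeDecoder("utf-8");
const mixed = decodeInterleaved(shared, shared);

console.log(separate.map((s) => JSON.stringify(s)));
console.log(mixed.map((s) => JSON.stringify(s)));
```

With separate objects both characters come out intact; with the shared
object neither does, because each call sees the other stream's leftovers.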

Received on Monday, 26 March 2012 09:56:41 UTC